CN116430992A - Implicit calibration method and device for gaze tracking
- Publication number: CN116430992A
- Application number: CN202310263502.2A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F3/013: Eye tracking input arrangements (G06F: electric digital data processing; G06F3/01: input arrangements for interaction between user and computer; G06F3/011: arrangements for interaction with the human body)
- G06T7/73: Determining position or orientation of objects or cameras using feature-based methods (G06T: image data processing or generation; G06T7/00: image analysis)
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: climate change mitigation technologies in ICT)
Abstract
Description
Technical Field
The present invention relates to the field of information technology, and in particular to an implicit calibration method and device for gaze tracking.
Background Art
With the rapid development of computer science and its industry, human-computer interaction has advanced considerably. Gaze tracking is a human-computer interaction technique that, owing to its fast response and ease of operation, is widely used in many fields and plays an increasingly important role in a range of mobile applications. The basic idea behind gaze tracking is to capture the user's eye movements and map them to points on the gaze plane. Because the estimated gaze position deviates from the true position, a calibration process is designed to compensate for this offset. Gaze calibration, which converts eye coordinates into screen coordinates, is therefore an essential component of gaze tracking.
Most existing gaze-tracking calibration is explicit. In a typical calibration process, users are asked to fixate on certain stimuli on the screen while a camera captures their eye movements. The stimuli serve as the ground truth of the gaze position, and a transformation vector is derived from the offset between the estimated and true gaze positions. Assuming that the relative position between the eyes and the screen remains unchanged over a short period, the transformation vector can be applied directly to the estimated gaze position for gaze correction. However, such methods impair the user experience, especially in mobile scenarios, where the recalibration process must be triggered frequently to update the transformation vector.
In summary, existing gaze-tracking calibration methods require user cooperation: the process is complicated and the user experience is poor.
Summary of the Invention
The present invention provides an implicit calibration method and device for gaze tracking, so as to overcome the drawbacks of the prior art, namely calibration that requires user cooperation, a complicated procedure, and a poor user experience, and to realize gaze-tracking calibration that is simpler, needs no user cooperation, and offers a better user experience.
The present invention provides an implicit calibration method for gaze tracking, comprising:
obtaining M gaze-tracking results to be calibrated and N frames of the device's currently displayed content within a calibration window, where N is less than M;
adjusting the pixels of the frames of the device's currently displayed content to a preset resolution, to obtain content frames to be detected;
inputting the content frames to be detected into a visual saliency detection module, to obtain visual saliency detection results, the visual saliency detection module being configured to detect the saliency of each pixel in the content frames to be detected;
inputting the visual saliency detection results into a visual saliency measurement module, to obtain content frames to be calibrated, the visual saliency measurement module being configured to measure the validity of the saliency of each pixel in the visual saliency detection results;
inputting the content frames to be calibrated into a salient information extraction module, to obtain salient information to be input, the salient information extraction module being configured to extract salient information from the content frames to be calibrated;
inputting the salient information to be input and the gaze-tracking results to be calibrated into a gaze-tracking calibration module, to obtain gaze-tracking calibration results, the gaze-tracking calibration module being configured to calibrate the gaze-tracking results to be calibrated.
According to an implicit calibration method for gaze tracking provided by the present invention, before the salient information to be input and the gaze-tracking results to be calibrated are input into the gaze-tracking calibration module to obtain the gaze-tracking calibration results, the method further comprises:
performing first denoising and second denoising on the gaze-tracking results to be calibrated;
the first denoising specifically comprising:
performing the first denoising on the gaze-tracking results to be calibrated using a first preset formula;
The first preset formula is:
z = (x - μ) / σ
where z denotes the first denoising score value (z-score), x the original value, μ the mean of the values, and σ the standard deviation;
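The z-score step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the cutoff value and the function name are illustrative assumptions.

```python
# First denoising sketch: convert each raw value to z = (x - mu) / sigma and
# drop samples whose score magnitude exceeds a cutoff (the cutoff of 3.0 is an
# assumed default, not stated in the patent).
from statistics import mean, stdev

def z_score_filter(values, cutoff=3.0):
    """Return the values whose z-score magnitude is within the cutoff."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:              # all values identical: nothing to reject
        return list(values)
    return [x for x in values if abs((x - mu) / sigma) <= cutoff]
```

In practice this would be applied separately to the x and y screen coordinates of the gaze-tracking results within a calibration window.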
The second denoising specifically comprises:
obtaining gaze positions from the gaze-tracking results to be calibrated;
computing the average of the gaze positions corresponding to the salient information to be input, to obtain an average gaze position;
clustering the salient information to be input together with the average gaze position, to obtain a clustering result;
determining the most frequent salient region and a rough gaze region from the clustering result;
computing the offset between the centroids of the most frequent salient region and the rough gaze region, and performing the second denoising according to this offset.
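The second denoising steps above can be sketched as follows. The patent does not specify the clustering algorithm, so a simple grid-based grouping stands in for it here; the cell size and all function names are illustrative assumptions.

```python
# Second denoising sketch: cluster salient-region centers onto a coarse grid,
# take the most frequent cell as the "most frequent salient region", take the
# mean of the gaze points as the "rough gaze region", and return the offset
# between the two centroids.
from collections import Counter

def cell_of(point, cell=0.1):
    """Assign a point (in normalized screen coordinates) to a grid cell."""
    return (round(point[0] / cell), round(point[1] / cell))

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def second_denoise_offset(salient_centers, gaze_points, cell=0.1):
    # Most frequent salient cell -> centroid of its member points.
    groups = Counter(cell_of(p, cell) for p in salient_centers)
    top_cell, _ = groups.most_common(1)[0]
    top_members = [p for p in salient_centers if cell_of(p, cell) == top_cell]
    sal_c = centroid(top_members)
    # Rough gaze region -> centroid of the gaze points.
    gaze_c = centroid(gaze_points)
    return (gaze_c[0] - sal_c[0], gaze_c[1] - sal_c[1])
```

The returned offset is what the later calibration step compensates onto the gaze-tracking results.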
According to an implicit calibration method for gaze tracking provided by the present invention, the visual saliency detection module being configured to detect the saliency of each pixel in the content frames to be detected specifically comprises:
inputting the content frames to be detected into the visual saliency detection module to generate visual saliency heat maps;
normalizing the visual saliency heat maps to a preset range, to obtain the visual saliency detection results.
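The normalization step above can be sketched as min-max scaling. The target range [0, 1] is an assumed choice of "preset range"; the patent does not name the range.

```python
# Normalization sketch: min-max scale a raw heat map (list of rows) into a
# preset range so heat maps from different detectors are comparable.
def normalize_heatmap(heatmap, lo=0.0, hi=1.0):
    flat = [v for row in heatmap for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:                      # constant map: nothing salient
        return [[lo for _ in row] for row in heatmap]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in heatmap]
```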
According to an implicit calibration method for gaze tracking provided by the present invention, the visual saliency measurement module being configured to measure the validity of the saliency of each pixel in the visual saliency detection results specifically comprises:
binarizing the visual saliency detection results according to a first preset threshold, to obtain salient pixels;
computing, by connected-component analysis, the number of salient regions/objects in which the salient pixels of a frame lie;
computing, according to a second preset formula, the saliency concentration of the device's currently displayed content frame corresponding to the visual saliency detection result;
filtering out the visual saliency detection results whose saliency concentration is below a second preset threshold, to obtain the content frames to be calibrated.
The second preset formula is:
S = A_S / (n × A_T)
where S denotes the saliency concentration, n the number of salient regions/objects, A_S the number of salient region/object pixels, and A_T the number of pixels in the entire frame.
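The measurement steps above can be sketched as follows. The exact form of the concentration score is an assumption (S = A_S / (n · A_T)); only the variable meanings (n regions, A_S salient pixels, A_T total pixels) come from the text, and the binarization threshold is illustrative.

```python
# Measurement sketch: binarize a normalized heat map, count salient regions
# with 4-connected component labelling, and compute an assumed concentration
# score S = A_S / (n * A_T).
def saliency_concentration(heatmap, threshold=0.5):
    h, w = len(heatmap), len(heatmap[0])
    salient = [[heatmap[y][x] >= threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    n_regions, a_s = 0, 0
    for y in range(h):
        for x in range(w):
            if salient[y][x] and not seen[y][x]:
                n_regions += 1
                stack = [(y, x)]
                seen[y][x] = True
                while stack:              # flood-fill one connected component
                    cy, cx = stack.pop()
                    a_s += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and salient[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    if n_regions == 0:                    # e.g., an all-black frame
        return 0.0
    return a_s / (n_regions * h * w)
```

Frames whose score falls below the second preset threshold would then be discarded before calibration.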
According to an implicit calibration method for gaze tracking provided by the present invention, the salient information extraction module being configured to extract salient information from the content frames to be calibrated specifically comprises:
extracting, in each connected-component domain of a content frame to be calibrated, the coordinates and index of the salient region/object containing the pixel with the highest saliency value;
compressing, according to a third preset formula, the saliency concentration of the content frame to be calibrated together with the coordinates and indices of all the salient regions/objects into a feature vector V_i, to obtain the salient information to be input.
The third preset formula is:
V_i = (n_i, SCS_i, x_1^s, y_1^s, ..., x_{n_i}^s, y_{n_i}^s)
where V_i denotes the feature vector, n_i the number of salient regions/objects, SCS_i the saliency concentration of the content frame to be calibrated, x_k^s the horizontal coordinate of the k-th salient region/object, and y_k^s its vertical coordinate.
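The feature-vector packing can be sketched as follows. The flat layout [n_i, SCS_i, x1, y1, x2, y2, ...] is an assumed encoding; the patent only states that the concentration and the regions' coordinates and indices are compressed into V_i.

```python
# Extraction sketch: pack the frame's concentration score and each salient
# region's peak coordinates into one flat feature vector.
def build_feature_vector(scs, region_peaks):
    """region_peaks: list of (x, y) peak coordinates, one per salient region."""
    v = [len(region_peaks), scs]
    for x, y in region_peaks:
        v.extend([x, y])
    return v
```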
According to an implicit calibration method for gaze tracking provided by the present invention, the gaze-tracking calibration module being configured to calibrate the gaze-tracking results to be calibrated specifically comprises:
inputting the salient information to be input and the gaze-tracking results to be calibrated into the gaze-tracking calibration module, and compensating the gaze-tracking results to be calibrated with the transformation vector contained in the offset to perform implicit calibration, to obtain the gaze-tracking calibration results.
According to an implicit calibration method for gaze tracking provided by the present invention, before this compensation step, the method further comprises:
obtaining the relative position between the user and the device;
if the change in the relative position is less than or equal to a third preset threshold, keeping the transformation vector unchanged;
if the change in the relative position is greater than the third preset threshold, or when the scene is switched, updating the transformation vector.
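The update policy above can be sketched as follows. The function names and the threshold value are illustrative assumptions; only the decision rule comes from the text.

```python
# Update-policy sketch: keep the transformation vector while the user-device
# relative position is stable; refresh it when the position change exceeds the
# third preset threshold or a scene switch occurs.
def should_update(position_change, scene_cut, threshold=0.05):
    return position_change > threshold or scene_cut

def maintain_vector(current_vector, new_vector, position_change, scene_cut,
                    threshold=0.05):
    if should_update(position_change, scene_cut, threshold):
        return new_vector      # recalibrate: adopt the freshly computed vector
    return current_vector      # stable: keep applying the old vector
```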
The present invention further provides an implicit calibration device for gaze tracking, comprising:
an acquisition unit, configured to obtain M gaze-tracking results to be calibrated and N frames of the device's currently displayed content within a calibration window, where N is less than M;
an adjustment unit, configured to adjust the pixels of the frames of the device's currently displayed content to a preset resolution, to obtain content frames to be detected;
a visual saliency detection unit, configured to input the content frames to be detected into a visual saliency detection module, to obtain visual saliency detection results, the visual saliency detection module being configured to detect the saliency of each pixel in the content frames to be detected;
a visual saliency measurement unit, configured to input the visual saliency detection results into a visual saliency measurement module, to obtain content frames to be calibrated, the visual saliency measurement module being configured to measure the validity of the saliency of each pixel in the visual saliency detection results;
a salient information extraction unit, configured to input the content frames to be calibrated into a salient information extraction module, to obtain salient information to be input, the salient information extraction module being configured to extract salient information from the content frames to be calibrated;
a gaze-tracking calibration unit, configured to input the salient information to be input and the gaze-tracking results to be calibrated into a gaze-tracking calibration module, to obtain gaze-tracking calibration results, the gaze-tracking calibration module being configured to calibrate the gaze-tracking results to be calibrated.
The present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the implicit calibration methods for gaze tracking described above.
The present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing any of the implicit calibration methods for gaze tracking described above.
The implicit calibration method and device for gaze tracking provided by the present invention obtain M gaze-tracking results to be calibrated and N frames of the device's currently displayed content within a calibration window, where N is less than M; adjust the pixels of the frames to a preset resolution to obtain content frames to be detected; input the content frames to be detected into a visual saliency detection module, which detects the saliency of each pixel, to obtain visual saliency detection results; input the visual saliency detection results into a visual saliency measurement module, which measures the validity of the saliency of each pixel, to obtain content frames to be calibrated; input the content frames to be calibrated into a salient information extraction module to obtain salient information to be input; and input the salient information to be input together with the gaze-tracking results to be calibrated into a gaze-tracking calibration module, which calibrates the gaze-tracking results, to obtain the gaze-tracking calibration results. The present invention uses saliency information to identify useful video frames and performs calibration using only these "useful" frames, thereby achieving gaze-tracking calibration that is simpler, requires no user cooperation, and offers a better user experience.
Brief Description of the Drawings
To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the optical reflection model of a user's eyes and the screen in an embodiment of the implicit calibration method for gaze tracking provided by the present invention;
FIG. 2 is the first schematic flowchart of the implicit calibration method for gaze tracking provided by the present invention;
FIG. 3 is the second schematic flowchart of the implicit calibration method for gaze tracking provided by the present invention;
FIG. 4 shows frames with the corresponding gaze points and visual saliency heat maps in an embodiment of the implicit calibration method for gaze tracking provided by the present invention;
FIG. 5 is an illustration of user attention shift in an embodiment of the implicit calibration method for gaze tracking provided by the present invention;
FIG. 6 is a schematic structural diagram of the implicit calibration device for gaze tracking provided by the present invention;
FIG. 7 is a schematic structural diagram of the electronic device provided by the present invention.
Reference numerals:
610: acquisition unit; 620: adjustment unit; 630: visual saliency detection unit; 640: visual saliency measurement unit; 650: salient information extraction unit; 660: gaze-tracking calibration unit;
710: processor; 720: communication interface; 730: memory; 740: communication bus.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described below clearly and completely in conjunction with the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The basic idea behind gaze tracking is to capture the user's eye movements and map them to points on the gaze plane (i.e., the screen), as shown in FIG. 1. Two important pieces of information are required in this process:
The first is a 3D model of the eye, from which the visual axis, i.e., the gaze direction, can be estimated. The gaze point is then determined by the intersection of the visual axis and the gaze plane. The 3D model of the eye can be captured by an RGB-D camera, which is widely deployed in today's smartphones, such as the iPhone X, Huawei Mate 20, and OPPO Find X: a projector casts a beam of structured infrared light onto the user's face, especially the eye region, and the RGB-D camera captures the reflection from the eyes. The reflected structured light contains depth information, from which a three-dimensional motion model of the eye can be constructed. The advantages of using infrared light are that it is imperceptible to the human eye and unaffected by ambient light; moreover, it helps protect the user's privacy.
The second is the relative position between the user's eyes and the gaze plane. Without this information, the estimated gaze position deviates from the true position, so a gaze calibration process is needed to compensate for the offset. In a typical calibration process, users are asked to fixate on certain stimuli on the screen while a camera captures their eye movements. The stimuli serve as the ground truth of the gaze position, and a transformation vector is derived from the offset between the estimated and true gaze positions. Assuming that the relative position between the eyes and the screen remains unchanged over a short period, the transformation vector can be applied directly to the estimated gaze position for gaze correction.
However, as pointed out in the background, such an explicit calibration process impairs the user experience, especially in mobile scenarios, where the recalibration process must be triggered frequently to update the transformation vector.
On this basis, the present invention proposes an implicit calibration method for gaze tracking. The design stems from an understanding of the spatiotemporal dependency between visual saliency and the user's gaze.
The implicit calibration method for gaze tracking of the present invention is described below with reference to FIGS. 1 to 5. FIGS. 2 and 3 are flowcharts of the method. As shown in FIG. 2, the present invention provides an implicit calibration method for gaze tracking, comprising:
Step 110: obtaining M gaze-tracking results to be calibrated and N frames of the device's currently displayed content within a calibration window, where N is less than M.
As shown in FIG. 3, the present invention uses visual saliency for calibration. The inputs to calibration are frames from the video or AR content currently played by the device and the corresponding gaze-tracking results to be calibrated. In some embodiments, a calibration window is used to segment the displayed content frames and the gaze-tracking results used for calibration. Each calibration window includes N frames {F1, F2, ..., FN} and M gaze-tracking results to be calibrated {E1, E2, ..., EM}. To suppress eye-localization errors, multiple gaze-tracking results are used per frame; that is, the sampling rate of the gaze-tracking results is higher than the frame rate (i.e., M > N).
Note that a gaze-tracking result to be calibrated can be a rough gaze-position estimate obtained by coarsely projecting the user's eye movement onto screen coordinates.
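The calibration-window layout described above (N frames, M > N gaze samples, several samples per frame) can be sketched by assigning samples to frames by timestamp. Uniform frame timing and the function name are assumed simplifications.

```python
# Window sketch: map each gaze-sample index to the latest frame that was on
# screen at its timestamp, so each frame collects its multiple gaze samples.
def group_samples_by_frame(frame_times, sample_times):
    """Return {frame_index: [sample indices]} for one calibration window."""
    assignment = {}
    for i, t in enumerate(sample_times):
        frame_idx = max(j for j, ft in enumerate(frame_times) if ft <= t)
        assignment.setdefault(frame_idx, []).append(i)
    return assignment
```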
Step 120: adjusting the pixels of the device's currently displayed content frames to a preset resolution, to obtain content frames to be detected.
A visual saliency heat map can serve as a probability distribution over gaze positions, which opens an opportunity for implicit gaze correction. The present invention uses two kinds of saliency to represent the user's attention and dynamically selects the appropriate detection algorithm according to the timing.
Before that, the frames must be resized to a preset resolution to obtain the content frames to be detected. In some embodiments, the preset resolution may be 68×68.
The reasons for this step are as follows. First, processing high-resolution frames, e.g., 4K (3840×2160), incurs high CPU, GPU, and energy overheads, which resource-limited mobile devices cannot afford. Second, since videos differ in resolution, the resolution of each video cannot be predicted in advance; resizing all frames to a fixed resolution is therefore an effective and efficient solution. It should be emphasized that lowering the frame resolution in this step does not harm saliency-detection accuracy, because the frame features used for saliency detection (i.e., color, intensity, orientation, object shape, etc.) do not change at lower resolutions.
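The resize step can be sketched as average pooling on a single-channel intensity frame. A real implementation would use an image library; this pure-Python version (with a tiny target size in place of 68×68) just illustrates that coarse structure survives the downsampling.

```python
# Resize sketch: average-pool an in_h x in_w frame (list of rows of intensity
# values) down to a fixed out_h x out_w preset resolution.
def resize_average(frame, out_h, out_w):
    in_h, in_w = len(frame), len(frame[0])
    out = []
    for oy in range(out_h):
        row = []
        y0, y1 = oy * in_h // out_h, (oy + 1) * in_h // out_h
        for ox in range(out_w):
            x0, x1 = ox * in_w // out_w, (ox + 1) * in_w // out_w
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```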
Step 130: inputting the content frames to be detected into the visual saliency detection module, to obtain visual saliency detection results; the visual saliency detection module is configured to detect the saliency of each pixel in the content frames to be detected.
After being input into the visual saliency detection module, the content frames to be detected are fed to the visual saliency detection component to generate visual saliency heat maps; each heat map is then normalized to a preset range for consistency, yielding the visual saliency detection results.
The basic idea behind saliency-based calibration is that, when watching a video, users are usually attracted by a few salient regions/objects on the screen. Such a salient region/object, referred to collectively as a saliency, stands out from its neighborhood and immediately draws the user's attention. The locations of these salient regions/objects can therefore be treated as ground truth for the user's gaze position, which helps estimate the user's gaze. With the development of computer vision, many effective methods have been proposed to detect salient regions/objects in videos or frames. These methods usually output a visual saliency heat map that gives the saliency of each pixel in a frame and can be regarded as a probability distribution of gaze.
In some embodiments, the present invention uses Apple's saliency algorithm to detect saliency, extracting salient regions distinguished by color, intensity, and orientation. This kind of saliency is called bottom-up saliency. In one specific embodiment, a video clip from the EyeTrackUAV dataset was watched by two volunteers while a Tobii eye tracker recorded their gaze positions, and the gaze positions corresponding to two frames were extracted, as shown in FIG. 4, which shows the two video frames and the corresponding saliency heat maps. As can be seen, the saliency heat maps essentially capture the salient regions/objects in the frames and the users' gaze.
In other embodiments, the Apple algorithm is used for bottom-up saliency detection, while U2-Net is selected to detect salient objects for top-down saliency.
In time, the handover between bottom-up and top-down attention occurs on the order of 100 milliseconds, more precisely about 150 milliseconds. In the saliency detection process designed here, to match this shift of attention, bottom-up saliency is used within the first 150 milliseconds (roughly the length of 5 frames in a 30 FPS video), after which detection switches to top-down saliency to better match the attention mechanism. Recognition of scene segmentation relies mainly on key-frame detection: during video encoding, once a scene cut occurs it is encoded as a key frame, so the key frames cover all scene cuts. By comparing a key frame with its previous frame, a scene cut can be detected. In some embodiments, pHash is used to hash the frames and compute the detection distance. In this way, the appropriate saliency can be selected over time to represent the user's visual attention.
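The scene-cut check above can be sketched as follows. The text uses pHash; here a simpler average hash plus Hamming distance stands in for pHash, and the distance threshold is an assumption.

```python
# Scene-cut sketch: hash two frames (lists of rows of intensity values) and
# declare a cut when the Hamming distance between the hashes is large. The
# average hash below is a simplified stand-in for the pHash mentioned above.
def average_hash(frame):
    """Bit per pixel: 1 where the pixel is above the frame's mean intensity."""
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)

def is_scene_cut(prev_frame, frame, max_distance=10):
    h1, h2 = average_hash(prev_frame), average_hash(frame)
    distance = sum(a != b for a, b in zip(h1, h2))   # Hamming distance
    return distance > max_distance
```

A detected cut is also the trigger for updating the transformation vector, as described later for the calibration module.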
Step 140: input the visual saliency detection result into a visual saliency measurement module to obtain the content frames to be calibrated; the visual saliency measurement module is used to measure the saliency effectiveness of each pixel in the visual saliency detection result.
A saliency heatmap essentially depicts a probability distribution of gaze. However, this can fail in some scenes, i.e., the user's gaze cannot be inferred from saliency. For example, when a frame contains multiple salient regions/objects, calibration with saliency performs poorly, because there is no way to tell which region the user is actually looking at. Conversely, a frame may contain no saliency at all, such as an all-black frame. Moreover, if the salient region is relatively large (e.g., a close-up frame), the sub-region the user is fixating cannot be determined. In short, it is not always possible to infer the user's gaze from saliency in the spatial dimension; this is generally referred to as the spatial validity of saliency.
In addition, ignoring the temporal validity of saliency is another reason why existing saliency-based calibration methods suffer. Visual saliency is temporally related to the user's attention. Human visual attention follows two pathways: bottom-up and top-down. In bottom-up visual attention, the information presented to the brain consists of the raw physical features of external stimuli transmitted through the visual pathway, including color, intensity, orientation, and so on; in short, bottom-up attention is driven by external environmental information. Top-down attention, in contrast, refers to the brain's higher-order association cortices, including the prefrontal cortex (PFC) and the posterior parietal cortex (PPC), modulating information in the visual pathway according to the goals of the current task and past knowledge; it is attention driven by information internal to the brain.
In particular, when watching a video, once a new scene appears the user is first driven by bottom-up attention, scanning distinct regions in the frame, and is then dominated by top-down attention, focusing on semantic objects based on past knowledge, as illustrated in Figure 4. In some implementations of the present invention, a user survey was conducted with a picture to verify this neural theory. Specifically, volunteers were asked to view a picture on the screen, which was then replaced with the picture in Figure 5 to simulate a scene change in a video. Each user was asked to report his/her first glance, i.e., the region observed subconsciously, and second glance, i.e., the region observed afterwards, following the subconscious action. Six volunteers participated. According to their feedback, all volunteers first looked at the brightest region (the moon) and then shifted their attention to the correct region (the superhero), which is consistent with the theory.
To solve the above problem in the spatial dimension, the present invention designs a metric to quantify the concentration of a saliency heatmap; only heatmaps with high concentration serve as good calibration opportunities.
The concentration of a saliency heatmap is determined by two features: the number of salient regions/objects on the heatmap, and their area. Based on this, the present invention proposes a saliency metric called the saliency concentration score (SCS), whose calculation formula is expressed as the second preset formula, as follows:
S = (1/n) × (1 − AS/AT)
where S denotes the saliency concentration score; n denotes the number of salient regions/objects; AS denotes the number of salient region/object pixels, and AT the number of pixels in the entire frame. The SCS value varies between 0 and 1: the smaller n and the ratio of AS to AT, the closer the SCS value is to 1, and vice versa.
To compute the saliency concentration, the features n and AS must be extracted from each frame. To this end, the heatmap of each frame is first binarized to filter out background pixels whose saliency value is below a first preset threshold, leaving the salient pixels. In some embodiments, the first preset threshold used for binarization may be 170.
The pixels remaining after filtering are recorded as salient pixels and reflect the salient regions/objects; a set of connected salient pixels constitutes one salient region/object. The ratio of the remaining pixels to all pixels directly gives AS/AT, and the number of salient regions/objects n can be computed by performing connected-component analysis on the binarized heatmap.
After the number of salient regions/objects n is obtained, it is combined with the ratio AS/AT in the second preset formula to compute the saliency concentration of the saliency heatmap. This saliency concentration is also that of the content frame currently displayed by the device corresponding to the visual saliency detection result, and likewise that of the content frame to be calibrated.
The saliency concentration thus enables spatial selection of saliency, and this metric is used to select the frames usable for calibration. Before saliency detection, a first selection has already been made to determine the saliency type of each frame. Even after this temporal selection and saliency detection, a problem remains: not all frames provide good opportunities for implicit calibration, so frames with low SCS values must also be filtered out. Specifically, frames whose SCS value is below a second preset threshold are removed, yielding the content frames to be calibrated; the frames that survive this filtering are better suited for implicit calibration. In some embodiments, the second preset threshold may be 0.6.
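The binarize–label–score pipeline above can be sketched as follows. The thresholds (170 for binarization, 0.6 for SCS) are the patent's example values; the closed form (1/n)·(1 − AS/AT) is an assumption matching the stated properties of the SCS, and the 4-connectivity flood fill is one common choice of connected-component analysis.

```python
def binarize(heatmap, threshold=170):
    # Threshold 170 as in the patent's example embodiment.
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]

def connected_components(mask):
    """Count salient regions via 4-connectivity flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                n += 1
                stack = [(i, j)]
                seen[i][j] = True
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return n

def scs(heatmap, threshold=170):
    """Saliency concentration score: (1/n) * (1 - As/At), an assumed closed
    form consistent with the patent's description (value in [0, 1], larger
    when n and As/At are small)."""
    mask = binarize(heatmap, threshold)
    a_s = sum(sum(row) for row in mask)
    a_t = len(mask) * len(mask[0])
    n = connected_components(mask)
    if n == 0:
        return 0.0  # no saliency at all -> frame is unusable for calibration
    return (1.0 / n) * (1.0 - a_s / a_t)
```

A frame would then be kept as a calibration opportunity only if `scs(heatmap) >= 0.6`.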
Step 150: input the content frames to be calibrated into a salient information extraction module to obtain the salient information to be input; the salient information extraction module is used to extract salient information from the content frames to be calibrated.
Afterwards, salient information is extracted from the remaining frames, i.e., the content frames to be calibrated, which are better suited for implicit calibration. Specifically, for each frame Fi, the pixel with the highest saliency value is found in each connected-component region; its coordinates represent the position of the corresponding salient region/object. Then the coordinates of all ni salient regions/objects on Fi are packed, together with the salient region/object count ni and the SCS value SCSi, into a feature vector Vi, as shown below, i.e., the third preset formula:
Vi = (ni, SCSi, x1, y1, ..., xni, yni)
where Vi denotes the feature vector, ni the number of salient regions/objects, SCSi the saliency concentration of the content frame to be calibrated, xj the horizontal coordinate of the j-th salient region/object, and yj its vertical coordinate.
The feature vector Vi of each frame is recorded as the salient information to be input, which is fed to the calibration component in the gaze tracking calibration module for implicit gaze error compensation.
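The feature-vector packing can be sketched as below. The connected components are assumed to be available already (e.g., as lists of pixel coordinates from a prior connected-component pass over the binarized heatmap); the flat-list layout of Vi is an assumption, since the patent's formula image is not reproduced in the text.

```python
def build_feature_vector(heatmap, components, scs_value):
    """Pack Vi = (n_i, SCS_i, x1, y1, ..., xn, yn) for one frame.

    `components` is a list of connected components, each a list of (y, x)
    pixel coordinates; each component is represented by the coordinates of
    its highest-saliency pixel."""
    vec = [len(components), scs_value]
    for comp in components:
        y, x = max(comp, key=lambda p: heatmap[p[0]][p[1]])
        vec.extend([x, y])
    return vec
```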
Step 160: input the salient information to be input and the gaze tracking result to be calibrated into a gaze tracking calibration module to obtain the gaze tracking calibration result; the gaze tracking calibration module is used to calibrate the gaze tracking result to be calibrated.
The visual saliency vectors composed of the frames' feature vectors ({V1, V2, ..., VN}) can be used to correct the errors in the gaze positions {G1, G2, ..., GM} of the gaze tracking result to be calibrated.
Before that, however, the gaze tracking result to be calibrated must first be preprocessed to filter out two noise sources in the result.
The first noise source is introduced by blink events. Specifically, human eyes typically blink 15-20 times per minute; when the user blinks, the gaze position changes rapidly. A similar phenomenon can be observed during saccade events, but the gaze patterns of the two events differ: in a blink event the gaze position changes quickly but soon returns to its original position. Therefore, in this case the z-score can be used to eliminate outliers. Here, the z-score calculation can be expressed as the first preset formula:
z = (x − μ) / σ
where z denotes the first denoising score value, x the original value, μ the mean of all the values, and σ the standard deviation.
In actual operation, the z-score of each coarse gaze position in the gaze tracking result to be calibrated within the calibration window is computed; if the absolute value of its z-score exceeds the z-score threshold, the position is identified as an outlier. In some embodiments, the z-score threshold may be set to 3.
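The blink-outlier filter can be sketched directly from the formula; the threshold of 3 is the patent's example value, and filtering one coordinate at a time is an assumption for illustration.

```python
from statistics import mean, pstdev

def filter_blinks(values, z_threshold=3.0):
    """Remove blink-induced outliers from one coordinate of the coarse gaze
    positions in a calibration window, using z = (x - mu) / sigma and the
    threshold |z| > 3 from the patent's example embodiment."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)  # all samples identical: nothing to reject
    return [x for x in values if abs((x - mu) / sigma) <= z_threshold]
```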
Besides blink events, errors in eye localization also introduce noise into the gaze tracking result, referred to as the second noise source. Specifically, it causes slight jitter in the coarse gaze positions. To filter out this jitter, multiple eye positions are sampled for each frame. The gaze tracking result to be calibrated contains gaze positions; from it, the average of the Ni coarse gaze positions {G1, G2, ..., GNi} corresponding to one feature vector Vi is computed, and the average gaze position is denoted Ḡi. Since N frames are used in one calibration window, N average gaze positions {Ḡ1, ..., ḠN} are obtained. The salient vectors {V} are then taken from the salient information to be input, and the N average gaze positions {Ḡ} and the N salient vectors {V} are used for gaze calibration. In some embodiments, the number N of salient vectors {V} may be set to 10. Although a single salient vector could be used to predict the true gaze position, the accuracy is still insufficient to determine the exact fixation point, because a single frame may contain multiple salient regions/objects. Moreover, the uncertainty of user behavior also causes fluctuations in a single frame; for example, the user's attention is sometimes drawn to an inconspicuous region in the background. Therefore, a sequence of N frames is used to eliminate this error.
Specifically, {V} and {Ḡ} are clustered separately. Then, from the clustering results of {V} and {Ḡ}, the cluster with the most samples is selected to represent, respectively, the most frequent salient region and the corresponding coarse gaze region. By computing the offset between the centroids of the two regions, a vector called the calibration transformation vector Vc is obtained; this vector can be used to compensate the errors in the coarse tracking result.
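The cluster-and-offset step can be sketched as follows. The patent does not specify a clustering algorithm, so a simple grid-bucket clustering is used as a stand-in, and the 50-pixel cell size is an illustrative assumption.

```python
from collections import defaultdict

def densest_cluster_centroid(points, cell=50):
    """Bucket 2-D points into `cell`-sized grid cells (a stand-in for the
    unspecified clustering in the patent) and return the centroid of the
    most populated cell, i.e. the most frequent region."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(x // cell, y // cell)].append((x, y))
    densest = max(buckets.values(), key=len)
    n = len(densest)
    return (sum(p[0] for p in densest) / n, sum(p[1] for p in densest) / n)

def calibration_vector(salient_positions, avg_gaze_positions, cell=50):
    """Vc = centroid(most frequent salient region) - centroid(coarse gaze cluster)."""
    sx, sy = densest_cluster_centroid(salient_positions, cell)
    gx, gy = densest_cluster_centroid(avg_gaze_positions, cell)
    return (sx - gx, sy - gy)

def apply_calibration(gaze, vc):
    """Compensate one coarse gaze position with the transformation vector."""
    return (gaze[0] + vc[0], gaze[1] + vc[1])
```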
In actual operation, an RGB-D camera can be used to track the user's eye movements and roughly project them onto screen coordinates, yielding a coarse estimate of the gaze position, i.e., the gaze tracking result to be calibrated. This coarse estimate is then used as the input to calibration. Calibration is an opportunistic process that can be invoked by monitoring head movements and scene cuts. During calibration, saliency information is extracted from the frames in the calibration window. Based on the understanding of the spatio-temporal dimensions of saliency, bottom-up and top-down saliency are used to match the change of the user's attention over time; low-quality saliency maps are then filtered out by measuring the spatial characteristics of saliency. Suitable frames are selected for saliency-based correction and compared with the coarse gaze estimates obtained during tracking to generate a transformation vector, which is then used to compensate the coarse estimates and obtain the gaze tracking calibration result produced by the calibration process.
That is, in the second denoising step, the offset between the centroids of the two regions is computed to obtain a vector called the calibration transformation vector Vc, which can compensate the errors in the coarse tracking result. The transformation vector derived from this offset is applied to the gaze tracking result to be calibrated for implicit calibration, yielding the gaze tracking calibration result.
After the above steps, complete gaze tracking can essentially be performed.
During actual tracking, the front RGB-D camera is first used to capture the user's 3D eye information {E1, E2, ..., EM}, which is processed to obtain the gaze direction. The intersection of the gaze direction with the gaze plane (the screen) determines the gaze points {G1, G2, ..., GM}. A problem faced here, however, is that the position of the gaze plane relative to the user's eyes is unknown. In the preceding steps of the present invention, the inertial information of the phone from the inertial measurement unit (IMU) and the depth information from the camera view are combined to estimate the relative position of the screen.
However, this estimate of the relative position runs into problems when the user holds the phone in landscape orientation. In that case the camera is rotated 90 degrees to the left or right of the user's face, so the user's face appears slightly rotated when captured by the camera. The estimated gaze direction is also distorted, especially for the eye on the side away from the camera.
Without loss of generality, the present invention first considers the case where the camera is rotated to the user's left. In the screen coordinate system, the origin is assumed to be at the lower-left corner of the screen. Two key observations can be made: i) the farther the gaze is from the origin along the x-axis, the larger its displacement for a given degree of eyeball rotation; ii) the closer the gaze is to the origin along the y-axis, the smaller its displacement for a given degree of eyeball rotation. The first observation is caused by the rotation of the user's face as captured by the camera. The second is caused by the elliptical structure of the eye: rotation of the eyeball from left to right is more visible than rotation from top to bottom, even for the same amount of rotation. In addition, the eyes open wider when the user looks up than when looking down, so it is harder for the camera to capture the eyes when the user looks down.
Hence, to compensate the distortion along the x-axis, the rotation of the user's face captured by the front camera of the mobile device is used to correct the x value of the gaze position (i.e., of the gaze tracking result to be calibrated). For the distortion along the y-axis, when y is smaller than a y-axis distortion threshold, a constant value is used to correct the y value of the gaze position; the y-axis distortion threshold may be 300. With this compensation, the present invention can perform gaze tracking in both portrait and landscape orientations. Finally, the feature vectors contained in the salient information to be input are used to compensate the gaze tracking result to be calibrated, yielding the calibrated gaze tracking result.
In some embodiments, the present invention further includes triggering conditions for the calibration process. The present invention proposes an implicit calibration scheme based on a spatio-temporal model of user attention, which fundamentally resolves the degradation of user experience caused by explicit calibration.
The calibration mechanism provided by the present invention is opportunistic, and calibration opportunities are selected in two respects:
First, calibration is built on a stable relative position between the user and the device. Therefore, when a change in the relative position is detected, recalibration is required; otherwise, calibrated tracking can be performed directly with the previously computed transformation vector. In some embodiments, such detection is realized by tracking the user's face movement with the front RGB-D camera. Specifically, when the relative position between the user and the device changes, the facial pose captured by the camera inevitably changes as well. Therefore, during gaze tracking, 3D information of the user's face is captured continuously. Once the distance between two consecutive facial poses exceeds a third preset threshold, which in some embodiments may be 0.005, a change in the relative position is detected, and a new calibration process is triggered to update the transformation vector. Such recalibration can also be performed during an ongoing calibration process to maintain calibration quality.
Second, calibration is performed when a scene cut occurs. As mentioned in step 130, immediately after a scene cut, bottom-up attention dominates the user's gaze. This subconscious behavior links the user's gaze to bottom-up saliency with a high degree of confidence. Therefore, calibration needs to be triggered on a detected scene cut to maintain tracking quality after the cut. For this calibration, to remain consistent with the attention duration, the length of the calibration window may be defined as 5 frames.
Based on the above embodiments, in this method, before the salient information to be input and the gaze tracking result to be calibrated are input into the gaze tracking calibration module to obtain the gaze tracking calibration result, the method further includes:
performing first denoising and second denoising on the gaze tracking result to be calibrated;
the first denoising specifically includes:
performing first denoising on the gaze tracking result to be calibrated using a first preset formula;
the first preset formula includes:
z = (x − μ) / σ
where z denotes the first denoising score value, x the original value, μ the mean of all the values, and σ the standard deviation;
the second denoising specifically includes:
obtaining gaze positions according to the gaze tracking result to be calibrated;
computing the average of the gaze positions corresponding to the salient information to be input to obtain average gaze positions;
clustering the salient information to be input and the average gaze positions to obtain clustering results;
determining the most frequent salient region and the coarse gaze region according to the clustering results;
computing the offset between the centroids of the most frequent salient region and the coarse gaze region, and performing second denoising according to the offset.
Specifically, before the salient information to be input and the gaze tracking result to be calibrated are input into the gaze tracking calibration module, the gaze tracking result to be calibrated must first be preprocessed to filter out two noise sources in the result.
The first noise source is introduced by blink events. Specifically, human eyes typically blink 15-20 times per minute; when the user blinks, the gaze position changes rapidly. A similar phenomenon can be observed during saccade events, but the gaze patterns of the two events differ: in a blink event the gaze position changes quickly but soon returns to its original position. Therefore, in this case the z-score can be used to eliminate outliers. Here, the z-score calculation can be expressed as the first preset formula:
z = (x − μ) / σ
where z denotes the first denoising score value, x the original value, μ the mean of all the values, and σ the standard deviation.
In actual operation, the z-score of each coarse gaze position in the gaze tracking result to be calibrated within the calibration window is computed; if the absolute value of its z-score exceeds the z-score threshold, the position is identified as an outlier. In some embodiments, the z-score threshold may be set to 3.
Besides blink events, errors in eye localization also introduce noise into the gaze tracking result, referred to as the second noise source. Specifically, it causes slight jitter in the coarse gaze positions. To filter out this jitter, multiple eye positions are sampled for each frame. The average of the Ni coarse gaze positions {G1, G2, ..., GNi} corresponding to one feature vector Vi is computed; the average gaze position is denoted Ḡi. Since N frames are used in one calibration window, N average gaze positions {Ḡ1, ..., ḠN} are obtained, and these N average gaze positions {Ḡ} are used together with the N salient vectors {V} for gaze calibration. In some embodiments, the number N of salient vectors {V} may be set to 10. Although a single salient vector could be used to predict the true gaze position, the accuracy is still insufficient to determine the exact fixation point, because a single frame may contain multiple salient regions/objects. Moreover, the uncertainty of user behavior also causes fluctuations in a single frame; for example, the user's attention is sometimes drawn to an inconspicuous region in the background. Therefore, a sequence of N frames is used to eliminate this error. Specifically, {V} and {Ḡ} are clustered separately, and from the clustering results of {V} and {Ḡ}, the cluster with the most samples is selected to represent, respectively, the most frequent salient region and the corresponding coarse gaze region.
By computing the offset between the centroids of the two regions, a vector called the calibration transformation vector Vc is obtained; this vector can be used to compensate the errors in the coarse tracking result.
Based on the above embodiments, in this method, the visual saliency detection module is used to detect the saliency of each pixel in the content frame to be detected, which specifically includes:
inputting the content frame to be detected into the visual saliency detection module to generate a visual saliency heatmap;
normalizing the visual saliency heatmap to a preset range to obtain the visual saliency detection result.
Specifically, after the content frame to be detected is input into the visual saliency detection module, it is fed to the visual saliency detection component to generate a visual saliency heatmap; each heatmap is then normalized to a preset range for consistency, yielding the visual saliency detection result.
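The normalization step can be sketched with a simple min-max rescaling; the target range [0, 255] is illustrative, as the patent only states "a preset range".

```python
def normalize_heatmap(heatmap, lo=0, hi=255):
    """Min-max normalize a saliency heatmap to a preset range so that
    heatmaps from different frames are comparable."""
    flat = [v for row in heatmap for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:
        # Flat heatmap (e.g., an all-black frame): map everything to `lo`.
        return [[lo for _ in row] for row in heatmap]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in heatmap]
```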
基于上述实施例,该方法中,所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性,具体包括:Based on the above embodiment, in the method, the visual saliency measurement module is used to measure the saliency effectiveness of each pixel in the visual saliency detection result, specifically including:
根据第一预设阈值对所述视觉显著检测结果进行二值化,以得到显著像素;Binarizing the visual saliency detection result according to a first preset threshold value to obtain salient pixels;
通过连通分量分析计算得到帧中显著像素所在显著区域/对象的数量;The number of salient regions/objects where salient pixels are located in the frame is calculated by connected component analysis;
根据第二预设公式计算得到所述视觉显著检测结果对应的所述设备当前显示内容帧的显著性集中度;Calculating, according to a second preset formula, the saliency concentration of the content frame currently displayed by the device corresponding to the visual saliency detection result;
滤除显著性集中度低于第二预设阈值的视觉显著检测结果,得到待校准内容帧;Filtering out visual saliency detection results whose saliency concentration is lower than a second preset threshold, to obtain a content frame to be calibrated;
所述第二预设公式包括:The second preset formula includes:
其中,S表示显著性集中度;n表示显著区域/对象的数量;AS表示显著区域/对象像素,AT表示整个框架的区域像素。Where S represents the saliency concentration; n represents the number of salient regions/objects; AS represents the salient region/object pixels, and AT represents the region pixels of the entire frame.
具体地,显著热图的集中度由两个特征决定。一是热图上显著区域/对象的数量,另一个是显著区域/对象的面积。基于此,本发明提出一种显著性度量,称为显著性集中度(SCS),其计算公式表示为第二预设公式,如下:Specifically, the concentration of the salient heat map is determined by two features. One is the number of salient regions/objects on the heat map, and the other is the area of the salient regions/objects. Based on this, the present invention proposes a saliency measure, called saliency concentration (SCS), whose calculation formula is expressed as the second preset formula, as follows:
其中,S表示显著性集中度;n表示显著区域/对象的数量;AS表示显著区域/对象像素,AT表示整个框架的区域像素。SCS的值在0和1之间变化。n和AS与AT的比值越小,SCS值越接近1,反之亦然。Where S represents the saliency concentration; n represents the number of salient regions/objects; AS represents the salient region/object pixels, and AT represents the region pixels of the entire frame. The value of SCS varies between 0 and 1. The smaller the value of n and the ratio of AS to AT , the closer the SCS value is to 1, and vice versa.
为计算显著性集中度,需要从每一帧中提取特征n和AS。为此,首先对每一帧的热图进行二值化,以过滤出显著性值低于第一预设阈值的背景像素,得到显著像素。在一些实施例中,用于二值化的第一预设阈值可以是170。To calculate the saliency concentration, it is necessary to extract features n and AS from each frame. To this end, the heat map of each frame is first binarized to filter out background pixels whose saliency values are lower than a first preset threshold to obtain salient pixels. In some embodiments, the first preset threshold for binarization may be 170.
滤除后剩下的像素记为显著像素,反映了显著的区域/对象。显著像素集合的区域就是显著区域/对象。显然,剩余像素的比率给出了比率显著区域/对象的数量n可以通过在二值化热图上形成连通分量分析来计算。The pixels remaining after filtering are recorded as salient pixels, reflecting the salient region/object. The region of the salient pixel set is the salient region/object. Obviously, the ratio of the remaining pixels gives the ratio The number n of salient regions/objects can be calculated by performing a connected component analysis on the binarized heatmap.
计算得到显著区域/对象的数量n后,结合比率AS/AT,通过第二预设公式计算得到显著性热图对应的显著性集中度。所述显著性集中度也是视觉显著检测结果对应的设备当前显示内容帧的显著性集中度,同时也是待校准内容帧的显著性集中度。After calculating the number n of salient regions/objects, the saliency concentration corresponding to the saliency heat map is calculated through the second preset formula by combining n with the ratio AS/AT. This saliency concentration is also the saliency concentration of the content frame currently displayed by the device corresponding to the visual saliency detection result, and is also the saliency concentration of the content frame to be calibrated.
利用显著性集中度即可在空间上对显著性进行选择,使用此度量来选择可用于校准的帧。在进行显著性检测之前,已经进行了一次时间上的选择,以确定每一帧的显著类型。经过这种时间选择和显著性检测后,仍然存在一个问题:并不是所有的帧都能为隐式校准提供良好的机会,故而还需要将SCS值较低的帧滤除。具体来说,将SCS值低于第二预设阈值的帧滤除,得到待校准内容帧,即滤除后保留下来的待校准内容帧都能更好地进行隐式校准。在一些实施例中,第二预设阈值可以为0.6。The saliency concentration makes it possible to select saliency spatially, and this metric is used to select the frames that can be used for calibration. Before saliency detection, a temporal selection has already been performed to determine the saliency type of each frame. After this temporal selection and saliency detection, a problem remains: not all frames provide good opportunities for implicit calibration, so frames with low SCS values also need to be filtered out. Specifically, frames whose SCS values are lower than the second preset threshold are filtered out to obtain the content frames to be calibrated; that is, the content frames to be calibrated that survive this filtering are all better suited for implicit calibration. In some embodiments, the second preset threshold may be 0.6.
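The binarize → count-components → score → filter pipeline described above can be sketched as follows. The second preset formula itself is not reproduced in this excerpt, so `saliency_concentration` uses a hypothetical form that merely exhibits the stated properties (a value between 0 and 1 that approaches 1 when n and the ratio AS/AT are small); the 0.6 frame-selection threshold is the one from the embodiment.

```python
from collections import deque

def count_regions(mask):
    """Count connected components (4-neighbourhood) of salient pixels
    in a binarized heat map, i.e. the feature n."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                n += 1
                q = deque([(i, j)])
                seen[i][j] = True
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return n

def saliency_concentration(mask):
    """Illustrative SCS stand-in: close to 1 for one small salient
    region, close to 0 for many large regions. The patent's exact
    formula is not shown in this excerpt; this variant only has the
    stated monotonic behaviour in n and AS/AT."""
    n = count_regions(mask)
    if n == 0:
        return 0.0
    a_s = sum(sum(row) for row in mask)    # salient pixels (AS)
    a_t = len(mask) * len(mask[0])         # all pixels of the frame (AT)
    return (1.0 / n) * (1.0 - a_s / a_t)

def select_frames(masks, scs_threshold=0.6):
    """Keep only frames whose SCS clears the second preset threshold."""
    return [m for m in masks if saliency_concentration(m) >= scs_threshold]
```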
基于上述实施例,该方法中,所述显著信息提取模块用于从所述待校准内容帧中提取显著信息,具体包括:Based on the above embodiment, in the method, the salient information extraction module is used to extract salient information from the content frame to be calibrated, specifically including:
提取所述待校准内容帧在每个连通分量域中的具有最高显著值的像素所在显著区域/对象的坐标和编号;Extracting the coordinates and serial numbers of the salient regions/objects where the pixels having the highest saliency value in each connected component domain of the content frame to be calibrated are located;
根据第三预设公式,将所述待校准内容帧上的显著性集中度、所有所述显著区域/对象的坐标和编号共同压缩为特征向量Vi,得到待输入显著信息;According to a third preset formula, the saliency concentration on the content frame to be calibrated, the coordinates and numbers of all the salient regions/objects are compressed into a feature vector V i to obtain salient information to be input;
所述第三预设公式包括:The third preset formula includes:
其中,Vi表示特征向量,ni表示显著区域/对象的编号,SCSi表示待校准内容帧上的显著性集中度,其余分量分别表示各显著区域/对象的横坐标和纵坐标。Wherein, Vi represents the feature vector, ni represents the number of salient regions/objects, SCSi represents the saliency concentration on the content frame to be calibrated, and the remaining components represent the horizontal and vertical coordinates of each salient region/object.
具体地,之后,从剩余的能更好的进行隐式校准的帧(即待校准的内容帧)中提取显著信息。具体地说,对于每个帧Fi,在其中的每个连通分量域中找到具有最高显著值的像素,其坐标表示相应显著区域/对象的位置。然后,将Fi上所有ni个显著区域/对象的坐标与显著区域/对象编号ni和SCS值SCSi一起压缩为特征向量Vi,如下所示,即第三预设公式:Specifically, then, extract salient information from the remaining frames that can be better implicitly calibrated (i.e., the content frames to be calibrated). Specifically, for each frame F i , find the pixel with the highest salient value in each connected component domain therein, and its coordinates represent the position of the corresponding salient region/object. Then, compress the coordinates of all n i salient regions/objects on F i together with the salient region/object number n i and the SCS value SCS i into a feature vector V i , as shown below, i.e., the third preset formula:
其中,Vi表示特征向量,ni表示显著区域/对象的编号,SCSi表示待校准内容帧上的显著性集中度,其余分量分别表示各显著区域/对象的横坐标和纵坐标。Wherein, Vi represents the feature vector, ni represents the number of salient regions/objects, SCSi represents the saliency concentration on the content frame to be calibrated, and the remaining components represent the horizontal and vertical coordinates of each salient region/object.
每一帧的特征向量Vi记为待输入显著信息,待输入显著信息被馈送到视线追踪校准模块中的校准组件以进行隐式注视误差补偿。The feature vector Vi of each frame is recorded as the salient information to be input, which is fed to the calibration component in the gaze tracking calibration module for implicit gaze error compensation.
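A minimal sketch of packing the per-frame salient information into Vi. The flat [ni, SCSi, x1, y1, ..., xni, yni] layout is an assumption; the excerpt only states that the region count, the SCS value, and the peak-pixel coordinates are compressed together into one feature vector.

```python
def build_feature_vector(n_i, scs_i, peaks):
    """Pack frame-level saliency info into V_i.
    `peaks` is a list of (x, y) coordinates of the highest-saliency
    pixel of each connected component."""
    v = [float(n_i), float(scs_i)]
    for x, y in peaks:
        v.extend([float(x), float(y)])
    return v

# Two salient regions whose peak pixels sit at (120, 45) and (300, 210).
v = build_feature_vector(2, 0.8, [(120, 45), (300, 210)])
```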
基于上述实施例,该方法中,所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准,具体包括:Based on the above embodiment, in the method, the sight tracking calibration module is used to calibrate the sight tracking result to be calibrated, specifically including:
将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,将所述偏移量包含的变换向量补偿到所述待校准视线跟踪结果上进行隐式校准,得到视线跟踪校准结果。The salient information to be input and the gaze tracking result to be calibrated are input into a gaze tracking calibration module, and the transformation vector included in the offset is compensated to the gaze tracking result to be calibrated for implicit calibration to obtain a gaze tracking calibration result.
具体地,利用校准窗口中各帧的特征向量组成的视觉显著向量({V1,V2,...,VN})可以校正待校准视线跟踪结果中注视位置{G1,G2,...,GM}中的误差。Specifically, the visual saliency vectors ({V1, V2, ..., VN}) composed of the feature vectors of the frames in a calibration window can be used to correct the errors in the gaze positions {G1, G2, ..., GM} of the gaze tracking result to be calibrated.
在实际操作过程中,可以使用RGB-D相机跟踪用户的眼球运动,并将其大致投影到屏幕坐标上,由此得到粗略的注视位置估计,即待校准视线跟踪结果。之后,粗估计被用作校准的输入。校准是一个机会主义的过程,可以通过监测头部运动和场景切割来调用。在校准过程中,对校准窗口中的帧提取显著性信息。基于对显著性的时空维度的认识,利用自下而上的显著性和自上而下的显著性来匹配用户注意力在时间维度上的变化。然后,通过测量显著性的空间特征来过滤低质量的显著性地图。选择合适的帧进行显著性校正,并与跟踪过程中获得的粗略注视估计进行比较,生成变换向量。然后利用变换向量对粗估计进行补偿,得到校准过程中获得的视线跟踪校准结果。In actual operation, an RGB-D camera can be used to track the user's eye movements and roughly project them onto screen coordinates, thereby obtaining a rough estimate of the gaze position, i.e., the gaze tracking result to be calibrated. Afterwards, the rough estimate is used as the input for calibration. Calibration is an opportunistic process that can be called by monitoring head movement and scene cutting. During the calibration process, saliency information is extracted for the frames in the calibration window. Based on the recognition of the spatiotemporal dimension of saliency, bottom-up saliency and top-down saliency are used to match the changes in user attention in the temporal dimension. Then, low-quality saliency maps are filtered by measuring the spatial features of saliency. Appropriate frames are selected for saliency correction and compared with the rough gaze estimate obtained during tracking to generate a transformation vector. The transformation vector is then used to compensate the rough estimate to obtain the gaze tracking calibration result obtained during calibration.
也就是说,在第二去噪过程中,通过计算两个区域的质心之间的偏移量,得到一个称为校准变换向量Vc的矢量。利用该向量可以补偿粗跟踪结果中的误差。将偏移量包含的变换向量补偿到所述待校准视线跟踪结果上进行隐式校准,得到视线跟踪校准结果。That is, in the second denoising process, by calculating the offset between the centroids of the two regions, a vector called the calibration transformation vector V c is obtained. This vector can be used to compensate for the error in the rough tracking result. The transformation vector included in the offset is compensated to the sight tracking result to be calibrated for implicit calibration to obtain the sight tracking calibration result.
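The calibration transformation vector Vc can be sketched as a centroid offset. The sign convention here is an assumption chosen for the example: Vc points from the rough-gaze centroid toward the salient-region centroid, so adding it to a raw estimate moves the gaze onto the salient content.

```python
def centroid(points):
    """Mean position of a set of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def calibration_vector(salient_region, gaze_region):
    """V_c: offset between the centroids of the salient region and the
    rough gaze region (salient minus gaze, by the convention above)."""
    sx, sy = centroid(salient_region)
    gx, gy = centroid(gaze_region)
    return (sx - gx, sy - gy)

def compensate(gaze_points, v_c):
    """Apply V_c to every rough gaze estimate."""
    return [(x + v_c[0], y + v_c[1]) for x, y in gaze_points]
```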
基于上述实施例,该方法中,将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,将所述偏移量包含的变换向量补偿到所述待校准视线跟踪结果上进行隐式校准,得到视线跟踪校准结果,之前还包括:Based on the above embodiment, in the method, the to-be-input salient information and the to-be-calibrated gaze tracking result are input into a gaze tracking calibration module, and the transformation vector included in the offset is compensated to the to-be-calibrated gaze tracking result for implicit calibration to obtain a gaze tracking calibration result, which also includes:
获取用户与设备的相对位置;Get the relative position of the user and the device;
若所述相对位置的变化小于或等于第三预设阈值,变换向量不变;If the change in the relative position is less than or equal to a third preset threshold, the transformation vector remains unchanged;
若所述相对位置的变化大于第三预设阈值或场景切换时,更新所述变换向量。If the change in the relative position is greater than a third preset threshold or the scene is switched, the transformation vector is updated.
具体地,当用户以横向姿势握住手机时,上述相对位置的估计将会出现一些问题。在这种情况下,相机将向用户面部的左侧或右侧旋转90度。因此,当被相机捕捉到时,用户的脸会稍微旋转一下。估计的注视方向也是扭曲的,特别是对于相机相反方向的眼睛。Specifically, when the user holds the phone in landscape position, the estimation of the above relative position will have some problems. In this case, the camera will be rotated 90 degrees to the left or right of the user's face. Therefore, the user's face will be slightly rotated when captured by the camera. The estimated gaze direction is also distorted, especially for the eyes in the opposite direction of the camera.
在不失去一般性的情况下,本发明首先考虑相机旋转到用户左侧的情况。在屏幕坐标系中,假设原点位于屏幕的左下角。可以产生两个关键的观察:i)视线沿x轴距离原点越远,当眼球旋转一定程度时,其位移越大;ii)视线沿y轴距离原点越近,当眼球旋转一定程度时,其位移越小。第一次观察是由相机捕捉到的用户面部的旋转引起的。第二个观察是由眼睛的椭圆形结构引起的。在椭圆形结构中,眼球从左到右的旋转比从上到下的旋转更明显,即使眼球旋转的程度相同。此外,与向下看相比,当用户向上看时,眼睛睁得更大。因此,当用户向下看时,相机更难捕捉到他/她的眼睛。Without loss of generality, the present invention first considers the case where the camera is rotated to the left of the user. In the screen coordinate system, the origin is assumed to be at the lower left corner of the screen. Two key observations can be made: i) the farther the line of sight is from the origin along the x-axis, the greater its displacement is when the eyeball rotates to a certain degree; ii) the closer the line of sight is to the origin along the y-axis, the smaller its displacement is when the eyeball rotates to a certain degree. The first observation is caused by the rotation of the user's face captured by the camera. The second observation is caused by the elliptical structure of the eye. In the elliptical structure, the rotation of the eyeball from left to right is more obvious than the rotation from top to bottom, even if the degree of eyeball rotation is the same. In addition, when the user looks up, the eyes are more open than when looking down. Therefore, when the user looks down, it is more difficult for the camera to capture his/her eyes.
故而,为了补偿x轴上的失真,用移动设备的前置摄像头捕捉到的用户面部的旋转来补偿注视位置(即待校准视线跟踪结果)的x值。对于y轴上的失真,当y小于y轴失真阈值时,用一个恒定值来补偿注视位置(即待校准视线跟踪结果)的y值。y轴失真阈值可以是300。有了这种补偿,本发明就可以在竖屏(纵向)和横屏(横向)两种姿势下进行注视跟踪。最后,将待输入显著信息包含的特征向量补偿到待校准视线跟踪结果上,得到标定后的视线跟踪校准结果。Therefore, to compensate for the distortion on the x-axis, the rotation of the user's face captured by the front camera of the mobile device is used to compensate the x value of the gaze position (i.e., the gaze tracking result to be calibrated). For the distortion on the y-axis, when y is less than the y-axis distortion threshold, a constant value is used to compensate the y value of the gaze position (i.e., the gaze tracking result to be calibrated). The y-axis distortion threshold may be 300. With this compensation, the present invention can perform gaze tracking in both portrait and landscape postures. Finally, the feature vector contained in the salient information to be input is compensated onto the gaze tracking result to be calibrated to obtain the calibrated gaze tracking result.
在一些实施例中,本发明还包括校准过程的触发条件。本发明提出了基于用户注意力时空模型的隐式校准方案,从根本上解决了显式校准降低用户体验质量的问题。In some embodiments, the present invention further includes triggering conditions for the calibration process. The present invention proposes an implicit calibration scheme based on a spatiotemporal model of user attention, which fundamentally solves the problem that explicit calibration degrades the quality of the user experience.
本发明提供的校准机制是机会主义的,体现在以下两个方面:The calibration mechanism provided by the present invention is opportunistic, which is reflected in the following two aspects:
首先,校准是建立在用户和设备之间稳定的相对位置上的。因此,当检测到相对位置的变化时,需要重新校准。否则,可以直接使用预先计算的变换向量来执行校准跟踪。在一些实施例中,通过使用前置RGB-D摄像头跟踪用户的人脸运动来实现这样的检测。具体地说,当用户和设备之间的相对位置发生变化时,摄像头捕捉到的用户的面部姿势不可避免地会发生变化。因此,在视线跟踪过程中,不断地捕捉到用户面部的3D信息。一旦两个连续脸部姿势之间的距离大于第三预设阈值,在一些实施例中,所述第三预设阈值可以是0.005,则检测到相对位置的变化。然后触发新的校准过程以更新变换向量。这种重新校准也可以在现有的校准过程中进行,以保持校准的质量。First, calibration is based on a stable relative position between the user and the device. Therefore, when a change in relative position is detected, recalibration is required. Otherwise, the calibration tracking can be performed directly using the pre-calculated transformation vector. In some embodiments, such detection is achieved by tracking the user's face movement using a front RGB-D camera. Specifically, when the relative position between the user and the device changes, the facial posture of the user captured by the camera will inevitably change. Therefore, during the line of sight tracking process, 3D information of the user's face is continuously captured. Once the distance between two consecutive facial postures is greater than a third preset threshold, in some embodiments, the third preset threshold can be 0.005, a change in relative position is detected. A new calibration process is then triggered to update the transformation vector. This recalibration can also be performed during the existing calibration process to maintain the quality of the calibration.
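The recalibration trigger can be sketched directly from this description (the 0.005 threshold is the third preset threshold from the embodiment; representing a face pose as a plain 3D vector is a simplification of the camera's 3D face information):

```python
def pose_distance(p1, p2):
    """Euclidean distance between two 3D face-pose vectors."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) ** 0.5

def needs_recalibration(prev_pose, cur_pose, threshold=0.005):
    """Trigger a new calibration (and an update of the transformation
    vector) once consecutive face poses drift beyond the threshold."""
    return pose_distance(prev_pose, cur_pose) > threshold
```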
此外,当出现场景切换时,执行校准。正如在步骤130中提到的,场景切换后,自下而上的注意力将立即主导用户的注视。这种潜意识行为具有很高的置信度,将用户的注视与自下而上的显著性联系在一起。因此,需要在检测到场景切换时触发校准,以保持校准结果的质量。对于这种校准,为了保持与注意持续时间的一致性,校准窗口的长度可以定义为5帧。In addition, calibration is performed when a scene cut occurs. As mentioned in step 130, bottom-up attention will dominate the user's gaze immediately after a scene cut. This subconscious behavior links the user's gaze to bottom-up saliency with a high degree of confidence. Therefore, calibration needs to be triggered on a detected scene cut to maintain the quality of the calibration results. For this calibration, to remain consistent with the attention duration, the length of the calibration window may be defined as 5 frames.
在一个具体实施例中,选择iPhone Xs Max作为当前设备,它集成了2.49GHz的Apple A12 Bionic、4GB RAM、6.5英寸屏幕和TrueDepth摄像头,运行iOS 13.6操作系统。TrueDepth相机提供了一种RGB-D相机。本发明提供的针对视线追踪的隐式校准方法的实现可以应用于任何配备TrueDepth摄像头的iOS设备,如iPhone 11、iPad Pro等。此外,本发明可以在任何配备RGB-D摄像头的Android设备上实现,如华为Mate 20、OPPO Find X、荣耀Magic 2等。本发明提供的针对视线追踪的隐式校准方法的算法采用Swift和Objective-C++编写。为了保证评估帧在不同用户之间的重复性,在实现中使用了视频作为可视输入。这种实现可以通过简单的设置轻松转换为AR场景。为了获取RGB-D相机数据,使用了ARKit框架,并使用用于iOS的OpenCV来提供多种帧处理功能。In a specific embodiment, an iPhone Xs Max is selected as the current device, which integrates a 2.49 GHz Apple A12 Bionic, 4 GB RAM, a 6.5-inch screen and a TrueDepth camera, and runs the iOS 13.6 operating system. The TrueDepth camera provides an RGB-D camera. The implementation of the implicit calibration method for gaze tracking provided by the present invention can be applied to any iOS device equipped with a TrueDepth camera, such as the iPhone 11, iPad Pro, etc. In addition, the present invention can be implemented on any Android device equipped with an RGB-D camera, such as the Huawei Mate 20, OPPO Find X, Honor Magic 2, etc. The algorithm of the implicit calibration method for gaze tracking provided by the present invention is written in Swift and Objective-C++. To ensure the repeatability of the evaluation frames across different users, videos are used as the visual input in the implementation. This implementation can easily be converted to an AR scenario with simple settings. To acquire RGB-D camera data, the ARKit framework is used, and OpenCV for iOS provides multiple frame-processing functions.
本发明提供的一种针对视线追踪的隐式校准方法,通过获取校准窗口内的N个待校准视线跟踪结果和M个设备当前显示内容帧;其中,N小于M;将所述设备当前显示内容帧的像素调整到预设分辨率,以得到待检测内容帧;将所述待检测内容帧输入至视觉显著检测模块,以得到视觉显著检测结果;所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性;将所述视觉显著检测结果输入至视觉显著度量模块,以得到待校准内容帧;所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性;将所述待校准内容帧输入至显著信息提取模块,以得到待输入显著信息;所述显著信息提取模块用于从所述待校准内容帧中提取显著信息;将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果;所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准。本发明利用显著性信息识别视频帧,并仅使用这些“有用的”帧执行校准,实现不需要用户配合的、更为简单的、用户体验更好的视线追踪的校准。The present invention provides an implicit calibration method for eye tracking, which obtains N eye tracking results to be calibrated and M content frames currently displayed by the device within a calibration window; wherein N is less than M; the pixels of the content frame currently displayed by the device are adjusted to a preset resolution to obtain a content frame to be detected; the content frame to be detected is input into a visual saliency detection module to obtain a visual saliency detection result; the visual saliency detection module is used to detect the saliency of each pixel in the content frame to be detected; the visual saliency detection result is input into a visual saliency measurement module to obtain the content frame to be calibrated; the visual saliency measurement module is used to measure the saliency effectiveness of each pixel in the visual saliency detection result; the content frame to be calibrated is input into a saliency information extraction module to obtain saliency information to be input; the saliency information extraction module is used to extract saliency information from the content frame to be calibrated; the saliency information to be input and the eye tracking result to be calibrated are input into an eye tracking calibration module to obtain an eye tracking calibration result; the eye tracking calibration module is used to calibrate the eye tracking result to be calibrated. 
The present invention utilizes saliency information to identify video frames, and only uses these "useful" frames to perform calibration, thereby achieving a simpler eye tracking calibration that does not require user cooperation and provides a better user experience.
下面对本发明提供的针对视线追踪的隐式校准装置进行描述,下文描述的针对视线追踪的隐式校准装置与上文描述的针对视线追踪的隐式校准方法可相互对应参照。The implicit calibration device for gaze tracking provided by the present invention is described below. The implicit calibration device for gaze tracking described below and the implicit calibration method for gaze tracking described above can refer to each other.
图6是本发明实施例提供的针对视线追踪的隐式校准装置的结构示意图,如图6所示,本发明实施例提供一种针对视线追踪的隐式校准装置,包括:获取单元610;调整单元620;视觉显著检测单元630;视觉显著度量单元640;显著信息提取单元650;视线追踪校准单元660;FIG6 is a schematic diagram of the structure of an implicit calibration device for gaze tracking provided by an embodiment of the present invention. As shown in FIG6, an embodiment of the present invention provides an implicit calibration device for gaze tracking, including: an obtaining unit 610; an adjusting unit 620; a visual saliency detection unit 630; a visual saliency measurement unit 640; a salient information extraction unit 650; and a gaze tracking calibration unit 660;
其中,wherein,
获取单元610,用于获取校准窗口内的N个待校准视线跟踪结果和M个设备当前显示内容帧;其中,N小于M;An obtaining unit 610, configured to obtain N gaze tracking results to be calibrated and M content frames currently displayed by the device within a calibration window, wherein N is less than M;
调整单元620,用于将所述设备当前显示内容帧的像素调整到预设分辨率,以得到待检测内容帧;An adjusting unit 620, configured to adjust the pixels of the content frame currently displayed by the device to a preset resolution to obtain a content frame to be detected;
视觉显著检测单元630,用于将所述待检测内容帧输入至视觉显著检测模块,以得到视觉显著检测结果;所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性;A visual saliency detection unit 630, configured to input the content frame to be detected into a visual saliency detection module to obtain a visual saliency detection result; the visual saliency detection module is configured to detect the saliency of each pixel in the content frame to be detected;
视觉显著度量单元640,用于将所述视觉显著检测结果输入至视觉显著度量模块,以得到待校准内容帧;所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性;A visual saliency measurement unit 640, configured to input the visual saliency detection result into a visual saliency measurement module to obtain a content frame to be calibrated; the visual saliency measurement module is configured to measure the saliency effectiveness of each pixel in the visual saliency detection result;
显著信息提取单元650,用于将所述待校准内容帧输入至显著信息提取模块,以得到待输入显著信息;所述显著信息提取模块用于从所述待校准内容帧中提取显著信息;A salient information extraction unit 650, configured to input the content frame to be calibrated into a salient information extraction module to obtain salient information to be input; the salient information extraction module is configured to extract salient information from the content frame to be calibrated;
视线追踪校准单元660,用于将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果;所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准。A gaze tracking calibration unit 660, configured to input the salient information to be input and the gaze tracking result to be calibrated into a gaze tracking calibration module to obtain a gaze tracking calibration result; the gaze tracking calibration module is configured to calibrate the gaze tracking result to be calibrated.
基于上述实施例,该装置中,将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果,之前还包括:Based on the above embodiment, in the device, the to-be-input salient information and the to-be-calibrated gaze tracking result are input into a gaze tracking calibration module to obtain a gaze tracking calibration result, which also includes:
对所述待校准视线跟踪结果进行第一去噪和第二去噪;Performing a first denoising and a second denoising on the sight tracking result to be calibrated;
所述第一去噪具体包括:The first denoising specifically includes:
利用第一预设公式对所述待校准视线跟踪结果进行第一去噪;Performing a first denoising operation on the sight tracking result to be calibrated using a first preset formula;
所述第一预设公式包括:The first preset formula includes:
z=(x−μ)/σ
其中,z表示第一去噪分数值,x表示原始值,μ表示整组数值的平均值,σ表示标准差;z = (x − μ)/σ, wherein z represents the first denoising score value, x represents the original value, μ represents the mean of the whole set of values, and σ represents the standard deviation;
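The variable definitions above fully determine the standard z-score, so the first denoising can be sketched as follows. The |z| cutoff used to discard samples is an assumption; the excerpt does not state its value.

```python
def z_scores(values):
    """Standard score z = (x - mu) / sigma, matching the variable
    definitions of the first preset formula."""
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

def first_denoise(values, cutoff=2.0):
    """Drop samples whose |z| exceeds a cutoff (the cutoff value is
    hypothetical, not taken from the patent)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= cutoff]
```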
所述第二去噪具体包括:The second denoising specifically includes:
根据所述待校准视线跟踪结果获得注视位置;Obtaining a gaze position according to the sight tracking result to be calibrated;
计算所述待输入显著信息对应的所述注视位置的平均值,以得到平均注视位置;Calculating an average value of the gaze positions corresponding to the salient information to be input to obtain an average gaze position;
将所述待输入显著信息和所述平均注视位置聚类,以得到聚类结果;Clustering the salient information to be input and the average gaze position to obtain a clustering result;
根据所述聚类结果确定最频繁显著区域和粗略注视区域;Determining the most frequently salient region and the roughly fixated region according to the clustering result;
计算所述最频繁显著区域和所述粗略注视区域质心之间的偏移量,根据所述偏移量进行第二去噪。An offset between the most frequently salient region and the centroid of the roughly fixated region is calculated, and a second denoising is performed according to the offset.
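The listed second-denoising steps can be sketched end-to-end. The excerpt does not name a specific clustering algorithm, so a simple grid-density clustering stands in for it; the grid size `cell` is a hypothetical parameter.

```python
from collections import Counter

def most_frequent_point(points, cell=50):
    """Grid-bucket the (x, y) points and return the centroid of the
    densest cell -- a stand-in for the clustering step that locates the
    most frequent salient region / rough gaze region."""
    buckets = Counter((int(x // cell), int(y // cell)) for x, y in points)
    (bx, by), _ = buckets.most_common(1)[0]
    members = [(x, y) for x, y in points
               if int(x // cell) == bx and int(y // cell) == by]
    mx = sum(p[0] for p in members) / len(members)
    my = sum(p[1] for p in members) / len(members)
    return (mx, my)

def second_denoise_offset(salient_points, gaze_points, cell=50):
    """Offset between the most frequent salient region and the rough
    gaze region, i.e. the calibration transformation vector V_c."""
    sx, sy = most_frequent_point(salient_points, cell)
    gx, gy = most_frequent_point(gaze_points, cell)
    return (sx - gx, sy - gy)
```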
基于上述实施例,该装置中,所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性,具体包括:Based on the above embodiment, in the device, the visual saliency detection module is used to detect the saliency of each pixel in the content frame to be detected, specifically including:
将所述待检测内容帧输入至视觉显著检测模块,生成视觉显著热图;Inputting the content frame to be detected into a visual saliency detection module to generate a visual saliency heat map;
将所述视觉显著热图归一化至预设范围,得到视觉显著检测结果。The visual saliency heat map is normalized to a preset range to obtain a visual saliency detection result.
基于上述实施例,该装置中,所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性,具体包括:Based on the above embodiment, in the device, the visual saliency measurement module is used to measure the saliency effectiveness of each pixel in the visual saliency detection result, specifically including:
根据第一预设阈值对所述视觉显著检测结果进行二值化,以得到显著像素;Binarizing the visual saliency detection result according to a first preset threshold value to obtain salient pixels;
通过连通分量分析计算得到帧中显著像素所在显著区域/对象的数量;The number of salient regions/objects where salient pixels are located in the frame is calculated by connected component analysis;
根据第二预设公式计算得到所述视觉显著检测结果对应的所述设备当前显示内容帧的显著性集中度;Calculating, according to a second preset formula, the saliency concentration of the content frame currently displayed by the device corresponding to the visual saliency detection result;
滤除显著性集中度低于第二预设阈值的视觉显著检测结果,得到待校准内容帧;Filtering out visual saliency detection results whose saliency concentration is lower than a second preset threshold, to obtain a content frame to be calibrated;
所述第二预设公式包括:The second preset formula includes:
其中,S表示显著性集中度;n表示显著区域/对象的数量;AS表示显著区域/对象像素,AT表示整个框架的区域像素。Where S represents the saliency concentration; n represents the number of salient regions/objects; AS represents the salient region/object pixels, and AT represents the region pixels of the entire frame.
基于上述实施例,该装置中,所述显著信息提取模块用于从所述待校准内容帧中提取显著信息,具体包括:Based on the above embodiment, in the device, the salient information extraction module is used to extract salient information from the content frame to be calibrated, specifically including:
提取所述待校准内容帧在每个连通分量域中的具有最高显著值的像素所在显著区域/对象的坐标和编号;Extracting the coordinates and serial numbers of the salient regions/objects where the pixels having the highest saliency value in each connected component domain of the content frame to be calibrated are located;
根据第三预设公式,将所述待校准内容帧上的显著性集中度、所有所述显著区域/对象的坐标和编号共同压缩为特征向量Vi,得到待输入显著信息;According to a third preset formula, the saliency concentration on the content frame to be calibrated, the coordinates and numbers of all the salient regions/objects are compressed into a feature vector V i to obtain salient information to be input;
所述第三预设公式包括:The third preset formula includes:
其中,Vi表示特征向量,ni表示显著区域/对象的编号,SCSi表示待校准内容帧上的显著性集中度,其余分量分别表示各显著区域/对象的横坐标和纵坐标。Wherein, Vi represents the feature vector, ni represents the number of salient regions/objects, SCSi represents the saliency concentration on the content frame to be calibrated, and the remaining components represent the horizontal and vertical coordinates of each salient region/object.
基于上述实施例,该装置中,所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准,具体包括:Based on the above embodiment, in the device, the sight tracking calibration module is used to calibrate the sight tracking result to be calibrated, specifically including:
将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,将所述偏移量包含的变换向量补偿到所述待校准视线跟踪结果上进行隐式校准,得到视线跟踪校准结果。The salient information to be input and the gaze tracking result to be calibrated are input into a gaze tracking calibration module, and the transformation vector included in the offset is compensated to the gaze tracking result to be calibrated for implicit calibration to obtain a gaze tracking calibration result.
基于上述实施例,该装置中,将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,将所述偏移量包含的变换向量补偿到所述待校准视线跟踪结果上进行隐式校准,得到视线跟踪校准结果,之前还包括:Based on the above embodiment, in the device, the to-be-input salient information and the to-be-calibrated gaze tracking result are input into a gaze tracking calibration module, and the transformation vector included in the offset is compensated to the to-be-calibrated gaze tracking result for implicit calibration to obtain a gaze tracking calibration result, which also includes:
获取用户与设备的相对位置;Get the relative position of the user and the device;
若所述相对位置的变化小于或等于第三预设阈值,变换向量不变;If the change in the relative position is less than or equal to a third preset threshold, the transformation vector remains unchanged;
若所述相对位置的变化大于第三预设阈值或场景切换时,更新所述变换向量。If the change in the relative position is greater than a third preset threshold or the scene is switched, the transformation vector is updated.
本发明提供的一种针对视线追踪的隐式校准装置,通过获取校准窗口内的N个待校准视线跟踪结果和M个设备当前显示内容帧;其中,N小于M;将所述设备当前显示内容帧的像素调整到预设分辨率,以得到待检测内容帧;将所述待检测内容帧输入至视觉显著检测模块,以得到视觉显著检测结果;所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性;将所述视觉显著检测结果输入至视觉显著度量模块,以得到待校准内容帧;所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性;将所述待校准内容帧输入至显著信息提取模块,以得到待输入显著信息;所述显著信息提取模块用于从所述待校准内容帧中提取显著信息;将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果;所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准。本发明利用显著性信息识别视频帧,并仅使用这些“有用的”帧执行校准,实现不需要用户配合的、更为简单的、用户体验更好的视线追踪的校准。The present invention provides an implicit calibration device for eye tracking, which obtains N eye tracking results to be calibrated and M content frames currently displayed by the device within a calibration window; wherein N is less than M; adjusts the pixels of the content frame currently displayed by the device to a preset resolution to obtain a content frame to be detected; inputs the content frame to be detected into a visual saliency detection module to obtain a visual saliency detection result; the visual saliency detection module is used to detect the saliency of each pixel in the content frame to be detected; inputs the visual saliency detection result into a visual saliency measurement module to obtain the content frame to be calibrated; the visual saliency measurement module is used to measure the saliency effectiveness of each pixel in the visual saliency detection result; inputs the content frame to be calibrated into a saliency information extraction module to obtain saliency information to be input; the saliency information extraction module is used to extract saliency information from the content frame to be calibrated; inputs the saliency information to be input and the eye tracking result to be calibrated into an eye tracking calibration module to obtain an eye tracking calibration result; the eye tracking calibration module is used to calibrate the eye tracking result to be calibrated. 
The present invention utilizes saliency information to identify video frames, and only uses these "useful" frames to perform calibration, thereby achieving a simpler eye tracking calibration that does not require user cooperation and provides a better user experience.
图7示例了一种电子设备的实体结构示意图,如图7所示,该电子设备可以包括:处理器(processor)710、通信接口(Communications Interface)720、存储器(memory)730和通信总线740,其中,处理器710,通信接口720,存储器730通过通信总线740完成相互间的通信。处理器710可以调用存储器730中的逻辑指令,以执行针对视线追踪的隐式校准方法,该方法包括:获取校准窗口内的N个待校准视线跟踪结果和M个设备当前显示内容帧;其中,N小于M;将所述设备当前显示内容帧的像素调整到预设分辨率,以得到待检测内容帧;将所述待检测内容帧输入至视觉显著检测模块,以得到视觉显著检测结果;所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性;将所述视觉显著检测结果输入至视觉显著度量模块,以得到待校准内容帧;所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性;将所述待校准内容帧输入至显著信息提取模块,以得到待输入显著信息;所述显著信息提取模块用于从所述待校准内容帧中提取显著信息;将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果;所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准。Figure 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730 and a communication bus 740, wherein the processor 710, the communications interface 720 and the memory 730 communicate with one another through the communication bus 740. The processor 710 can invoke the logic instructions in the memory 730 to execute the implicit calibration method for gaze tracking, the method comprising: obtaining N gaze tracking results to be calibrated and M content frames currently displayed by the device within a calibration window, wherein N is less than M; adjusting the pixels of the content frame currently displayed by the device to a preset resolution to obtain a content frame to be detected; inputting the content frame to be detected into a visual saliency detection module to obtain a visual saliency detection result, the visual saliency detection module being configured to detect the saliency of each pixel in the content frame to be detected; inputting the visual saliency detection result into a visual saliency measurement module to obtain a content frame to be calibrated, the visual saliency measurement module being configured to measure the saliency effectiveness of each pixel in the visual saliency detection result; inputting the content frame to be calibrated into a salient information extraction module to obtain salient information to be input, the salient information extraction module being configured to extract salient information from the content frame to be calibrated; and inputting the salient information to be input and the gaze tracking result to be calibrated into a gaze tracking calibration module to obtain a gaze tracking calibration result, the gaze tracking calibration module being configured to calibrate the gaze tracking result to be calibrated.
此外,上述的存储器730中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。In addition, the logic instructions in the above-mentioned
又一方面,本发明还提供一种非暂态计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现上述各方法提供的针对视线追踪的隐式校准方法,该方法包括:获取校准窗口内的N个待校准视线跟踪结果和M个设备当前显示内容帧;其中,N小于M;将所述设备当前显示内容帧的像素调整到预设分辨率,以得到待检测内容帧;将所述待检测内容帧输入至视觉显著检测模块,以得到视觉显著检测结果;所述视觉显著检测模块用于检测所述待检测内容帧中每个像素的显著性;将所述视觉显著检测结果输入至视觉显著度量模块,以得到待校准内容帧;所述视觉显著度量模块用于度量所述视觉显著检测结果中每个像素的显著的有效性;将所述待校准内容帧输入至显著信息提取模块,以得到待输入显著信息;所述显著信息提取模块用于从所述待校准内容帧中提取显著信息;将所述待输入显著信息和所述待校准视线跟踪结果输入至视线追踪校准模块,以得到视线跟踪校准结果;所述视线追踪校准模块用于对所述待校准视线跟踪结果进行校准。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。In another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the implicit calibration method for gaze tracking provided by the above methods, the method comprising: obtaining N gaze tracking results to be calibrated and M content frames currently displayed by the device within a calibration window, wherein N is less than M; adjusting the pixels of the content frame currently displayed by the device to a preset resolution to obtain a content frame to be detected; inputting the content frame to be detected into a visual saliency detection module to obtain a visual saliency detection result, the visual saliency detection module being configured to detect the saliency of each pixel in the content frame to be detected; inputting the visual saliency detection result into a visual saliency measurement module to obtain a content frame to be calibrated, the visual saliency measurement module being configured to measure the saliency effectiveness of each pixel in the visual saliency detection result; inputting the content frame to be calibrated into a salient information extraction module to obtain salient information to be input, the salient information extraction module being configured to extract salient information from the content frame to be calibrated; and inputting the salient information to be input and the gaze tracking result to be calibrated into a gaze tracking calibration module to obtain a gaze tracking calibration result, the gaze tracking calibration module being configured to calibrate the gaze tracking result to be calibrated. The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that each implementation method can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solution is essentially or the part that contributes to the prior art can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a disk, an optical disk, etc., including a number of instructions for a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in each embodiment or some parts of the embodiments.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, rather than to limit it. Although the present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the aforementioned embodiments, or make equivalent replacements for some of the technical features therein. However, these modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310263502.2A CN116430992A (en) | 2023-03-10 | 2023-03-10 | Implicit calibration method and device for gaze tracking |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310263502.2A CN116430992A (en) | 2023-03-10 | 2023-03-10 | Implicit calibration method and device for gaze tracking |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116430992A true CN116430992A (en) | 2023-07-14 |
Family
ID=87088159
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310263502.2A Pending CN116430992A (en) | 2023-03-10 | 2023-03-10 | Implicit calibration method and device for gaze tracking |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116430992A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117898675A (en) * | 2024-01-19 | 2024-04-19 | 南京励翱科技有限公司 | Intelligent screening system, device and storage medium for Alzheimer disease |
| CN119126973A (en) * | 2024-08-21 | 2024-12-13 | 上海联影智元医疗科技有限公司 | Eye movement correction method, device, system and computer equipment |
- 2023-03-10: CN application CN202310263502.2A granted publication number CN116430992A (en); status: active, Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Sitzmann et al. | Saliency in VR: How do people explore virtual environments? | |
| CN109086726B (en) | Local image identification method and system based on AR intelligent glasses | |
| US10380421B2 (en) | Iris recognition via plenoptic imaging | |
| US10635167B2 (en) | Smooth pursuit gaze tracking | |
| Steil et al. | Fixation detection for head-mounted eye tracking based on visual similarity of gaze targets | |
| Sugano et al. | Calibration-free gaze sensing using saliency maps | |
| US10109056B2 (en) | Method for calibration free gaze tracking using low cost camera | |
| US10488195B2 (en) | Curated photogrammetry | |
| US9189082B2 (en) | Enhanced handheld screen-sensing pointer | |
| JP2016515242A (en) | Method and apparatus for gazing point estimation without calibration | |
| US20140196082A1 (en) | Comment information generating apparatus and comment information generating method | |
| CN108960045A (en) | Eyeball tracking method, electronic device and non-transitory computer readable recording medium | |
| CN116430992A (en) | Implicit calibration method and device for gaze tracking | |
| US11749141B2 (en) | Information processing apparatus, information processing method, and recording medium | |
| Chen et al. | 3D face reconstruction and gaze tracking in the HMD for virtual interaction | |
| CN111950401B (en) | Method, image processing system, device and medium for determining position of key point area | |
| CN114610150B (en) | Image processing method and device | |
| US11615767B2 (en) | Information processing apparatus, information processing method, and recording medium | |
| CN111429338A (en) | Method, apparatus, device and computer-readable storage medium for processing video | |
| CN106919246A (en) | The display methods and device of a kind of application interface | |
| Shi et al. | SalientGaze: Saliency-based gaze correction in virtual reality | |
| CN113780414B (en) | Eye movement behavior analysis method, image rendering method, component, device and medium | |
| CN109040604A (en) | Image processing method, device, storage medium and mobile terminal | |
| Lei et al. | Quantifying the impact of motion on 2d gaze estimation in real-world mobile interactions | |
| KR102305880B1 (en) | User's Gaze Tracking Method, and Medium Being Recorded with Program for Executing the Method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |