CN104346601B - Object identifying method and equipment
- Publication number: CN104346601B (application CN201310320936.8A)
- Authority: CN (China)
- Legal status: Expired - Fee Related
Classifications
- G06V40/165 - Detection; Localisation; Normalisation using facial parts and geometric relationships
- G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- G06V40/175 - Static expression

(all under G - Physics; G06 - Computing or calculating; counting; G06V - Image or video recognition or understanding; G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands; G06V40/16 - Human faces, e.g. facial parts, sketches or expressions)
Description
Technical Field

The present invention relates to a method and apparatus for object recognition in images. More specifically, the present invention relates to a method and apparatus for identifying object attributes of an object region in an image.
Background Art

In recent years, object detection and recognition in images has been widely applied in the fields of image processing, computer vision, and pattern recognition, where it plays an important role. The object may be any of a human face, a hand, a body, and so on.

A common object detection/recognition task is the detection and recognition of faces in images. Face recognition typically involves recognizing an attribute (such as an expression) of each face in an image containing at least one face image, and various techniques exist for realizing such face recognition.

The following takes the facial expression recognition of a face contained in an image as an example to explain the current state of the art for facial attribute recognition. The basic principle of facial expression recognition methods follows the framework shown in Fig. 1.

More specifically, for an input face image, a facial expression recognition method first obtains the face region contained in the image (face detection), and then aligns the face corresponding to that region, which may be in any of various poses, according to facial feature points extracted from the face region (face registration). The method then extracts features of the aligned face image (feature extraction), and finally determines the expression of the face corresponding to the face region from the extracted features.

For feature extraction, some methods focus on salient regions of the face image. As shown in Fig. 2, a salient region is a region of the face image that is generally regarded as representing a characteristic part of the face, such as the eye regions, the nose region, or the mouth region.
In such a case, the features of four salient regions are extracted separately (namely, the left-eye region feature f_left_eye, the right-eye region feature f_right_eye, the nose region feature f_nose, and the mouth region feature f_mouth), and the feature of the face, f_total, is represented by concatenating these four salient-region features, so that:

f_total = f_left_eye + f_right_eye + f_nose + f_mouth (where "+" denotes concatenation)

The feature f_total is then used to predict the expression of the face corresponding to the face image.
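As a minimal illustration of this prior-art scheme (a sketch, not the patent's prescribed implementation: the use of gray-level histograms as per-region features, the bin count, and the rectangle format are all assumptions):

```python
import numpy as np

def region_feature(image, rect):
    """Illustrative per-region feature: a normalized 32-bin gray-level
    histogram of the pixels inside rect = (x, y, width, height)."""
    x, y, w, h = rect
    patch = image[y:y + h, x:x + w]
    hist, _ = np.histogram(patch, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1)

def face_feature(image, salient_rects):
    """f_total: the concatenation of the salient-region features, with
    salient_rects holding e.g. the left-eye, right-eye, nose and mouth
    rectangles obtained from face registration."""
    return np.concatenate([region_feature(image, r) for r in salient_rects])
```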
Typically, such methods based on salient regions of the face extract features of the salient regions rather than of the entire face image, and then predict the facial expression from the extracted features, as shown in the left part of Fig. 3, a flowchart of prior-art facial expression recognition based on salient regions of a face image. The right part of Fig. 3 schematically shows an example of such a method, in which, after several facial feature points are detected in the face image, four salient regions (the left-eye region, the right-eye region, the nose region, and the mouth region) are located accordingly.

US patent application US2012/0169895A1, in the name of Industrial Technology Research Institute (TW), discloses a method for capturing facial expressions based on salient regions of face images. The method captures salient-region features of the face in an image from four salient regions to generate a target feature vector, and then compares the target feature vector with a plurality of previously stored feature vectors to generate a parameter value. When the parameter value is above a threshold, the method selects one of the images as the target image. Based on the target image, facial expression recognition and classification can be performed further; for example, the target image is recognized to obtain a facial expression state, and images are classified according to that state.

Instead of salient regions, other types of representative regions of a face image can be used for facial attribute recognition.

US patent application US2010/0111375A1, in the name of Mitsubishi Electric Research Laboratories, Inc., discloses a method for identifying facial attributes in an image based on a set of patches contained in the face image. More specifically, the method divides the face image into a set of patches, compares each patch one by one with prototype patches to determine the matching prototype patch, and determines a set of attributes of the face according to the attribute sets associated with the matching prototype patches. The patch set extracted in this method can be regarded as equivalent to the parts of the salient regions.

US patent application US2012/0076418A1, in the name of Renesas Electronics Corporation, discloses a facial attribute estimation method and apparatus. The method extracts a specific region from the face region and sets small regions within it. It then uses a similarity calculation to compute, one by one, the similarity between each small region and each of the stored face components in order to determine the facial attribute. Apart from the number of specific regions, the specific regions used in this method can be regarded as equivalent to salient regions.

The above prior-art methods generally extract features from salient regions or their equivalents (such as multiple patches or a small specific region of the face image) and compare the extracted features with each of a set of predefined features corresponding to multiple known facial attributes (that is, one-to-one comparison) in order to perform facial attribute recognition.

Furthermore, the salient regions or equivalent regions located in the face image to be recognized do not change during recognition, so there is only one, constant feature vector derived from the face image for all comparisons during recognition. That is, only one feature vector from the face image is compared with each of the plurality of previously stored feature vectors corresponding to the multiple known facial attributes.

However, using one constant feature of the face region to be recognized for all one-to-one comparisons during recognition may not be efficient enough to identify the face region accurately.

It should be noted that some salient regions may not be discriminative for some types of expressions. For example, the nose region differs little between a sad expression and a neutral expression, so the nose region is not discriminative for distinguishing them. Another problem is that some parts of a salient region are not discriminative; for example, the eyebrow part of the eye region is not discriminative between a sad expression and a neutral expression. That is, if the located salient regions, and hence the features extracted from them, are constant for comparisons against the set of predefined facial attributes, some regions, and some parts of regions, may be redundant for recognizing some types of expressions in some expression pairs.

As noted above, there remains a need for a method that can accurately identify the attributes of a face region based on more discriminative features from the face region in an image.
Summary of the Invention

The present invention was developed for the recognition of objects in images and aims to solve the problems described above.

According to one aspect of the present invention, there is provided a method for identifying an object region in an image, the method comprising an extraction step of, for each object attribute pair in a predefined set of object attributes, extracting a feature of the object region corresponding to that object attribute pair based on the dissimilarity of the pair; and an identification step of identifying an object attribute of the object region based on the extracted features of the object region.

According to another aspect of the present invention, there is provided an apparatus for identifying an object region in an image, comprising: an extraction unit configured to, for each object attribute pair in a predefined set of object attributes, extract a feature of the object region corresponding to that object attribute pair based on the dissimilarity of the pair; and an identification unit configured to identify an object attribute of the object region based on the extracted features of the object region.

According to the method and apparatus of the present invention, for each object attribute pair in the predefined set of object attributes, a feature of the object region corresponding to that pair is extracted based on the dissimilarity of the pair and used for object recognition. Recognition efficiency and accuracy can thereby be improved.

Other features of the present invention will become apparent from the following description of exemplary embodiments, read with reference to the accompanying drawings.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals indicate like items.

Fig. 1 shows a typical prior-art process of facial expression recognition.

Fig. 2 shows typical salient regions of a face.

Fig. 3 is a flowchart showing a prior-art facial expression recognition method.

Fig. 4 is a block diagram showing an exemplary hardware configuration of a computer system in which embodiments of the present invention can be implemented.

Fig. 5 is a flowchart illustrating the object attribute recognition method according to the present invention.

Fig. 6 is a block diagram showing the object attribute recognition apparatus according to the present invention.

Fig. 7 is a diagram explaining the face region in a face image.

Fig. 8 schematically shows feature points in a face region.

Fig. 9 is a flowchart showing the process in the extraction step.

Fig. 10 schematically shows the localization of organ regions in a face region.

Fig. 11 schematically shows examples of facial expression pairs.

Fig. 12 is a flowchart schematically showing the determination of the template of a facial expression pair.

Fig. 13 shows several exemplary average images.

Fig. 14 shows the correspondingly divided images for each expression of a facial expression pair.

Fig. 15 shows the template of a facial expression pair obtained from the divided images for each expression of the pair.

Fig. 16 shows the localization, for a facial expression pair, of dissimilar pixel blocks in a face region depending on the template of the pair.

Fig. 17 is a flowchart showing the process in the feature extraction step.

Fig. 18 is a flowchart showing the process in one implementation of the identification step.

Fig. 19 is a flowchart showing the process in another implementation of the identification step.
Detailed Description of the Embodiments

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

It should be noted that like reference numerals and letters designate like items in the drawings; once an item has been defined in one figure, it need not be discussed again for subsequent figures.

First, the meaning of certain terms used in the context of the present disclosure will be explained.

In the context of this disclosure, an image may refer to any of various types of images, such as color images and grayscale images. Since the processing of the present invention is mainly performed on grayscale images, an image in this disclosure refers, unless otherwise stated, to a grayscale image containing a plurality of pixels.

It should be noted that the solution of the present invention is also applicable to other types of images (such as color images), as long as such an image can be converted into a grayscale image and the processing of the present invention can be performed on the converted grayscale image.
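As an illustration (not part of the patent itself), such a conversion can be done with a standard luminance weighting; OpenCV is an assumed dependency here:

```python
import cv2

# Load a color image and convert it to a single-channel grayscale image,
# after which all subsequent processing operates on gray values.
color = cv2.imread("face.jpg")                 # BGR color image
gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
```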
An image may generally contain at least one object image, and an object image generally contains an object region; in the context of this disclosure, therefore, object image and object region are equivalent and are used interchangeably. A common object in an image is a face.

A feature of an object region in an image is generally a feature representing the characteristics of that region, and may typically be a color feature, a texture feature, a shape feature, and so on. A commonly used feature is the color feature, a global feature of an image that can usually be obtained through a color histogram over a set of color bins. The feature of an image is usually obtained in the form of a vector, each component of which corresponds to one color bin.
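For instance, a global histogram feature could be computed as in the following sketch (the bin count and normalization are arbitrary choices; a color image would use per-channel color bins instead of gray levels):

```python
import numpy as np

def histogram_feature(gray_image, bins=16):
    """Global gray-level histogram of an image as a feature vector;
    each vector component corresponds to one bin."""
    hist, _ = np.histogram(gray_image, bins=bins, range=(0, 256))
    return hist.astype(np.float64) / gray_image.size
```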
Object attributes refer to apparent states of an object that may correspond to different conditions, and object attributes may belong to different categories. Taking a face as an example, the category of a facial attribute may be one selected from the group consisting of facial expression, the gender of the person corresponding to the face when the face is a human face, and the age of that person; the category of a facial attribute is not limited thereto and may be another category. When the facial attribute corresponds to facial expression, the facial attribute may be a particular expression (for example, sad, smiling, or laughing).

Of course, object attributes are not limited thereto; for example, the object may be a human body, and the object attributes may correspond to different body states, such as running, standing, kneeling, or lying down.

An object attribute pair is a pair formed of any predefined number of object attributes contained in a predefined set of object attributes, in which all object attributes are distinguishable within a certain category. The set can be prepared in advance, and the predefined set of object attributes can form at least one object attribute pair, each pair having the same number of object attributes.

The object attributes contained in an object attribute pair may be selected arbitrarily from the predefined set of object attributes, in which case the predefined set may contain C_n^t object attribute pairs, where n is the number of object attributes in the set and t is the number of object attributes contained in each pair.

Preferably, the number of object attributes contained in an object attribute pair may be two.
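For t = 2, the C_n^2 pairs can be enumerated as in the following sketch (the expression names are placeholders, not mandated by the patent):

```python
from itertools import combinations

expressions = ["sad", "neutral", "angry"]  # placeholder attribute set

# All C(n, 2) unordered object attribute pairs; here C(3, 2) = 3 pairs.
attribute_pairs = list(combinations(expressions, 2))
print(attribute_pairs)
# [('sad', 'neutral'), ('sad', 'angry'), ('neutral', 'angry')]
```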
Preferably, the object attributes in an object attribute pair may be attributes that differ greatly from each other or are even opposite. For example, in the case of a face, an object attribute pair may consist specifically of a laughing expression and a crying expression, so that the parts extracted for such a pair are more discriminative.

In this disclosure, the terms "first", "second", and so on are used merely to distinguish elements or steps, and are not intended to indicate chronological order, preference, or importance.
Fig. 4 is a block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented.

As shown in Fig. 4, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a non-removable non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and an output peripheral interface 1195, which are connected via a system bus 1121.

The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in the RAM 1132.

A non-removable non-volatile memory 1141, such as a hard disk, is connected to the non-removable non-volatile memory interface 1140. The non-removable non-volatile memory 1141 may store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.

Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.

Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.

The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 may be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 may be connected to a modem (modulator-demodulator) 1172, which is connected to the remote computer 1180 via a wide area network 1173.

The remote computer 1180 may include a memory 1181, such as a hard disk, which stores remote application programs 1185.

The video interface 1190 is connected to a monitor 1191.

The output peripheral interface 1195 is connected to a printer 1196 and a speaker 1197.

The computer system shown in Fig. 4 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.

The computer system shown in Fig. 4 may be implemented, for any embodiment, as a stand-alone computer or as a processing system in a device, from which one or more unnecessary components may be removed or to which one or more additional components may be added.
Hereinafter, the object recognition method according to the basic embodiment of the present invention will be described with reference to Fig. 5, which shows the process of the method according to the basic embodiment.

In step S100 (hereinafter referred to as the extraction step), for each object attribute pair in the predefined set of object attributes, a feature of the object region corresponding to that pair is extracted based on the dissimilarity of the pair.

As described above, all object attributes of the predefined set belong to the same category, and an object attribute pair may consist of any predetermined number (such as two) of object attributes contained in the predefined set.

Alternatively, an object attribute pair may be a predetermined number (such as two) of object attributes satisfying a predetermined relationship between them.

In one implementation, the object region may be an object region that has already been aligned, and the alignment may be achieved in various ways, such as based on feature points detected in the object region. It should be noted that whether the object region is aligned is not essential to the implementation of the extraction operation.

In step S200 (hereinafter referred to as the identification step), an object attribute of the object region is identified based on the extracted features of the object region.
In one implementation, the extraction step may comprise a process of locating at least one block in the object region corresponding to the template of the object attribute pair, the template characterizing the dissimilarity between the attributes of the pair (the localization step); and a process of extracting a feature of the object region corresponding to the object attribute pair based on the located at least one block (the feature extraction step).

Here, the template can be regarded as a dissimilarity template of the object attribute pair that characterizes the dissimilarity between its object attributes, and is generally formed of at least one dissimilar pixel block between the images of the object attributes contained in the pair. In practice, each dissimilar pixel block may correspond to a corresponding pixel block between the images of the predetermined number of object attributes contained in the pair, the corresponding pixel block lying at a corresponding position and of a corresponding size in each image, wherein the positions and sizes of the dissimilar pixel blocks in the respective images may be mapped to each other according to a predetermined rule (for example, depending on the ratio between the sizes of the images of the respective object attributes when those images have different sizes).

Preferably, the image of the object region and the images of the object attributes contained in the pair may be preprocessed (for example, aligned) so as to have the same size, in which case each dissimilar pixel block of the template corresponds to a corresponding pixel block between the images of the predetermined number of object attributes contained in the pair, located at the same position and of the same size in each image.

Accordingly, the at least one block located in the object region for the object attribute pair may be a pixel block located according to such positions and sizes of the dissimilar pixel blocks, as long as the pixel blocks can be mapped to each other according to the predetermined rule; preferably, the pixel blocks have the same positions and sizes.

The size of each pixel block can be set arbitrarily without affecting the implementation of the solution of the present invention.

In one implementation, the template of an object attribute pair may be obtained as follows: dividing two average object region images, respectively corresponding to the two object attributes contained in the pair, into a plurality of mutually corresponding blocks; extracting a feature of each of the blocks of each divided average object region image; determining the similarity between the features of corresponding blocks of the two divided average object region images; and selecting, to form the template, those blocks of the two divided average object region images whose similarity is below a predefined threshold.

Here, corresponding division means that the images of the object attributes of the pair are divided in corresponding patterns, so that each divided block of one attribute image can be mapped, according to a predetermined rule, to a divided block of the other attribute image. Preferably, the images of the object attributes of the pair have the same size, so that the division patterns of the images are identical and of the same scale, and a divided block of one attribute image and the corresponding divided block of the other have the same position and size. The division pattern may be any pattern, such as a grid.
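A sketch of this template construction under stated assumptions: the two average images are equally sized, the division is a regular grid, per-block features are gray-level histograms, and histogram intersection serves as the similarity measure. None of these concrete choices are fixed by the patent:

```python
import numpy as np

def block_histogram(block, bins=16):
    """Gray-level histogram of one block, normalized to sum to 1."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist.astype(np.float64) / max(block.size, 1)

def build_template(avg_img_a, avg_img_b, block=10, threshold=0.8):
    """Return (row, col, size) of the grid blocks whose similarity
    between the two average images falls below the threshold."""
    assert avg_img_a.shape == avg_img_b.shape
    h, w = avg_img_a.shape
    template = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ha = block_histogram(avg_img_a[y:y + block, x:x + block])
            hb = block_histogram(avg_img_b[y:y + block, x:x + block])
            similarity = np.minimum(ha, hb).sum()  # histogram intersection in [0, 1]
            if similarity < threshold:             # dissimilar block: keep it
                template.append((y, x, block))
    return template
```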
The template of an object attribute pair may be prepared and stored in advance, or may be prepared during the extraction operation. The operation of obtaining the templates of object attribute pairs may or may not be included in the extraction step.

The average object region image corresponding to an object attribute can be prepared in advance in various ways; in a typical implementation, it can be prepared by averaging a plurality of similar object region images of the same size corresponding to the same object attribute.

Preferably, the localization process may be performed based on auxiliary regions contained in the object region, so as to further improve operation efficiency. The auxiliary regions can be located in various ways, for example depending on the positions of identified feature points in the object region. In such a case, the localization process locates, within the auxiliary regions, at least one block corresponding to the template characterizing the dissimilarity of the object attribute pair, and that template may likewise be determined based on such auxiliary regions in the images of the object attributes of the pair, rather than on the entirety of those images.

In one implementation, the feature extraction process may comprise extracting a feature from each of the at least one block located in the object region, and concatenating the extracted features of the respective blocks as the feature of the object region. The finally extracted feature therefore generally takes the form of a vector, each component of which corresponds to one block.
In the identification step, the identification of the object attribute can be realized in various ways.

In one implementation, identification can be realized in a so-called "one against one" manner, in which, for the predefined set of object attributes, the object attributes are voted on over C_n^t rounds, where n is the number of object attributes contained in the set and t is the number of object attributes contained in each pair, preferably 2. The object attribute with the highest score is determined to be the object attribute.

More specifically, the identification process may comprise an identification step of, for each object attribute pair in the predefined set of object attributes, identifying, based on the feature of the object region corresponding to that pair, the one of the two object attributes contained in the pair that corresponds to the object region, and increasing the score of that object attribute by a predetermined value, all object attributes contained in the predefined set having the same initial score; and an attribute determination step of determining the object attribute with the highest score in the predefined set as the object attribute of the object region.
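A minimal sketch of this voting scheme follows; classify_pair stands in for a per-pair binary classifier operating on the pair-specific feature, which the patent does not tie to a particular algorithm:

```python
from itertools import combinations

def recognize_one_against_one(face_region, attributes, classify_pair):
    """classify_pair(face_region, a, b) -> the winning attribute of the
    pair (a, b), judged on the pair-specific dissimilarity feature."""
    scores = {a: 0 for a in attributes}       # identical initial scores
    for a, b in combinations(attributes, 2):  # C(n, 2) voting rounds
        scores[classify_pair(face_region, a, b)] += 1
    return max(scores, key=scores.get)        # highest-scoring attribute wins
```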
In another implementation, identification can be realized in a so-called "one beating one" manner, in which, when the number of predefined object attributes in the set is n, the object attribute is determined over n-1 rounds: in each round, only the attribute that wins for one object attribute pair advances to the next round, and the attribute that finally wins is determined to be the object attribute.

More specifically, the identification process may comprise an identification step of, for one object attribute pair in the predefined set, identifying, based on the feature of the object region corresponding to that pair, the one of the two object attributes contained in the pair that corresponds to the object region; and an attribute determination step of determining the object attribute of the object region based on the attribute corresponding to the object region and the remaining object attributes of the predefined set other than that pair. If the number of remaining attributes equals zero, the attribute corresponding to the object region is determined as the object attribute of the object region; otherwise, the attribute corresponding to the object region and the remaining attributes of the predefined set other than that pair are regrouped into a new attribute set, and the identification step and the attribute determination step are executed in turn on the new set.
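The same placeholder classifier expresses this sequential elimination; the order in which challengers are drawn is an arbitrary choice in this sketch:

```python
def recognize_one_beating_one(face_region, attributes, classify_pair):
    """n-1 elimination rounds: the loser of each pair is dropped and the
    winner meets the next remaining attribute, mirroring the regrouping
    into successively smaller attribute sets described above."""
    remaining = list(attributes)
    winner = remaining.pop(0)
    while remaining:                  # one round per remaining attribute
        challenger = remaining.pop(0)
        winner = classify_pair(face_region, winner, challenger)
    return winner
```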
It should be noted that the above method is performed for one object region at a time in an image that may contain at least one object region, and may be repeated as many times as there are object regions, one object region containing only one object to be recognized.

Fig. 6 is a block diagram showing the object recognition apparatus according to the present invention.

The apparatus 600 for identifying an object region in an image may comprise an extraction unit 601 configured to, for each object attribute pair in the predefined set of object attributes, extract a feature of the object region corresponding to that pair based on the dissimilarity of the pair; and an identification unit 602 configured to identify an object attribute of the object region based on the extracted features of the object region.

Preferably, the extraction unit 601 may comprise a localization unit 601-1 configured to locate at least one block in the object region corresponding to the template of the object attribute pair, the template characterizing the dissimilarity between the attributes of the pair; and a feature extraction unit 601-2 configured to extract a feature of the object region corresponding to the pair based on the located at least one block.

Preferably, the localization unit 601-1 may comprise a unit configured to locate auxiliary regions in the object region depending on the positions of identified feature points in the object region, and a unit configured to locate, within the auxiliary regions, at least one block corresponding to the template of the object attribute pair characterizing the dissimilarity between the attributes of the pair.

Preferably, the feature extraction unit 601-2 may comprise a unit configured to extract a feature from each of the at least one block in the object region, and a unit configured to concatenate the extracted features of the respective blocks as the feature of the object region.

Preferably, the identification unit 602 may comprise an identification unit 602-1 configured to, for each object attribute pair in the predefined set, identify, based on the feature of the object region corresponding to that pair, the one of the two object attributes contained in the pair that corresponds to the object region, and increase the score of that attribute by a predetermined value, all object attributes contained in the predefined set having the same initial score; and an attribute determination unit 602-2 configured to determine the object attribute with the highest score in the predefined set as the object attribute of the object region.

Additionally or alternatively, the identification unit 602 may comprise an identification unit 602-3 configured to, for one object attribute pair in the predefined set, identify, based on the feature of the object region corresponding to that pair, the one of the two object attributes contained in the pair that corresponds to the object region; and an attribute determination unit 602-4 configured to determine the object attribute of the object region based on the attribute corresponding to the object region and the remaining attributes of the predefined set other than that pair, wherein, if the number of remaining attributes equals zero, the attribute corresponding to the object region is determined as the object attribute of the object region; otherwise, the attribute corresponding to the object region and the remaining attributes of the predefined set other than that pair are regrouped into a new attribute set, and the identification operation and the attribute determination operation are executed in turn on the new set.

The templates characterizing the dissimilarity between object attribute pairs may be formed in advance and stored separately from the apparatus 600, as described above. Additionally or alternatively, the apparatus 600 may comprise a unit configured to form, in the manner described above, the template of an object attribute pair characterizing the dissimilarity between the attributes of the pair.
[Advantageous Technical Effects]

In summary, the present invention provides a new approach to the recognition of object attributes of object regions in images, in which the concept of object attribute pairs is introduced to improve the feature extraction and recognition of object regions.

More specifically, the dissimilarity between the object attributes contained in an object attribute pair is used to extract the dissimilar pixel blocks of the object region for that pair, and the extracted features of the object region are used to determine which object attribute of the pair the object region corresponds to. Feature extraction and recognition of the object region are thus performed pair by pair, whereby recognition efficiency and accuracy can be improved.

It should be noted that such dissimilar pixel blocks of the object region are determined and extracted for each object attribute pair in the predefined set used as the basis of comparison in each round, and can reflect the dissimilarity between the object attributes contained in the pair. Moreover, the extracted parts can change adaptively during recognition; that is, the dissimilar pixel blocks of the object region may change depending on the comparands in each round of comparison, rather than remaining constant.

Accordingly, parts of the object region that are common to, rather than discriminative between, the attributes of a pair may be excluded from extraction, and the extracted parts can more accurately reflect the dissimilarity between the attributes contained in the pair. This helps determine accurately which of the attributes contained in the pair the object region corresponds to, so that the object attribute of the object region can be determined more accurately.

In the following, to facilitate a thorough understanding of the implementation of the invention, a face is used as an example of the object to be recognized in order to explain exemplary implementations of the solution of the invention. It should be noted that the solution of the invention is also applicable to other types of objects.

For a face region in an image to be recognized, its attributes may belong to various categories. For example, the category of a facial attribute may be one selected from the group consisting of facial expression, and the gender and age of the person corresponding to the face when the face is a human face. Of course, the categories of facial attributes are not limited thereto and may be categories other than those mentioned above.
[Example 1]

Hereinafter, a process according to the present invention for recognizing a facial attribute (such as a facial expression) of a face region in an image will be described.

In general, for a face region in an input image whose expression is to be recognized, for each facial expression pair in a predefined set of facial expressions, a feature of the face region corresponding to that pair is extracted based on the dissimilarity between the facial expressions contained in the pair, and the facial expression of the face region is then recognized based on the extracted features. When the input image contains multiple faces, this process is repeated as many times as there are faces.

The details of this process are described below.

Initially, for an input image that may contain at least one face, the face regions in the input image are detected; typically one face region corresponds to one face in the image. Fig. 7 shows a rectangular face region detected from an input image.

Preferably, before the detected face regions are used for feature extraction, they are first aligned, and this alignment can be performed in various ways.

In one implementation, the face region is aligned based on a predetermined number of feature points extracted from the face image; the number of feature points can be set based on the operator's experience and is not limited to any specific number. The feature point extraction method may be, for example, that of Xudong Cao, Yichen Wei, Fang Wen, Jian Sun, "Face alignment by explicit shape regression", CVPR 2012, or the boosted-regression ASM of D. Cristinacce and T. F. Cootes, "Boosted regression active shape models", BMVC 2007. It should be noted that the feature point extraction method is not limited thereto and may be any other method known in the art.

Fig. 8 schematically shows seven feature points extracted from the face region; as shown in Fig. 8, the seven feature points are the two corners of each of the two eyes, the tip of the nose, and the two corners of the mouth.

Alignment can be performed as follows. It should be noted that the following procedure, known in the art, is merely exemplary, and alignment can also be performed by other procedures.
During alignment, the average positions of the seven extracted feature points are computed from a predetermined number of manually labeled samples. Assuming there are n labeled samples, the average position of each of the seven points P_i(x_i, y_i) (i = 1..7) is computed as:

x̄_i = (1/n) * Σ_{j=1..n} x_i^(j),  ȳ_i = (1/n) * Σ_{j=1..n} y_i^(j)

where x and y denote the horizontal and vertical coordinates, and (x_i^(j), y_i^(j)) is the position of point i in the j-th labeled sample.
The average positions of these seven points P_i (i = 1..7) are defined as the objective face, and an affine mapping process is used to align the input face with the objective face.

The size of the aligned face may be 200*200 pixels. It should be noted that the size of the aligned face region is not limited and may be any other size.
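A sketch of this alignment, assuming seven 2-D landmarks and a least-squares similarity transform estimated with OpenCV (one possible realization of the affine mapping; the patent does not name a library or a fitting method):

```python
import cv2
import numpy as np

def align_face(image, points, objective_points, size=(200, 200)):
    """Warp `image` so that its seven landmarks `points` (7x2 array)
    map onto the mean-shape landmarks `objective_points`."""
    src = np.asarray(points, dtype=np.float32)
    dst = np.asarray(objective_points, dtype=np.float32)
    # Least-squares similarity transform from input landmarks to the
    # objective face; a full affine fit would also be possible.
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    return cv2.warpAffine(image, matrix, size)
```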
Next, the face regions in the input image, which may have been aligned, are subjected to feature extraction. Fig. 9 is a flowchart showing the feature extraction process, in which step S101 is drawn with a dashed line, meaning that it is optional.

In this feature extraction process, at least one dissimilar pixel block corresponding to the template of a facial expression pair from the predefined set of facial expressions is located in the face region, the template characterizing the dissimilarity between the facial expressions of the pair (S102); then, the feature of the face region corresponding to that facial expression pair is extracted based on the located at least one dissimilar pixel block (S103).

The template of a facial expression pair may consist of at least one mutually corresponding block among the images of the respective facial expressions of the pair; the at least one block can reflect the dissimilarity between the facial expression images of the pair, and the details of the template will be described later.

As for the process of locating the dissimilar pixel blocks, in one implementation, for each facial expression pair, at least one dissimilar pixel block can be located in the face image directly from the template of that pair according to a predetermined correspondence relationship (for example, the same positions and the same block sizes). It should be noted that the correspondence relationship between the at least one dissimilar pixel block and the blocks of the template is not limited thereto and may satisfy other rules.

In another implementation, a process of locating auxiliary regions (such as organ regions) in the face image may be performed first (S101), so that the localization of dissimilar pixel blocks is performed only within the auxiliary regions of the face image (such as the organ regions) rather than over the entire face image.

An auxiliary region may be of any shape, such as a rectangle or a square, and may be of any size according to the operator's experiments.

As shown in Fig. 10, four organ regions can be located, comprising two eye regions, one nose region, and one mouth region. During recognition, the sizes of these four regions are fixed for each aligned face. For example, in a face of 200*200 pixels, the size of each eye rectangle is 80*60, the size of the nose rectangle is 140*40, and the size of the mouth rectangle is 140*80.
Preferably, the positions of the organ regions can be determined from the feature points of the face image. Let the origin of the coordinates be the upper-left corner of the image. When locating the left-eye region, the center of its rectangle may coincide with the midpoint of line AB in Fig. 10. Similarly, the center of the right-eye rectangle may coincide with the midpoint of line CD in Fig. 10. For the nose region, if the coordinates of the upper-left corner are (n1, n2), the coordinates of the lower-right corner are (n3, n4), and the coordinates of the nose tip E are (e1, e2), then these three points satisfy the following equations:

e1 = α*(n1+n3), e2 = n2+β*(n4-n2),

where 0.3 ≤ α ≤ 0.7 and 0.5 ≤ β ≤ 0.8.

For the mouth region, let H(h1, h2) be the midpoint of line FG, the upper-left corner of the mouth region be (m1, m2), and the lower-right corner be (m3, m4). The coordinates satisfy the following equations:

h1 = γ*(m1+m3), h2 = m2+δ*(m4-m2),

where 0.3 ≤ γ ≤ 0.7 and 0.3 ≤ δ ≤ 0.6.
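Given the fixed region sizes, the rectangles can be recovered from the landmarks by inverting the equations above. In the sketch below, centering the eye rectangles on the corner midpoints follows the text, while the specific values of α, β, γ, δ are sample choices from the stated ranges:

```python
def rect_from_anchor(anchor_x, anchor_y, w, h, rx, ry):
    """Invert the anchor equations for a fixed region size (w, h):
       anchor_x = rx * (left + right)           with right  = left + w
       anchor_y = top + ry * (bottom - top)     with bottom = top + h
    """
    left = anchor_x / (2 * rx) - w / 2
    top = anchor_y - ry * h
    return int(left), int(top), w, h

def locate_organ_regions(pts, alpha=0.5, beta=0.65, gamma=0.5, delta=0.45):
    """pts: dict of landmarks A..G as (x, y) in the 200x200 aligned face.
    Region sizes follow the example in the text above."""
    def centered(px, py, w, h):
        return int(px - w / 2), int(py - h / 2), w, h

    mid = lambda p, q: ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    ex, ey = pts["E"]                 # nose tip
    hx, hy = mid(pts["F"], pts["G"])  # midpoint H of mouth corners F, G
    return {
        "left_eye":  centered(*mid(pts["A"], pts["B"]), 80, 60),
        "right_eye": centered(*mid(pts["C"], pts["D"]), 80, 60),
        "nose":  rect_from_anchor(ex, ey, 140, 40, alpha, beta),
        "mouth": rect_from_anchor(hx, hy, 140, 80, gamma, delta),
    }
```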
Thereby, the four auxiliary regions of the face image can be located, and the dissimilar pixel blocks of the face image for a facial expression pair can be located only within these auxiliary regions, with reference to the template of the pair. In such a case, preferably, the template of a facial expression pair may consist only of the dissimilar blocks within the auxiliary regions of the respective facial expressions of the pair, the auxiliary regions of the facial expressions corresponding in a predetermined manner (for example, at the same positions and of the same sizes) to the auxiliary regions located in the face region.

Hereinafter, the template of a facial expression pair will be described in detail.

In the present invention, a template is determined for each facial expression pair in the predefined set of facial expressions. A facial expression pair may contain a predetermined number of facial expressions, preferably two, and any two facial expressions of the set may form a pair.

For example, as shown in Fig. 11, suppose the predefined set of facial expressions contains three expressions: sad, neutral, and angry; there can then be C_3^2 facial expression pairs. As shown in Fig. 11, a pair may consist of the sad and neutral expressions, a pair may consist of the neutral and angry expressions, and a pair may consist of the sad and angry expressions.

In another implementation, the facial expressions contained in a pair may be expressions that differ greatly from each other or are even opposite. For example, a pair may consist specifically of a laughing expression and a crying expression, so that the blocks extracted for such a pair are more discriminative.

Fig. 12 shows a flowchart of the process for configuring the template of a facial expression pair. Such a process may be performed in advance, before the process of the present invention is executed, so that the templates of all facial expression pairs contained in the predefined set can be configured and stored beforehand. Alternatively, the process may be performed on the fly as the process of the present invention is executed.

First, two average face images, corresponding respectively to the two facial expressions contained in the pair, are divided into a plurality of mutually corresponding blocks.
The average face image of each expression can generally be constructed by averaging the aligned faces showing that expression. Taking the laughing expression as an example, given N aligned laughing samples I_1, ..., I_N, the average laughing face image I is obtained by

I = (1/N) * (I_1 + I_2 + ... + I_N),

that is, the gray values of the aligned pixels of the face images are added together, each with weight 1/N, to obtain the average laughing face image.
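A minimal sketch of this averaging step in Python, assuming the inputs are aligned, same-size grayscale arrays:

```python
import numpy as np

def average_face(faces):
    """Pixel-wise mean of N aligned grayscale faces of one expression:
    each aligned pixel is summed with weight 1/N."""
    stack = np.stack([f.astype(np.float64) for f in faces], axis=0)
    return stack.mean(axis=0).astype(np.uint8)
```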
Figure 13 exemplarily shows average face images for laughing, neutral, sad, and smiling; such average face images are usually prepared in advance from a facial expression database and stored.

Figure 14 schematically shows the corresponding division of the two average face images. The two images are divided with the same pattern (such as a grid), and the size of the divided blocks is not limited. For example, in a 200*200-pixel average face image, the block size may be 10*10 pixels.

It should be noted that the division patterns of the average face images are not limited thereto and may correspond to each other in other ways; for example, when the average face images have different sizes, the corresponding blocks of the two division patterns may be scaled according to the ratio of the image sizes.
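For the common case of equal-size images, the grid division can be sketched as follows (a 200*200 face with 10*10 blocks yields 400 blocks indexed in row-major order; the function name is our own):

```python
import numpy as np

def divide_into_blocks(img, block=10):
    """Cut an HxW grayscale image into non-overlapping block*block tiles,
    returned with shape (num_blocks, block, block) in row-major order."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0
    return (img.reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block, block))

blocks = divide_into_blocks(np.zeros((200, 200), dtype=np.uint8))
print(blocks.shape)  # (400, 10, 10)
```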
Next, a feature is extracted from each block of each divided average face image. Various extraction methods exist, and any of them may be applied in this process.

As known in the art, the feature extraction method may be, for example, the local binary pattern (LBP) disclosed in Timo Ojala, Matti Pietikainen, and Topi Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, or the local phase quantization (LPQ) disclosed in Ville Ojansivu and Janne Heikkila, "Blur insensitive texture classification using local phase quantization," ICISP 2008.

In the case of LBP, the block size is the same as that of the distinct pixel blocks, and the total number of bins is, for example, 59, so the LBP feature of each block has 59 dimensions. The feature computation process is briefly described as follows:
1) For each pixel of the input image, compute LBP_{8,1}:

a) take the value of the current pixel as the center pixel value;

b) extract the pixel values of the eight neighbors;

c) compute g_P (P = 0, 1, ..., 7) by bilinear interpolation and form

LBP_{8,1} = Σ_{P=0}^{7} s(g_P − g_c) · 2^P, with s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise,

where g_P is the gray value of one of the neighboring pixels and g_c is the gray value of the center pixel.
2) Construct the 59-dimensional LBP histogram by accumulating the LBP values of the pixels of the block, using the LBP value mapping table disclosed in Ville Ojansivu and Janne Heikkila, "Blur insensitive texture classification using local phase quantization," ICISP 2008.
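The sketch below implements these two steps, assuming the usual "uniform pattern" mapping behind a 59-bin histogram (the 58 patterns with at most two circular 0/1 transitions get their own bins; all other patterns share the last bin). For brevity it samples the 3x3 square neighborhood rather than interpolating the unit circle, so it approximates LBP_{8,1} and is not a faithful reimplementation of the cited method:

```python
import numpy as np

def _uniform_map():
    """Map each 8-bit LBP code to one of 59 bins: uniform codes (at most
    two circular 0/1 transitions) get their own bins, the rest share bin 58."""
    table = np.zeros(256, dtype=np.int64)
    nxt = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = nxt
            nxt += 1
        else:
            table[code] = 58
    return table

_TABLE = _uniform_map()

def lbp_histogram(block):
    """59-dimensional uniform-LBP histogram of a grayscale block."""
    g = block.astype(np.int64)
    c = g[1:-1, 1:-1]
    # Eight neighbors in circular order around each center pixel.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offs):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int64) << p   # s(g_p - g_c) * 2^p
    return np.bincount(_TABLE[code].ravel(), minlength=59).astype(np.float64)
```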
Next, the similarity between the features of corresponding blocks of the two divided average face images is determined.

For example, the similarity of corresponding blocks (see Figure 14) may be determined using the Euclidean distance. Given two feature vectors f1 = <a1, a2, ..., an> and f2 = <b1, b2, ..., bn>, the similarity of f1 and f2 is measured by

d(f1, f2) = sqrt((a1 − b1)² + (a2 − b2)² + ... + (an − bn)²),

a larger distance indicating a lower similarity.

Thus, the similarity between the two divided average face images can be determined block by block as described above. It should be noted that the determination of similarity is not limited thereto and may be accomplished in other ways known in the art.

Finally, the blocks of the two divided average face images whose mutual similarity is below a predetermined threshold are selected to form the template.

More specifically, the similarities of the corresponding blocks of the two divided average face images are sorted in ascending order, the first predetermined number of block pairs is selected, and the indices of these block pairs are saved as the template of the expression pair. This predetermined number (which also corresponds to a predefined threshold, namely the similarity of the last selected pair) may be optimized by experiments. An example of the template of an expression pair is shown in Figure 15.
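Putting the pieces together, here is a sketch of template construction, reusing divide_into_blocks and lbp_histogram from the sketches above and reading the Euclidean distance as an inverse similarity (largest distance = lowest similarity = most distinctive); k is the experimentally tuned number of block pairs:

```python
import numpy as np

def build_template(avg_a, avg_b, k, block=10):
    """Indices of the k most distinctive block pairs between the two
    divided average faces of one expression pair."""
    blocks_a = divide_into_blocks(avg_a, block)
    blocks_b = divide_into_blocks(avg_b, block)
    dists = np.array([np.linalg.norm(lbp_histogram(a) - lbp_histogram(b))
                      for a, b in zip(blocks_a, blocks_b)])
    order = np.argsort(-dists)         # descending distance = ascending similarity
    return sorted(order[:k].tolist())  # the saved indices are the template
```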
Thus, for each expression pair, a set of distinct pixel blocks can be located in the face region according to the template of that pair formed as described above. Figure 16 shows this process.

First, the input face image is divided into blocks according to the template, and this block division may be identical to that of the template (such as the same pattern and the same block size). Then, since the template of the expression pair records the indices of the distinct pixel blocks, the distinct pixel blocks are located in the aligned face image according to those indices. In practice, the size of a distinct pixel block may be 10*10 pixels.

As described above, when the auxiliary regions of the face image have been located in advance, this process may be performed on the auxiliary regions only.

Based on the distinct pixel blocks thus located in the face region to be recognized, the features of the face region can be extracted. Figure 17 shows a flowchart of this feature extraction. In particular, a feature may be extracted from each of the at least one block of the face region (S1031), and the extracted features of the blocks are concatenated as the feature of the face region (S1032).

The feature extraction method may be any method known in the art and may, for example, be the same as in the extraction process described above (e.g., LBP).

Then, the features of all the distinct pixel blocks are concatenated to represent the facial expression feature. The dimension of the final vector is 59*n, where n is the total number of distinct pixel blocks and 59 is the number of bins used for feature extraction (it may be any other number). When only the distinct pixel blocks of the auxiliary regions are used, the features of the distinct pixel blocks within each organ region are concatenated in a fixed order, and the features of the four organ regions thus obtained are then concatenated.
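A sketch of this per-pair feature, again reusing the helpers above; the resulting vector has 59 * len(template) dimensions, matching the 59*n figure in the text:

```python
import numpy as np

def expression_pair_feature(face, template, block=10):
    """Concatenate, in template order, the 59-dim LBP histograms of the
    distinct blocks of an aligned face for one expression pair."""
    blocks = divide_into_blocks(face, block)
    return np.concatenate([lbp_histogram(blocks[i]) for i in template])
```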
Recognition of the facial expression of a face image will now be described.

Figure 18 is a flowchart of one implementation of the recognition process, in which recognition is performed in a so-called "one-to-one" manner: C(n, t) votes are cast over the predefined set of facial expressions, where n is the number of expressions in the set and t is the number of expressions in a pair, i.e., one vote is cast per facial expression pair, and the expression with the highest score is determined as the final facial expression.
The expressions are denoted B_1, ..., B_n, and the score of each expression may initially be set to zero: f(B_1) = ... = f(B_n) = 0. For each facial expression pair, when the face region is determined to correspond to one expression B_i of the pair, the score of that expression is increased by a constant, e.g., f(B_i) = f(B_i) + 1.

Finally, the expression with the maximum score, f(B) = max{f(B_1), ..., f(B_n)}, is determined as the facial expression of the face region.
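A sketch of this voting scheme for pairs of two expressions (t = 2); templates and classifiers are hypothetical dictionaries keyed by sorted expression pairs, and each classifier's predict is assumed to return one of the pair's two labels:

```python
from itertools import combinations

def vote_expression(face, expressions, templates, classifiers):
    """'One-to-one' voting: one vote per expression pair, C(n, 2) votes
    in total; the label with the highest score wins."""
    scores = {b: 0 for b in expressions}
    for pair in combinations(sorted(expressions), 2):
        f = expression_pair_feature(face, templates[pair])
        winner = classifiers[pair].predict(f)  # one of the pair's two labels
        scores[winner] += 1
    return max(scores, key=scores.get)
```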
Figure 19 is a flowchart of another implementation of the recognition process, in which recognition is performed in a so-called "one-wins-one" manner: when the predefined set contains n facial expressions, the facial expression of the face region is determined through n−1 rounds, only the winning expression of each round's pair advances to the next round, and the expression that finally wins is determined as the final facial expression.

Let the expressions be denoted B_1, ..., B_n, and let a facial expression pair contain two expressions. Initially, a facial expression pair (B_i, B_j) is arbitrarily selected from the predefined set, and for this pair it is determined which expression the face region corresponds to. Suppose, for example, that the facial expression of the face region is determined to be B_i.

Then the losing expression B_j is excluded from the initial set of expressions, and the remaining expressions are regrouped into a new set, for which the above process is performed again.

This process is thus performed for n−1 rounds, and the expression that finally remains is recognized as the expression of the face image.
In general, the "one-wins-one" scheme is more efficient than the "one-to-one" scheme, with almost the same accuracy.
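A sketch of the tournament scheme with the same hypothetical interfaces as above; n−1 rounds eliminate one loser each, so the survivor is the result:

```python
def tournament_expression(face, expressions, templates, classifiers):
    """'One-wins-one' scheme: repeated pairwise rounds in which the loser
    is eliminated; the surviving label is the recognized expression."""
    remaining = list(expressions)
    while len(remaining) > 1:
        pair = tuple(sorted(remaining[:2]))
        f = expression_pair_feature(face, templates[pair])
        winner = classifiers[pair].predict(f)
        loser = pair[0] if winner == pair[1] else pair[1]
        remaining.remove(loser)
    return remaining[0]
```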
The expression determination for a single facial expression pair may be implemented in any manner known in the art, such as by a classifier. In that case, for an expression pair, the feature vector is classified by a binary classifier, for example the linear SVM disclosed in Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines," 2011. The decision function is sgn(w^T f + b), where w and b are stored in a dictionary, w is the weight vector trained for the SVM, b is the bias term, and f is the feature vector of the face region.
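The decision function is simple enough to sketch directly; which sign maps to which label of the pair is a training-time convention, so the assignment below is arbitrary:

```python
import numpy as np

def svm_decide(w, b, f, pair):
    """Linear-SVM decision for one expression pair: the sign of w.f + b
    selects one of the pair's two labels."""
    return pair[0] if float(np.dot(w, f)) + b >= 0 else pair[1]
```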
Experimental results

Table 1 shows the dataset used in the experiments. The images of this dataset are frontal face images with natural expressions downloaded from the Internet.

Table 1
Table 2 compares the performance (facial expression recognition accuracy) of the solution of the present application with that of prior art such as US patent application US2012/0169895.

Table 2

The confusion matrix of the prior art is shown in Table 3, and the confusion matrix of the present application is shown in Table 4.

Table 3

Table 4
[Example 2]

A process according to the present invention for recognizing a face attribute in an image, such as the age of a face, will now be described.

Suppose there is a predefined set of age stages, including child, teenager, adult, elderly, and so on. Age-stage pairs are then configured within this predefined set, and the age of the face is determined pair by pair. The process may be implemented similarly to Example 1.

More specifically, the process first detects the face and locates the face region in the input image.

Next, for each age-stage pair, the process locates the distinct pixel blocks in the face region according to the trained template of that pair.

Next, the process obtains the features of the face region based on the distinct pixel blocks used for classifying each age-stage pair. The feature for an age-stage pair may be represented by concatenating the features of its distinct pixel blocks.

Next, the process determines the age stage of the face for each age-stage pair and integrates the classification results to determine the age stage of the face.
Preferably, the face may be aligned before the distinct pixel blocks are located.

Preferably, before or while the distinct pixel blocks are located, auxiliary regions may be located in the face region, so that only the auxiliary regions need to be processed during the localization of the distinct pixel blocks and the subsequent operations. The pipeline otherwise mirrors Example 1, as the sketch below shows.
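A sketch showing how little changes for Example 2: only the label set and the per-pair templates and classifiers (trained on age data rather than expression data) differ, and the voting helper sketched in Example 1 is reused unchanged. The stage names are placeholders:

```python
AGE_STAGES = ["child", "teenager", "adult", "elderly"]

def recognize_age(face, age_templates, age_classifiers):
    """Pairwise age-stage recognition via the same 'one-to-one' voting."""
    return vote_expression(face, AGE_STAGES, age_templates, age_classifiers)
```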
It should be noted that the above examples are merely illustrative, not restrictive. The solution of the present invention is not limited thereto and may be applied to other types of object attribute recognition.

[Industrial applicability]

The present invention can be used in a variety of applications. For example, it can be applied to detecting and tracking the state of objects in images, such as smile detection in cameras, audience response systems, and automatic photo annotation.

More specifically, in one implementation, an object may be detected, and the method of the present invention may then be used to recognize the attributes of that object.

In the case of a camera application, an image is captured by the camera. The system selects face images from the captured image by face detection techniques and inputs them into the facial expression recognition module, which recognizes certain predefined expressions (e.g., happy, sad, neutral). The recognition results are then input into an evaluation module, which evaluates the effect of a meeting according to the expressions of the audience, and the system finally outputs this evaluation.

The method and system of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination thereof. The order of the method steps described above is merely illustrative, and unless specifically stated otherwise, the steps of the method of the present invention are not limited to that order. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded on a recording medium, including machine-readable instructions for implementing the method according to the present invention; the present invention therefore also covers a recording medium storing such a program.

Although the present invention has been described with reference to exemplary embodiments, it should be understood that the above examples are merely illustrative and are not intended to limit the scope of the invention. Those skilled in the art will appreciate that the above embodiments may be modified without departing from the scope and spirit of the invention. The scope of the present invention is defined by the appended claims, which are to be given the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.