CN101091186A - Segment and generate compact representations for digital images - Google Patents
Segment and generate compact representations for digital images Download PDFInfo
- Publication number
- CN101091186A CN101091186A CN 200580043979 CN200580043979A CN101091186A CN 101091186 A CN101091186 A CN 101091186A CN 200580043979 CN200580043979 CN 200580043979 CN 200580043979 A CN200580043979 A CN 200580043979A CN 101091186 A CN101091186 A CN 101091186A
- Authority
- CN
- China
- Prior art keywords
- color
- pixels
- pixel
- tile
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Images
Landscapes
- Image Analysis (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
- Facsimile Image Signal Circuits (AREA)
Abstract
Description
版权声明Copyright Notice
本专利说明书包含受版权保护的内容。版权所有人对出于阅读目的从相关的专利局文件复制本专利说明书或有关材料没有异议,但是除此之外保留所有版权。This patent specification contains material that is protected by copyright. The copyright owner has no objection to the reproduction of this patent specification or related material from the relevant Patent Office file for reading purposes, but otherwise reserves all copyrights.
技术领域technical field
本发明一般地涉及数字图像处理领域,并且更具体地涉及产生数字图像的高等级描述。The present invention relates generally to the field of digital image processing, and more particularly to generating high-level descriptions of digital images.
背景技术Background technique
图像分段是将图像划分为或分开为语义上或视觉上连贯的区域。每个区域是具有类似属性或多个属性的一组连通的像素。对于单色图像,用于分段的一个基本属性是亮度幅度,而对于彩色图像是颜色分量。Image segmentation is the division or separation of an image into semantically or visually coherent regions. Each region is a group of connected pixels with similar attribute or attributes. A basic property used for segmentation is the magnitude of brightness for monochrome images, and color components for color images.
与不断增加的计算处理能力相结合的扫描技术的激增导致文档分析系统领域中的许多进展。这些系统可用于,通常借助于OCR技术,从扫描文档中提取语义信息。这种系统还可以用于通过根据页的每个部分的内容选择性地使用适当的压缩方法改进文档图像的压缩。改进的文档压缩使其自身适合于诸如存档和电子分发的应用。The proliferation of scanning technologies combined with ever-increasing computational processing power has led to many advances in the field of document analysis systems. These systems can be used to extract semantic information from scanned documents, usually by means of OCR techniques. Such a system can also be used to improve the compression of document images by selectively using the appropriate compression method according to the content of each part of the page. Improved document compression lends itself to applications such as archiving and electronic distribution.
分段是用于文档图像分析的处理阶段,其中在可以执行高等级处理诸如区域分类和布局分析之前,必须首先将低等级像素分段为原语对象。布局分析根据关于文档布局的一些预定规则,将原语对象分类为已知的对象类型。典型地,布局分析不分析原始扫描图像数据,而是工作于可替代的数据组,诸如来自页的分段的斑点或连通组分。除了各个对象属性之外,布局分析还可以使用对象分组,以便确定它们的分类。Segmentation is a processing stage for document image analysis where low-level pixels must first be segmented into primitive objects before high-level processing such as region classification and layout analysis can be performed. Layout analysis classifies primitive objects into known object types according to some predetermined rules about the layout of the document. Typically, layout analysis does not analyze raw scanned image data, but works on alternative data sets, such as blobs or connected components from segments of pages. In addition to individual object properties, layout analysis can also use object groupings in order to determine their classification.
下面描述用于图像分段的若干已知方法。Several known methods for image segmentation are described below.
阈值划分是用于分段的最简单的方法,并且如果将处理的图像是两级的(例如,黑白文档图像),其可能是快和有效的。然而,如果图像是复杂的,具有多个亮度或颜色等级的区域,在二值化过程中可能丢失这些区域中的某一些。更复杂的阈值划分技术采用自适应或多级阈值划分,其中按局部等级执行阈值估计和二值化处理。然而,这些方法仍然不能正确地分段对象。Thresholding is the simplest method for segmentation, and it can be fast and efficient if the image to be processed is two-level (eg black and white document image). However, if the image is complex, with regions of multiple brightness or color levels, some of these regions may be lost during the binarization process. More sophisticated thresholding techniques employ adaptive or multilevel thresholding, where threshold estimation and binarization are performed at a local level. However, these methods still cannot segment objects correctly.
基于聚类(clustering)的方法,诸如k均值和向量量化,往往产生好的分段结果,但是它们是需要多轮的迭代算法。因此,这种方法可能慢,并且难以实现。Clustering-based methods, such as k-means and vector quantization, tend to produce good segmentation results, but they are iterative algorithms that require many rounds. Therefore, this approach can be slow and difficult to implement.
分裂和合并图像分段技术基于四元树数据表示,其中如果原始图像段的属性不是一致的,则将正方形图像段分裂为4个象限。如果发现4个相邻的正方形一致,将这些正方形合并为由这4个相邻的正方形构成的单个正方形。分裂和合并处理通常开始于全图像等级。因此,仅在缓冲了整个页之后才能开始处理,这需要高的存储器带宽。另外,这种方法往往是计算密集的。The split and merge image segmentation technique is based on a quadtree data representation, where a square image segment is split into 4 quadrants if the attributes of the original image segment are not consistent. If 4 adjacent squares are found to be identical, merge these squares into a single square made of these 4 adjacent squares. The splitting and merging process usually starts at the full image level. Therefore, processing can start only after the entire page has been buffered, which requires high memory bandwidth. Additionally, this approach tends to be computationally intensive.
区域生长是一种用于图像分段的公知方法,并且是概念上最简单的方法之一。将具有类似属性或多个属性的相邻像素分为一组,以便形成段区域。然而,在实际中,必须对生长模式施加相当复杂的约束以便达到可接受的结果。已有的区域生长方法可能具有若干不理想的效果,因为这些方法往往偏向于初始种子的位置。不同的种子选择可能带来不同的分段结果,并且如果种子点位于边缘上,则可能出现问题。Region growing is a well known method for image segmentation and is one of the conceptually simplest methods. Adjacent pixels with similar attributes or attributes are grouped together to form segment regions. In practice, however, rather complex constraints must be imposed on the growth pattern in order to achieve acceptable results. Existing region growing methods may have several undesirable effects, since these methods tend to be biased towards the location of the initial seed. Different seed choices may bring different segmentation results, and if the seed point is on an edge, there may be problems.
与不断增加的计算处理能力相结合的扫描技术的激增导致在文档分析系统领域内的许多进展。这些系统可用于,通常借助于OCR技术,从扫描文档中提取语义信息。这种技术被用在数目日益增加的应用内,诸如自动表格阅读,并且还可以用于通过根据页的每个部分的内容选择性地使用适当的压缩方法,改进文档的压缩。改进的文档压缩使其自身适合于诸如存档和电子分发的应用。The proliferation of scanning technologies combined with ever-increasing computational processing power has led to many advances in the field of document analysis systems. These systems can be used to extract semantic information from scanned documents, usually by means of OCR techniques. This technique is used in an increasing number of applications, such as automatic form reading, and can also be used to improve the compression of documents by selectively using the appropriate compression method according to the content of each part of the page. Improved document compression lends itself to applications such as archiving and electronic distribution.
某些文档分析系统执行布局分析,以便将文档分为根据它们的内容分类的区域。典型地,布局分析不分析原始扫描的图像数据,而是工作于可替代的数据组,诸如来自页的分段的斑点或连通组分。除了各个对象属性之外,布局分析还可以使用对象分组,以便确定它们的分类。Some document analysis systems perform layout analysis to divide documents into regions classified according to their content. Typically, layout analysis does not analyze raw scanned image data, but instead works on alternative data sets, such as blobs or connected components from segments of pages. In addition to individual object properties, layout analysis can also use object groupings in order to determine their classification.
一般地,执行页的二值分段以便产生用于布局分析的数据,并且这可以通过简单地给原始图像设置阈值获得。这种二值分段的一个优点是分段的对象位于有助于进行布局分析的简单的包含分层结构内。不幸的是,许多复杂的颜色文档的布局不能完全以二值图像完整地表示。从颜色到二值图像转换中固有的信息内容的减少可能导致重要特征的退化,并且甚至导致文档的详细结构的丢失。Typically, binary segmentation of the pages is performed to generate data for layout analysis, and this can be obtained by simply thresholding the original image. One advantage of this binary segmentation is that the segmented objects reside within a simple containment hierarchy that facilitates layout analysis. Unfortunately, the layout of many complex color documents cannot be fully represented in binary images. The reduction of information content inherent in the conversion from color to binary images can lead to the degradation of important features, and even to the loss of the detailed structure of the document.
因此页文档分析的颜色分段在保持页的内容方面具有优点,但是也给其带来了附加的复杂性。首先,分段分析本身变得更棘手,并且增加了处理要求。第二,由于分段的页对象不形成包含分层结构,复杂化了对这些分段页对象的分析。这限制了布局分析的准确性和有效性。Color segmentation for page document analysis therefore has advantages in preserving the content of the page, but also brings it additional complexity. First, segmentation analysis itself becomes trickier and increases processing requirements. Second, because segmented page objects do not form containment hierarchies, analysis of these segmented page objects is complicated. This limits the accuracy and effectiveness of layout analysis.
文档布局分析系统还可以采用用于验证文档区域的文本分类的技术。这些方法中的某一些使用像素和、阴影(shadowing)和投影轮廓的直方图分析。这些方法通常是不可靠的,因为难以将健壮的统计量应用于这种方法,并且难以为可能是单行或许多行并且不知道文档中的字符集和文本对齐的文本进行调整。The document layout analysis system may also employ techniques for validating text classification of document regions. Some of these methods use histogram analysis of pixel sums, shadowing and projected contours. These methods are generally unreliable because it is difficult to apply robust statistics to this method, and it is difficult to adjust for text that may be a single line or many lines and does not know the character set and text alignment in the document.
发明内容Contents of the invention
根据本发明的第一个方面,提供了一种给包括多个像素的数字图像分段的方法。该方法包括步骤:从数字图像产生像素的多个块;和以单轮的方式使用像素块为每个块生成至少一个连通组分。生成步骤又包括:将像素块分段为至少一个连通组分,每个连通组分包括空间上相连并且语义相关的一组像素;将该块的该至少一个连通组分与从以前处理过的至少一个其它块分段出的至少一个连通组分合并;并且以紧凑的形式存储该块的连通组分在图像中的位置。According to a first aspect of the invention there is provided a method of segmenting a digital image comprising a plurality of pixels. The method comprises the steps of: generating a plurality of blocks of pixels from a digital image; and generating at least one connected component for each block using the blocks of pixels in a single round. The generating step further includes: segmenting the block of pixels into at least one connected component, each connected component comprising a set of pixels that are spatially connected and semantically related; and combining the at least one connected component of the block with previously processed at least one connected component segmented by at least one other block is merged; and the position of the connected component of the block in the image is stored in a compact form.
语义相关的像素可以包括颜色类似的像素。Semantically related pixels may include similarly colored pixels.
产生步骤可以包括子步骤:将数字图像布置成多个带,每个带包括预定数目的连贯的像素行;并且一个接一个地缓冲和处理这些带。该处理步骤又包括对每个当前缓冲的带执行的子步骤:将当前带布置成像素的多个块;和为生成步骤一个接一个地缓冲和处理当前带的块。The generating step may comprise the sub-steps of: arranging the digital image into a plurality of bands, each band comprising a predetermined number of consecutive rows of pixels; and buffering and processing the bands one after the other. This processing step in turn includes the sub-steps performed for each currently buffered band: arranging the current band into blocks of pixels; and buffering and processing the blocks of the current band one by one for the generation step.
存储子步骤可以包括存储M-1个二值位图,其中M个连通组分在一个块内,M是整数。The storing sub-step may include storing M-1 binary bitmaps, wherein M connected components are in one block, and M is an integer.
存储子步骤可以包括存储索引图。The storing sub-step may include storing the index map.
分段子步骤可以包括:为每个块估计若干代表性的颜色;将每个块量化为代表性的颜色;和从每个量化的块形成连通组分。分段子步骤还可以包括:将形成的连通组分的子集合并。合并子步骤可以包括收集连通组分的统计量。统计量可以包括边界框、像素计数、边界长度和平均颜色中的一个或多个。该方法还可以包括步骤:将被认为是噪声的所形成的连通组分删除。噪声可以包括具有低于预定阈值的像素计数和高于另一个预定阈值的边界长度与像素计数比的连通组分。合并步骤可以包括:将块的连通组分与左边和上边的块的连通组分合并;和更新合并的连通组分的统计量。统计量可以包括边界框、像素计数、填充比率和平均颜色中的一个或多个。The segmentation sub-step may include: estimating a number of representative colors for each block; quantizing each block into a representative color; and forming connected components from each quantized block. The segmenting sub-step may also include: merging the formed subsets of connected components. The merging substep may include collecting statistics on the connected components. Statistics may include one or more of bounding box, pixel count, border length, and mean color. The method may further comprise the step of deleting formed connected components considered to be noise. The noise may include connected components having a pixel count below a predetermined threshold and a boundary length to pixel count ratio above another predetermined threshold. The merging step may include: merging the connected components of the block with the connected components of the blocks to the left and above; and updating statistics of the combined connected components. Statistics may include one or more of bounding box, pixel count, fill ratio, and average color.
估计子步骤可以包括:基于每个块中的像素的YUV数据,形成与多个颜色库有关的直方图;基于直方图统计量对每个块分类;和基于块分类合并库颜色,以便形成代表性颜色。该方法还包括在一轮中为每个像素形成索引图的步骤。量化步骤可以包括:将非空库量化为代表性颜色;创建到代表性颜色的库映射;和使用库映射将索引图重新映射为代表性颜色。形成子步骤可以包括:基于Y值确定亮度带;基于U和V值确定颜色列;将像素颜色累积到映射库;和递增映射库的像素计数。确定亮度带的步骤还可以包括亮度带抗混叠。确定颜色列的步骤还可以包括颜色列抗混叠。The estimating sub-step may include: forming a histogram associated with a plurality of color bins based on the YUV data of the pixels in each block; classifying each block based on the histogram statistics; and merging the bin colors based on the block classification to form a representative sex color. The method also includes the step of forming an index map for each pixel in one round. The quantizing step may include: quantizing the non-empty library to a representative color; creating a library map to the representative color; and remapping the index map to the representative color using the library map. The forming sub-step may include: determining a brightness band based on Y values; determining a color column based on U and V values; accumulating pixel colors into a map library; and incrementing a pixel count of the map library. The step of determining the luma bands may also include luma band anti-aliasing. The step of determining the color column may also include color column anti-aliasing.
合并步骤可以包括为接触左和上边界的当前块内的各个连通组分执行的下面的子步骤:寻找沿着公共边界触及当前连通组分的一列连通组分;和确定合并的最佳候选。The merging step may include the following sub-steps performed for each connected component within the current block that touches the left and upper boundaries: finding a list of connected components that touch the current connected component along a common boundary; and determining the best candidate for merging.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面的方法中的任意一个方面给包括多个像素的数字图像分段。According to another aspect of the invention there is provided an apparatus comprising a processor and a memory for segmenting a digital image comprising a plurality of pixels according to any one of the preceding aspects of the method.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面的方法中的任意一个方面给包括多个像素的数字图像分段的计算机程序。According to another aspect of the present invention, there is provided a computer program product comprising a computer readable medium having recorded therein a digital pixel comprising a plurality of pixels according to any one of the preceding methods. A computer program for image segmentation.
根据本发明的另一个方面,提供了一种自动产生颜色文档的紧凑表示的方法。该方法包括步骤:在一轮中按块光栅顺序,将颜色文档页的数字图像分段为连通组分;基于整个页的紧凑的、连通组分统计量,使用布局分析将页的数字图像划分为前景和背景图像;在前景图像的至少一个部分遮蔽了背景图像的位置,在一个轮中以块光栅顺序修复背景图像的至少一个部分;及合并前景图像和背景图像以形成紧凑的文档。According to another aspect of the invention, a method of automatically generating a compact representation of a color document is provided. The method comprises the steps of: segmenting a digital image of a color document page into connected components in block raster order in one round; segmenting the digital image of the page using layout analysis based on a compact, connected component statistic for the entire page being the foreground and background images; inpainting at least a portion of the background image in block raster order in one pass where at least a portion of the foreground image obscures the background image; and merging the foreground and background images to form a compact document.
该方法还可以包括对背景图像下采样的步骤。此外,该方法可包括压缩背景图像的步骤。压缩步骤可以涉及有损压缩。另外,该方法可以包括对有损压缩的背景图像的不同压缩。The method may also include the step of downsampling the background image. Furthermore, the method may comprise the step of compressing the background image. The compression step may involve lossy compression. Additionally, the method may include different compression of the lossy compressed background image.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面的方法中的任意一个方面自动产生颜色文档的紧凑表示。According to another aspect of the invention there is provided an apparatus comprising a processor and memory for automatically generating a compact representation of a color document according to any one of the preceding aspects of the method.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面的方法中的任意一个方面自动产生颜色文档的紧凑表示的计算机程序。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded thereon a compact representation for automatically generating a color profile according to any one of the preceding method aspects computer program.
根据本发明的另一个方面,提供了一种分析包括多个像素的数字图像的方法。该方法包括步骤:将数字图像分段为对象,其中以多于两个标签表示分段;为每个对象提供一组属性;为对象的子集,使用包含测量确定在共享边界的相邻对象间是否存在父亲-孩子关系;基于对象的属性,形成共享共同父亲的多组对象;和根据它们的属性和分组给对象分类。According to another aspect of the invention, a method of analyzing a digital image comprising a plurality of pixels is provided. The method comprises the steps of: segmenting a digital image into objects, wherein the segments are represented by more than two labels; providing each object with a set of attributes; for a subset of objects, determining adjacent objects at a shared boundary using a containment measure whether there is a parent-child relationship between them; based on the attributes of the objects, forming groups of objects that share a common parent; and classifying objects according to their attributes and groupings.
可以使用每个对象周围的边界框和描述对象间的触及关系的信息确定包含。如果两个对象在边界接触,并且一个对象的边界框完全包含另一个对象的边界框,则该对象包含该另一个对象。Containment can be determined using a bounding box around each object and information describing touching relationships between objects. If two objects touch at the boundary, and one object's bounding box completely contains the other object's bounding box, then the object contains the other object.
形成组步骤可以包括:考虑共同父亲的一列孩子中的孩子对象的对;和使用对象属性,确定每个对是否应被分组到一起。仅可以考虑对具有相同父亲的一列孩子对象中的相邻对象进行分组。可以基于边界框和颜色信息给对象分组。The forming a group step may include: considering pairs of child objects in a list of children of a common parent; and using object properties, determining whether each pair should be grouped together. Only adjacent objects in a column of child objects with the same parent can be considered for grouping. Objects can be grouped based on bounding box and color information.
可以根据组内对象的文本类质量测试将一组对象分类为文本。用于文本类质量的测试可以包括:为每个对象识别表示对象位置的单个值;形成这些值的直方图;和按照直方图的属性识别文本。A group of objects may be classified as text based on a text-like quality test of the objects within the group. A test for text-like quality may include: identifying for each object a single value representing the location of the object; forming a histogram of these values; and identifying text according to attributes of the histogram.
本发明还可以包括步骤:根据其它对象的属性将它们添加到对象的文本分类组,而不管它们的父亲-孩子属性如何。The present invention may also include the step of adding other objects to the text classification group of objects according to their attributes, regardless of their parent-child attributes.
根据本发明的另一个方面,提供了一种分析包括文档页的多个像素的数字图像的方法。该方法包括步骤:给数字图像分段,以便基于图像形成对象;形成对象的组;和确定每个对象组是否表示文本。确定步骤包括:根据对象在页上的位置,为每个对象识别单个值;形成这些值的直方图;和按照直方图的属性识别文本。According to another aspect of the invention, a method of analyzing a digital image comprising a plurality of pixels of a document page is provided. The method includes the steps of: segmenting the digital image to form objects based on the image; forming groups of objects; and determining whether each group of objects represents text. The determining step includes: identifying a single value for each object based on the position of the object on the page; forming a histogram of the values; and identifying the text according to the properties of the histogram.
直方图的属性可以是具有多于指定的对象数目的直方图中的库内的对象的总数。可替换地,属性可以是直方图内的计数的平方和。An attribute of the histogram may be the total number of objects in the library that have more than the specified number of objects in the histogram. Alternatively, the attribute may be the sum of squares of the counts within the histogram.
表示对象位置的每个对象的单个值可以是对象的边界框的边。A single value per object representing the object's location may be the sides of the object's bounding box.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据按照前面任意一个方面的方法,分析包括多个像素的数字图像。According to another aspect of the invention there is provided an apparatus comprising a processor and a memory for analyzing a digital image comprising a plurality of pixels according to a method according to any one of the preceding aspects.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据按照前面任意一个方面的方法分析包括多个像素的数字图像的计算机程序。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded thereon a method for analyzing a digital image comprising a plurality of pixels according to a method according to any one of the preceding aspects computer program.
根据本发明的另一个方面,提供了一种修复包括多个像素的数字图像的方法。该方法包括步骤:从数字图像产生多个像素块;和按照光栅顺序为至少一个块改变至少一串(run)像素的像素值。改变步骤包括对每个块执行的如下的子步骤:确定与一个对象有关的块内的一串像素的开始和结束像素,该串包括被分组到一起的相邻的像素;根据该串外的像素的像素值,修改所述串内的对象的至少一个像素值;和确定该块内不相应于对象的像素的活性(acitivity)测量值;以及如果该块的活性测量值小于预定的阈值,将具有至少一串像素的每个块内的所有像素值改变为设置值。According to another aspect of the present invention, a method of inpainting a digital image comprising a plurality of pixels is provided. The method includes the steps of: generating a plurality of blocks of pixels from a digital image; and varying pixel values of at least one run of pixels in raster order for at least one block. The altering step includes the following sub-steps performed for each block: determining the beginning and ending pixels of a string of pixels within the block associated with an object, the string comprising adjacent pixels grouped together; modifying at least one pixel value of an object within the string; and determining a measure of activity (acitivity) of a pixel within the block that does not correspond to an object; and if the measure of activity of the block is less than a predetermined threshold, All pixel values within each block having at least one string of pixels are changed to the set value.
该方法还包括步骤:根据扩大的对象之外的像素的像素值,修改对象之外扩大的对象像素的至少一个像素值。The method further includes the step of modifying at least one pixel value of pixels of the enlarged object outside the object based on pixel values of pixels outside the enlarged object.
该产生步骤可以包括子步骤:将数字图像布置为多个带,每个带包括预定数目的连贯的像素行;和一个接一个地缓冲和处理这些带。该处理步骤可以包括对每个当前的缓冲带执行的如下的子步骤:将当前带布置为多个像素块;和为改变步骤一个接一个地缓冲和处理当前带的块。This generating step may comprise the sub-steps of: arranging the digital image into a plurality of bands, each band comprising a predetermined number of consecutive rows of pixels; and buffering and processing the bands one after the other. This processing step may comprise the following sub-steps performed for each current buffer band: arranging the current band into a plurality of pixel blocks; and buffering and processing the blocks of the current band one by one for the changing step.
所述串包括块的像素的光栅行中的相邻像素。The string comprises adjacent pixels in a raster row of pixels of the block.
该方法还包括使用基于块的压缩方法压缩各个块的步骤。基于块的压缩方法可以是JPEG。该方法还可以包括使用另一种压缩技术进一步压缩基于块压缩的块的步骤。The method also includes the step of compressing each block using a block-based compression method. The block-based compression method may be JPEG. The method may also include the step of further compressing the blocks based on the block compression using another compression technique.
可以根据对象之外的像素的像素值,使用根据串的左边和右边的像素值内插的值,或来自串的左边的一个像素的值,修改对象的至少一个像素值。At least one pixel value of the object may be modified according to pixel values of pixels outside the object, using a value interpolated from pixel values to the left and right of the string, or a value from a pixel to the left of the string.
可以将具有至少一串像素的每个块的所有像素值改变为以前处理过的块的平均值,或该块内的可视像素的平均值。All pixel values of each block having at least one string of pixels may be changed to the average value of previously processed blocks, or the average value of the visible pixels within the block.
该方法还可以包括步骤:如果未发现扩大的对象像素串的结尾,则将像素的颜色值设置为不相应于扩大的对象像素串左边的对象的像素的颜色值。The method may further comprise the step of, if the end of the enlarged object pixel string is not found, setting the color value of the pixel to the color value of a pixel not corresponding to an object to the left of the enlarged object pixel string.
像素值可以是颜色值。Pixel values can be color values.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面任意一个方面的方法,修复包括多个像素的数字图像。According to another aspect of the present invention, there is provided an apparatus comprising a processor and a memory for repairing a digital image comprising a plurality of pixels according to the method of any one of the preceding aspects.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面任意一个方面的方法,修复包括多个像素的数字图像的计算机程序。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded therein a method for restoring a digital image comprising a plurality of pixels according to any one of the preceding aspects computer program.
根据本发明的另一个方面,提供了一种改变包括多个像素的数字图像的像素值的方法,这些像素的至少一部分相应于图像中的一个对象,该方法包括步骤:将数字图像布置为多个带,每个带包括预定数目的连贯的像素行;和依次一个接一个缓冲和处理这些带。该处理步骤可以包括对每个当前的缓冲带执行的如下的子步骤:将当前带布置为多个像素块;和依次一个接一个处理这些块。块处理步骤包括针对每个块的如下子步骤:确定块内不相应于图像内的对象的像素的活性测量值;如果活性测量值小于预定的阈值,则将该块内的所有像素的像素值改变为一个像素值;使用JPEG压缩该块;和使用另一种压缩方法压缩JPEG压缩的块。According to another aspect of the present invention, there is provided a method of changing pixel values of a digital image comprising a plurality of pixels, at least some of which correspond to an object in the image, the method comprising the steps of arranging the digital image as a plurality of bands, each band comprising a predetermined number of consecutive pixel rows; and sequentially buffering and processing the bands one after the other. This processing step may comprise the following sub-steps performed on each current buffer strip: arranging the current strip into a plurality of pixel blocks; and processing these blocks sequentially one by one. The block processing step comprises, for each block, the following sub-steps: determining activity measures for pixels within the block that do not correspond to objects within the image; change to a pixel value; compress the block using JPEG; and compress the JPEG-compressed block using another compression method.
每个带可包括数字图像的16行像素,块包括16×16个像素,并且以流水线方式执行压缩步骤。Each band may comprise 16 rows of pixels of the digital image, the blocks comprise 16x16 pixels, and the compression step is performed in a pipelined manner.
改变块内的所有像素的颜色值的步骤可以包括将像素的颜色值设置为在不相应于紧挨着扩大的对象像素串的左边和右边的对象的像素间进行线性内插而获得的颜色值,扩大的对象像素是串外并且与串相邻的像素。The step of changing the color values of all pixels within the block may include setting the color values of the pixels to color values obtained by linear interpolation between pixels not corresponding to objects immediately left and right of the expanded object pixel string , the expanded target pixels are pixels outside the string and adjacent to the string.
该方法还可以包括步骤:将块内的像素的颜色值设置为不相应于块内的对象的像素的平均颜色值。The method may further comprise the step of setting the color values of pixels within the block to an average color value of pixels not corresponding to objects within the block.
该方法还可以包括步骤:将块内的像素的颜色值设置为之前的块的平均颜色。The method may further comprise the step of setting the color value of the pixels within the block to the average color of previous blocks.
可以通过扩大定义对象的位置的掩蔽,确定扩大的对象的像素。The pixels of the dilated object may be determined by dilating the mask defining the location of the object.
其他的压缩方法可以包括ZLIB。Other compression methods may include ZLIB.
像素值可以是颜色值。Pixel values can be color values.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面任意一个方面的方法,改变包括多个像素的数字图像的像素值,这些像素的至少一部分相应于图像中的对象。According to another aspect of the present invention there is provided an apparatus comprising a processor and memory for altering the pixel values of a digital image comprising a plurality of pixels, at least some of which correspond to image objects in .
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面任意一个方面的方法,改变包括多个像素的数字图像的像素值的计算机程序,这些像素的至少一部分相应于图像中的对象。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded therein a method for altering a digital image comprising a plurality of pixels according to any one of the preceding aspects A computer program of pixel values at least some of which correspond to objects in an image.
根据本发明的另一个方面,提供了一种给包括多个像素的数字图像分段的方法。该方法包括步骤:从数字图像产生多个像素块;和以一轮的方式使用像素块为每个块生成至少一个连通组分。生成步骤又包括:将像素块分段为至少一个连通组分,每个连通组分包括一组空间上相连并且语义上相关的像素;将块的至少一个连通组分与从以前处理过的至少一个其他块分段出的至少一个连通组分合并;和以紧凑形式存储该块的连通组分在图像中的位置。According to another aspect of the invention, a method of segmenting a digital image comprising a plurality of pixels is provided. The method comprises the steps of: generating a plurality of pixel blocks from a digital image; and using the pixel blocks in a round to generate at least one connected component for each block. The generating step further includes: segmenting the block of pixels into at least one connected component, each connected component comprising a set of spatially connected and semantically related pixels; merging at least one connected component segmented from one other block; and storing the position of the connected component of the block in the image in a compact form.
语义相关的像素可以包括颜色类似的像素。Semantically related pixels may include similarly colored pixels.
存储子步骤可以包括存储M-1个二值位图,其中M个连通组分在一个块内,M是整数。The storing sub-step may include storing M-1 binary bitmaps, wherein M connected components are in one block, and M is an integer.
存储子步骤可以包括存储索引图。The storing sub-step may include storing the index map.
分段子步骤可以包括:为每个块估计若干代表性的颜色;将每个块量化为代表性的颜色;和从每个量化的块形成连通组分。分段子步骤还可以包括:将形成的连通组分的子集合并。合并子步骤可以包括收集连通组分的统计量。统计量可以包括边界框、像素计数、边界长度和平均颜色中的一个或多个。该方法还可以包括步骤将被认为是噪声的所形成的连通组分删除的步骤。噪声可以包括具有低于预定阈值的像素计数和高于另一个预定义阈值的边界长度与像素计数比的连通组分。合并步骤可以包括:将块的连通组分与左边和上边的块的连通组分合并;和更新合并的连通组分的统计量。统计量可以包括边界框、像素计数、填充比率和平均颜色中的一个或多个。The segmentation sub-step may include: estimating a number of representative colors for each block; quantizing each block into a representative color; and forming connected components from each quantized block. The segmenting sub-step may also include: merging the formed subsets of connected components. The merging substep may include collecting statistics on the connected components. Statistics may include one or more of bounding box, pixel count, border length, and mean color. The method may further comprise the step of removing formed connected components considered to be noise. The noise may include connected components having a pixel count below a predetermined threshold and a boundary length to pixel count ratio above another predefined threshold. The merging step may include: merging the connected components of the block with the connected components of the blocks to the left and above; and updating statistics of the combined connected components. Statistics may include one or more of bounding box, pixel count, fill ratio, and average color.
估计子步骤可以包括:基于每个块中的像素的YUV数据,形成与多个颜色库有关的直方图;基于直方图统计量对每个块分类;和基于块分类合并库颜色,以便形成代表性颜色。该方法还包括在一轮中为每个像素形成索引图的步骤。量化步骤可以包括:将非空库量化为代表性颜色;创建到代表性颜色的库映射;和使用库映射将索引图重新映射到代表性颜色。形成子步骤可以包括:基于Y值确定亮度带;基于U和V值确定颜色列;将像素颜色累积到映射库;和递增映射库的像素计数。确定亮度带的步骤还可以包括亮度带抗混叠。确定颜色列的步骤还可以包括颜色列抗混叠。The estimating sub-step may include: forming a histogram associated with a plurality of color bins based on the YUV data of the pixels in each block; classifying each block based on the histogram statistics; and merging the bin colors based on the block classification to form a representative sex color. The method also includes the step of forming an index map for each pixel in one round. The quantizing step may include: quantizing the non-empty library to a representative color; creating a library map to the representative color; and remapping the index map to the representative color using the library map. The forming sub-step may include: determining a brightness band based on Y values; determining a color column based on U and V values; accumulating pixel colors into a map library; and incrementing a pixel count of the map library. The step of determining the luma bands may also include luma band anti-aliasing. The step of determining the color column may also include color column anti-aliasing.
合并步骤可以包括为接触左和上边界的当前块内的各个连通组分执行的下面的子步骤:寻找沿着公共边界接触当前连通组分的一列连通组分;和确定合并的最佳候选。The merging step may include the following sub-steps performed for each connected component within the current block that touches the left and upper boundaries: finding a list of connected components that touch the current connected component along a common boundary; and determining the best candidate for merging.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面的方法中的任意一个方面给包括多个像素的数字图像分段。According to another aspect of the invention there is provided an apparatus comprising a processor and a memory for segmenting a digital image comprising a plurality of pixels according to any one of the preceding aspects of the method.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面的方法中的任意一个方面给包括多个像素的数字图像分段的计算机程序。According to another aspect of the present invention, there is provided a computer program product comprising a computer readable medium having recorded therein a digital pixel comprising a plurality of pixels according to any one of the preceding methods. A computer program for image segmentation.
根据本发明的另一个方面,提供了一种自动产生颜色文档的紧凑表示的方法。该方法包括步骤:在一轮中以块光栅的顺序,将颜色文档页的数字图像分段为连通组分;基于整个页的紧凑的、连通组分的统计量,使用布局分析将页的数字图像划分为前景和背景图像;在前景图像的至少一个部分遮蔽了背景图像的位置,在单轮中以块光栅的顺序修复背景图像的至少一个部分;和合并前景图像和背景图像形成紧凑的文档。According to another aspect of the invention, a method of automatically generating a compact representation of a color document is provided. The method comprises the steps of: segmenting a digital image of a color document page into connected components in the order of a block raster in one round; using layout analysis to segment the digital image of the page dividing the image into foreground and background images; inpainting at least one portion of the background image in a single pass in a block-raster order where at least one portion of the foreground image obscures the background image; and merging the foreground and background images to form a compact document .
该方法还可以包括对背景图像下采样的步骤。另外,该方法可包括压缩背景图像的步骤。压缩步骤可以涉及有损压缩。另外,该方法可以包括对有损压缩的背景图像的不同压缩。The method may also include the step of downsampling the background image. Additionally, the method may include the step of compressing the background image. The compression step may involve lossy compression. Additionally, the method may include different compression of the lossy compressed background image.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面的方法中的任意一个方面自动产生颜色文档的紧凑表示。According to another aspect of the invention there is provided an apparatus comprising a processor and memory for automatically generating a compact representation of a color document according to any one of the preceding aspects of the method.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面的方法中的任意一个方面自动产生颜色文档的紧凑表示的计算机程序。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded thereon a compact representation for automatically generating a color profile according to any one of the preceding method aspects computer program.
根据本发明的另一个方面,提供了一种分析包括多个像素的数字图像的方法。该方法包括步骤:将数字图像分段为对象,其中以多于两个标签表示分段;为每个对象提供一组属性;为对象的子集,使用包含测量确定在共享边界的相邻对象间是否存在父亲-孩子关系;基于对象的属性,形成共享共同父亲的多组对象;和根据对象的属性和分组对它们分类。According to another aspect of the invention, a method of analyzing a digital image comprising a plurality of pixels is provided. The method comprises the steps of: segmenting a digital image into objects, wherein the segments are represented by more than two labels; providing each object with a set of attributes; for a subset of objects, determining adjacent objects at a shared boundary using a containment measure whether there is a parent-child relationship between them; based on the attributes of the objects, forming groups of objects that share a common parent; and classifying the objects according to their attributes and groupings.
可以使用每个对象周围的边界框和描述对象间的触及关系的信息确定包含。如果两个对象在边界接触,并且一个对象的边界框完全包含另一个对象的边界框,则该对象包含该另一个对象。Containment can be determined using a bounding box around each object and information describing touching relationships between objects. If two objects touch at the boundary, and one object's bounding box completely contains the other object's bounding box, then the object contains the other object.
形成组步骤可以包括:考虑共同父亲的一列孩子中的孩子对象的对;和使用对象属性,确定每个对是否应被分组到一起。仅可以考虑为具有相同父亲的一列孩子对象中的相邻对象进行分组。可以基于边界框和颜色信息对对象分组。The forming a group step may include: considering pairs of child objects in a list of children of a common parent; and using object properties, determining whether each pair should be grouped together. Grouping can only be considered for adjacent objects in a column of child objects that have the same parent. Objects can be grouped based on bounding box and color information.
可以根据组内对象的文本类质量测试将一组对象分类为文本。用于文本类质量的测试可以包括:为每个对象识别表示对象位置的单个值;形成这些值的直方图;和按照直方图的属性识别文本。A group of objects may be classified as text based on a text-like quality test of the objects within the group. A test for text-like quality may include: identifying for each object a single value representing the location of the object; forming a histogram of these values; and identifying text according to attributes of the histogram.
本发明还可以包括步骤:根据其它对象的属性将它们添加到对象的文本分类组,而不论的它们的父亲-孩子属性如何。The present invention may also include the step of adding other objects to the text classification group of objects according to their attributes, regardless of their parent-child attributes.
根据本发明的另一个方面,提供了一种分析包括文档页的多个像素的数字图像的方法。该方法包括步骤:给数字图像分段,以便基于图像形成对象;形成对象的组;和确定每个对象组是否表示文本。确定步骤包括:根据对象在页上的位置,为每个对象识别单个值;形成这些值的直方图;和按照直方图的属性识别文本。According to another aspect of the invention, a method of analyzing a digital image comprising a plurality of pixels of a document page is provided. The method includes the steps of: segmenting the digital image to form objects based on the image; forming groups of objects; and determining whether each group of objects represents text. The determining step includes: identifying a single value for each object based on the position of the object on the page; forming a histogram of the values; and identifying the text according to the properties of the histogram.
直方图的属性可以是直方图中具有多于指定的对象数目的库中的对象的总数。可替换地,属性可以是直方图内的计数的平方和。An attribute of the histogram may be the total number of objects in the library that have more than a specified number of objects in the histogram. Alternatively, the attribute may be the sum of squares of the counts within the histogram.
表示对象位置的每个对象的单个值可以是对象的边界框的边缘。A single value per object representing the object's position may be the edge of the object's bounding box.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据按照前面任意一个方面的方法,分析包括多个像素的数字图像。According to another aspect of the invention there is provided an apparatus comprising a processor and a memory for analyzing a digital image comprising a plurality of pixels according to a method according to any one of the preceding aspects.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据按照前面任意一个方面的方法,分析包括多个像素的数字图像的计算机程序。According to another aspect of the present invention, there is provided a computer program product comprising a computer readable medium having recorded therein a method for analyzing a digital image comprising a plurality of pixels according to a method according to any one of the preceding aspects. Image computer program.
根据本发明的另一个方面,提供了一种修复包括多个像素的数字图像的方法。该方法包括步骤:从数字图像产生多个像素块;和为至少一个块,按照光栅顺序改变至少一串像素的像素值。改变步骤包括对每个块的执行如下的子步骤:确定块内与一个对象有关的一串像素的开始和结束像素,该串包括被分组到一起的相邻的像素;根据该串外的像素的像素值,修改所述串内的对象的至少一个像素值;和确定不相应于该块内对象的像素的活性测量值;以及如果块的活性测量值小于预定的阈值,将具有至少一串像素的每个块内的所有像素值改变为设置值。According to another aspect of the present invention, a method of inpainting a digital image comprising a plurality of pixels is provided. The method includes the steps of: generating a plurality of pixel blocks from a digital image; and, for at least one block, varying pixel values of at least one series of pixels in a raster order. The altering step comprises the following sub-steps performed for each block: determining the start and end pixels of a string of pixels associated with an object within the block, the string comprising adjacent pixels grouped together; modify at least one pixel value of an object within the string; and determine an activity measure for a pixel that does not correspond to an object within the block; and if the measure of activity for the block is less than a predetermined threshold, will have at least one of the string All pixel values within each block of pixels are changed to the set value.
该方法还包括步骤:根据扩大的对象之外的像素的像素值,修改对象之外的扩大的对象像素的至少一个像素值。The method further includes the step of modifying at least one pixel value of pixels of the enlarged object outside the object based on pixel values of pixels outside the enlarged object.
该产生步骤可以包括子步骤:将数字图像布置为多个带,每个带包括预定数目的连贯的像素行;和一个接一个地缓冲和处理这些带。该处理步骤可以包括对每个当前缓冲的带执行的如下的子步骤:将当前带布置为多个像素块;和为改变步骤一个接一个地缓冲和处理当前带的块。This generating step may comprise the sub-steps of: arranging the digital image into a plurality of bands, each band comprising a predetermined number of consecutive rows of pixels; and buffering and processing the bands one after the other. This processing step may comprise the following sub-steps performed on each currently buffered band: arranging the current band into a plurality of pixel blocks; and buffering and processing the blocks of the current band one by one for the changing step.
所述串包括块的像素的光栅行中的相邻像素。The string comprises adjacent pixels in a raster row of pixels of the block.
该方法还可以包括使用基于块的压缩方法压缩各个块的步骤。基于块的压缩方法可以是JPEG。该方法还可以包括使用另一种压缩技术进一步压缩基于块压缩的块的步骤。The method may also include the step of compressing individual blocks using a block-based compression method. The block-based compression method may be JPEG. The method may also include the step of further compressing the blocks based on the block compression using another compression technique.
可以根据对象之外的像素的像素值,使用根据串的左边和右边的像素值内插的值,或来自串的左边的一个像素的值,修改对象的至少一个像素值。At least one pixel value of the object may be modified according to pixel values of pixels outside the object, using a value interpolated from pixel values to the left and right of the string, or a value from a pixel to the left of the string.
可以将具有至少一串像素的每个块的所有像素值改变为以前处理过的块的平均值,或该块内的可视像素的平均值。All pixel values of each block having at least one string of pixels may be changed to the average value of previously processed blocks, or the average value of the visible pixels within the block.
该方法还可以包括步骤:如果未发现扩大的对象像素串的结尾,则将像素的颜色值设置为不相应于扩大的对象像素串左边的对象的像素的颜色值。The method may further comprise the step of, if the end of the enlarged object pixel string is not found, setting the color value of the pixel to the color value of a pixel not corresponding to an object to the left of the enlarged object pixel string.
像素值可以是颜色值。Pixel values can be color values.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面任意一个方面的方法,修复包括多个像素的数字图像。According to another aspect of the present invention, there is provided an apparatus comprising a processor and a memory for repairing a digital image comprising a plurality of pixels according to the method of any one of the preceding aspects.
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面任意一个方面的方法,修复包括多个像素的数字图像的计算机程序。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded therein a method for restoring a digital image comprising a plurality of pixels according to any one of the preceding aspects computer program.
根据本发明的另一个方面,提供了一种改变包括多个像素的数字图像的像素值的方法,这些像素的至少一部分相应于图像中的对象,该方法包括步骤:将数字图像布置为多个带,每个带包括预定数目的连贯的像素行;和依次一个接一个缓冲和处理这些带。该处理步骤可以包括用于每个当前的缓冲带的如下的子步骤:将当前带布置为多个像素块;和依次一个接一个处理这些块。块处理步骤包括针对每个块的如下子步骤:确定块内不相应于图像内的对象的像素的活性测量值;如果活性测量值小于预定的阈值,则将该块内的所有像素的像素值改变为一个像素值;使用JPEG压缩该块;和使用另一种压缩方法压缩JPEG压缩的块。According to another aspect of the present invention, there is provided a method of changing pixel values of a digital image comprising a plurality of pixels, at least some of which correspond to objects in the image, the method comprising the steps of arranging the digital image as a plurality of strips, each strip comprising a predetermined number of consecutive rows of pixels; and sequentially buffering and processing the strips one after the other. This processing step may comprise the following sub-steps for each current buffer strip: arranging the current strip into a plurality of pixel blocks; and processing these blocks sequentially one after the other. The block processing step comprises, for each block, the following sub-steps: determining activity measures for pixels within the block that do not correspond to objects within the image; change to a pixel value; compress the block using JPEG; and compress the JPEG-compressed block using another compression method.
每个带可包括数字图像的16行像素,块包括16×16个像素,并且以流水线方式执行压缩步骤。Each band may comprise 16 rows of pixels of the digital image, the blocks comprise 16x16 pixels, and the compression step is performed in a pipelined fashion.
改变块内的所有像素的颜色值的步骤可以包括将像素的颜色值设置为通过在不相应于紧挨着扩大的对象像素串的左边和右边的对象的像素间进行线性内插而获得的颜色值,扩大的对象像素是串外并且与串相邻的像素。The step of changing the color values of all pixels within the block may include setting the color values of the pixels to a color obtained by linear interpolation between pixels not corresponding to objects immediately left and right of the expanded object pixel string value, the pixels to be expanded are pixels outside the string and adjacent to the string.
该方法还可以包括步骤:将块内的像素的颜色值设置为不相应于块内的对象的像素的平均颜色值。The method may further comprise the step of setting the color values of pixels within the block to an average color value of pixels not corresponding to objects within the block.
该方法还可以包括步骤:将块内的像素的颜色值设置为在前块的平均颜色。The method may further comprise the step of setting the color value of the pixels within the block to the average color of the previous block.
可以通过扩大定义对象的位置的掩蔽,确定扩大的对象的像素。The pixels of the dilated object may be determined by dilating the mask defining the location of the object.
其他的压缩方法可以包括ZLIB。Other compression methods may include ZLIB.
像素值可以是颜色值。Pixel values can be color values.
根据本发明的另一个方面,提供了一种包括处理器和存储器的装置,用于根据前面任意一个方面的方法,改变包括多个像素的数字图像的像素值,这些像素的至少一部分相应于图像中的对象。According to another aspect of the present invention there is provided an apparatus comprising a processor and memory for altering the pixel values of a digital image comprising a plurality of pixels, at least some of which correspond to image objects in .
根据本发明的另一个方面,提供了一种包括计算机可读介质的计算机程序产品,该计算机可读介质具有记录在其中的用于根据前面任意一个方面的方法,改变包括多个像素的数字图像的像素值的计算机程序,这些像素的至少一部分相应于图像中的对象。According to another aspect of the present invention there is provided a computer program product comprising a computer readable medium having recorded therein a method for altering a digital image comprising a plurality of pixels according to any one of the preceding aspects A computer program of pixel values, at least some of which correspond to objects in an image.
附图说明Description of drawings
下面参考附图描述若干实施例,其中:Several embodiments are described below with reference to the accompanying figures, in which:
图1是提供根据本发明的实施例的分段、分析和压缩数字图像的概要的高层流程图;Figure 1 is a high-level flowchart providing an overview of segmenting, analyzing and compressing a digital image according to an embodiment of the invention;
图2是图1的颜色分段步骤的流程图;Fig. 2 is the flow chart of the color segmentation step of Fig. 1;
图3是图2的获得下一个像素的步骤的详细流程图;Fig. 3 is a detailed flowchart of the steps of obtaining the next pixel in Fig. 2;
图4是图1的布局分析步骤的流程图;Fig. 4 is a flowchart of the layout analysis steps of Fig. 1;
图5是图1的产生压缩输出图像的步骤的流程图;FIG. 5 is a flowchart of the steps of generating a compressed output image of FIG. 1;
图6是包括字符“i”和相关联的连通组分的图像的方框图;Figure 6 is a block diagram of an image including the character "i" and associated connected components;
图7是可以实施本发明的实施例的通用计算机系统的方框图;FIG. 7 is a block diagram of a general-purpose computer system on which embodiments of the present invention may be implemented;
图8是根据本发明的实施例的用于为扫描的输入页产生压缩输出的系统的方框图;8 is a block diagram of a system for generating compressed output for scanned input pages according to an embodiment of the present invention;
图9是图10的颜色连通组分分析和斑点统计步骤的详细流程图;Fig. 9 is a detailed flowchart of the color connected component analysis and spot statistics steps of Fig. 10;
图10是图2的颜色分段步骤的流程图;Fig. 10 is a flowchart of the color segmentation step of Fig. 2;
图11是图10的颜色量化步骤的详细流程图;Fig. 11 is a detailed flowchart of the color quantization step of Fig. 10;
图12是图9的形成新斑点或生长已有斑点的步骤的详细流程图;Figure 12 is a detailed flowchart of the steps of forming new spots or growing existing spots of Figure 9;
图13是图10的贴片(tile)间合并步骤的流程图;Fig. 13 is a flowchart of the step of merging between tiles of Fig. 10;
图14是图13的处理最佳斑点和CC对的步骤的流程图;Figure 14 is a flowchart of the steps of processing the best spots and CC pairs of Figure 13;
图15是示出了斑点合并处理的例子的方框图;FIG. 15 is a block diagram showing an example of blob merging processing;
图16是示出了当前贴片和两个相邻贴片的状态间的合并的方框图;FIG. 16 is a block diagram illustrating the merging between the states of the current tile and two adjacent tiles;
图17是示出了贴片间合并的条件的表;FIG. 17 is a table showing the conditions for merging between tiles;
图18是图14的CC映射处理的结果的例子;FIG. 18 is an example of the result of the CC mapping process of FIG. 14;
图19是图10的贴片内合并步骤的流程图;FIG. 19 is a flowchart of the intra-tile merging step of FIG. 10;
图20是图4的分组CC步骤的流程图;Fig. 20 is a flowchart of the grouping CC steps of Fig. 4;
图21是图4的检查CC组的步骤的流程图;FIG. 21 is a flowchart of the steps of checking the CC group of FIG. 4;
图22是图20的寻找一个组的孩子CC的步骤的流程图;Fig. 22 is a flowchart of the steps of finding a child CC of a group in Fig. 20;
图23是图20的初始分组步骤的流程图;Figure 23 is a flowchart of the initial grouping step of Figure 20;
图24是图21的检查对齐步骤的流程图;Fig. 24 is a flowchart of the check alignment step of Fig. 21;
图25(a)是示出了基于多于两个量化等级的分段的简单例子的图像;Figure 25(a) is an image showing a simple example of segmentation based on more than two quantization levels;
图25(b)和(c)是示出了图25(a)的图像的二值分段的相应结果的图像;Figures 25(b) and (c) are images showing the corresponding results of the binary segmentation of the image of Figure 25(a);
图26是包含两个对象即字母“g”和“h”的文档的区域的图示;Figure 26 is an illustration of a region of a document containing two objects, the letters "g" and "h";
图27(a)示出了图27(b)的图像的边界框的底的值的直方图;Figure 27(a) shows a histogram of the values of the base of the bounding box of the image of Figure 27(b);
图27(b)示出了对从图像中分段出的部分的不规则布置的边界框的选择;Figure 27(b) shows a selection of irregularly arranged bounding boxes for a segmented portion from an image;
图27(c)示出了图27(d)的页的相应的直方图;Figure 27(c) shows the corresponding histogram for the page of Figure 27(d);
图27(d)示出了用于文本组的页上的边界框的布置;Figure 27(d) shows the arrangement of bounding boxes on a page for text groups;
图28是颜色斑点划分系统的方框图;Fig. 28 is a block diagram of the color spot division system;
图29(a)到29(c)是颜色直方图输出的图像,包括原始贴片,索引图和调色板,其中调色板的上部和下部的灰色部分表示空库;Figures 29(a) to 29(c) are images of the color histogram output, including the original patch, index map and palette, where the gray parts in the upper and lower parts of the palette represent empty libraries;
图30是图5的修复贴片的步骤的流程图;Fig. 30 is a flowchart of the steps of repairing the patch of Fig. 5;
图31是图30的形成贴片前景位掩蔽的步骤的流程图;Fig. 31 is a flow chart of the steps of forming a tile foreground bit mask in Fig. 30;
图32是图30的修复像素和测量贴片活性的步骤的流程图;32 is a flowchart of the steps of inpainting pixels and measuring patch activity of FIG. 30;
图33是图30的贴片平坦化步骤的流程图;Figure 33 is a flowchart of the patch planarization step of Figure 30;
图34是背景图像贴片中的示例像素和在全分辨率前景掩蔽中被检查的相应像素的方框图;Figure 34 is a block diagram of example pixels in a background image tile and the corresponding pixels inspected in the full-resolution foreground mask;
图35包括一维修复的例子的曲线图;Figure 35 includes graphs of examples of one-dimensional repair;
图36是示出了使用6×6贴片的例子的颜色斑点处理的方框图;FIG. 36 is a block diagram illustrating blob processing for an example using 6×6 tiles;
图37包括示出了斑点合并的例子的图像;Figure 37 includes images showing an example of blob merging;
图38包括示出了用于比较目的的原始贴片和合并斑点输出的图像;Figure 38 includes images showing raw tiles and merged blob outputs for comparison purposes;
图39是示出了以光栅顺序合并贴片上的斑点的方框图;Figure 39 is a block diagram illustrating merging spots on tiles in raster order;
图40是示出了图11的形成2D直方图和第一调色板的步骤的流程图;FIG. 40 is a flowchart showing the steps of forming a 2D histogram and a first palette of FIG. 11;
图41是示出了图11的形成第二调色板的步骤的流程图;FIG. 41 is a flow chart illustrating the steps of forming a second palette of FIG. 11;
图42是示出了图41的产生两颜色调色板的步骤的流程图;Figure 42 is a flowchart illustrating the steps of generating a two-color palette of Figure 41;
图43是示出了图41的产生多颜色调色板的步骤的流程图;Figure 43 is a flowchart showing the steps of generating a multi-color palette of Figure 41;
图44是示出了图11的将像素和第二调色板相关联的步骤的流程图;Figure 44 is a flowchart illustrating the step of associating pixels with a second palette of Figure 11;
图45是示出了图44的映射两级贴片的步骤的详细流程图;FIG. 45 is a detailed flowchart illustrating the steps of mapping two-level tiles of FIG. 44;
图46是示出了图44的映射多颜色贴片的步骤的详细流程图;FIG. 46 is a detailed flowchart illustrating the steps of mapping multi-color tiles of FIG. 44;
图47是图11的分析直方图和对贴片分类的步骤的流程图;Figure 47 is a flowchart of the steps of analyzing the histogram of Figure 11 and classifying the tiles;
图48是示出了对所有操作条件进行更新之前和之后的CC和斑点候选计数的表;Figure 48 is a table showing CC and blob candidate counts before and after updating all operating conditions;
图49是示出了合并斑点和CC候选的例子的方框图;Figure 49 is a block diagram illustrating an example of merging blobs and CC candidates;
图50是示出了图10的后合并处理的步骤的流程图;FIG. 50 is a flowchart illustrating the steps of the post-merge process of FIG. 10;
图51是根据本发明的另一个实施例的系统的方框图;Figure 51 is a block diagram of a system according to another embodiment of the present invention;
图52是图51的颜色分段模块的方框图;和Figure 52 is a block diagram of the color segmentation module of Figure 51; and
图53是示出了设置点的Delaunay三角测量和Voronoi图的图。FIG. 53 is a diagram showing a Delaunay triangulation and a Voronoi diagram of a set point.
具体实施方式Detailed ways
公开了用于处理和压缩数字图像的方法、装置和计算机程序产品。在下面的描述中,提出了若干特定细节,包括具体的无损压缩技术、颜色空间、空间分辨率、贴片大小等。然而,本领域的技术人员从该公开可以明了,可以做出修改和/或替代而不脱离本发明的范围和精神。在其他情况中,可能忽略特定的细节以便不会使得本发明模糊不清。Methods, apparatus and computer program products for processing and compressing digital images are disclosed. In the following description, several specific details are presented, including specific lossless compression techniques, color space, spatial resolution, tile size, etc. However, it will be apparent to those skilled in the art from this disclosure that modifications and/or substitutions can be made without departing from the scope and spirit of the invention. In other instances, specific details may have been omitted in order not to obscure the invention.
在附图的任意一个或多个内引用具有相同附图标记的步骤和/或特征的情况下,出于本说明书的目的,这些步骤和/或特征具有相同的功能(多个)或操作(多个),除非出现相反的意图。Where steps and/or features with the same reference number are referenced within any one or more of the drawings, for the purposes of this description, these steps and/or features have the same function(s) or operation ( multiple), unless a contrary intent appears.
在本说明书的上下文中,单词“包括”具有开放式的,非排他的含意:“主要包括但不必仅包括”,但既不是“基本上由...组成”,也不是“仅由...组成”。词“包括(comprising)”的变体,诸如“包括(comprise)”和“包括(comprises)”,具有相应的含意。In the context of this specification, the word "comprises" has an open-ended, non-exclusive meaning: "consisting essentially but not necessarily exclusively", but neither "consisting essentially of" nor "consisting only of.. .composition". Variations of the word "comprising", such as "comprise" and "comprises", have corresponding meanings.
详细说明的内容被组织为如下的章节。The contents of the detailed description are organized into the following chapters.
1概述1 Overview
2颜色分段2 color segments
2.1获得输入图像的下一个贴片2.1 Get the next patch of the input image
2.2形成贴片的颜色分段2.2 Form the color segmentation of the patch
2.2.1颜色量化2.2.1 Color Quantization
2.2.1.1形成2D直方图和第一调色板2.2.1.1 Form 2D histogram and first palette
2.2.1.2分析直方图和对图像分类2.2.1.2 Analyzing histograms and classifying images
2.2.1.3形成第二调色板2.2.1.3 Form the second palette
2.2.1.3.1产生2颜色调色板2.2.1.3.1 Generate 2 color palette
2.2.1.3.2产生多颜色调色板2.2.1.3.2 Generating a multi-color palette
2.2.1.4关联像素与第二调色板2.2.1.4 Associating pixels with the second palette
2.2.1.4.1映射两级贴片2.2.1.4.1 Mapping two-level patches
2.2.1.4.2映射多级贴片2.2.1.4.2 Mapping multi-level tiles
2.2.2颜色CC分析和斑点统计量2.2.2 Color CC Analysis and Blob Statistics
2.2.2.1形成斑点2.2.2.1 Formation of spots
2.2.2.2斑点合并的例子2.2.2.2 Example of blob merging
2.2.3贴片内合并2.2.3 Intra-patch merging
2.2.4贴片间合并2.2.4 Merging between patches
2.2.4.1贴片间合并条件2.2.4.1 Merging conditions between patches
2.2.4.2贴片间合并例子2.2.4.2 Example of merging between patches
2.2.4.3处理最佳斑点和CC对2.2.4.3 Dealing with Optimal Spots and CC Pairs
2.2.4.4 CC映射结果2.2.4.4 CC mapping results
2.2.5后合并处理2.2.5 Post-merge processing
2.3分段例子2.3 Segmentation example
3布局分析3 layout analysis
3.1对CC分组3.1 Grouping CCs
3.1.1寻找父亲CC的孩子3.1.1 Finding the child of the father CC
3.1.2初始化分组3.1.2 Initialization grouping
3.1.2.1两个CC的分组测试3.1.2.1 Group test of two CCs
3.2检查分组3.2 Check Grouping
3.2.1检查对齐3.2.1 Check alignment
3.2.2对齐的例子3.2.2 Examples of Alignment
4产生压缩的输出图像4 produces a compressed output image
4.1修复贴片4.1 Repair patch
4.1.1形成贴片前景位掩蔽4.1.1 Form patch foreground bit masking
4.1.2修复像素和测量贴片活性4.1.2 Inpainting pixels and measuring patch activity
4.1.3修复例子4.1.3 Repair example
4.1.4贴片平坦化4.1.4 Chip planarization
5硬件实施例5 hardware embodiment
5.1颜色分段模块5.1 Color segmentation module
6计算机实现6 computer implementation
7工业实用性7 Industrial applicability
在下文中以上述顺序详细描述以上部分。Hereinafter, the above parts are described in detail in the above order.
1概述1 Overview
本发明的第一个实施例是运行在通用计算机上的处理。图1提供了分段、分析和压缩数字图像的处理100的高层概述。处理100的输入优选地是分辨率为300dpi的RGB图像。然而,适当地修改处理100,也可以输入其他颜色空间的图像。同样,可以输入不同分辨率的图像。在步骤110,使用对输入图像进行的单轮处理,执行图像到连通组分(CC)的颜色分段。使用单轮处理,可以迅速地处理数字图像,从而可以处理大量高分辨率的图像。连通组分是全部连接在一起(接触)的类似的颜色像素的组。例如,形成以白纸上的黑墨水打印的字母“i”的主体或主干的像素构成一个连通组分,“i”上的点构成另一个CC,且围绕字母“i”作为背景的任何白像素构成另一个CC。图6示出了包括字符“i”的图像600,其中黑点表示黑像素,并且白点表示白像素。示出了“i”的主干CC 610和“i”的点CC 612。还示出了空白像素的另一个结果CC 614。A first embodiment of the present invention is a process running on a general purpose computer. Figure 1 provides a high level overview of the
作为分段的一部分,计算描述CC的紧凑信息和统计量。对数字下采样,并且直接提供到步骤130。在步骤120,使用这个紧凑CC信息和统计量对CC执行布局分析。该布局分析确定页上的特征的布局,页例如包括文本字符、段落、表和图像。在步骤130,使用该布局信息创建一个或多个前景图像。一般地由在步骤120中识别出的文本字符构成前景图像,并且优选地是二值图像。可以用输入分辨率(例如,300dpi)存储前景图像,而可以用较低的分辨率(例如,150dpi)存储背景图像。前景元素被从背景图像中去除。然后,使用不同的技术压缩前景和背景,并且以复合图像格式存储。复合格式可以是,例如,PDF文档。下面在单独的章节中描述颜色分段、布局分析和压缩数字图像中的每一个。As part of the segmentation, compact information and statistics describing the CCs are computed. The numbers are down-sampled and provided directly to step 130 . At
用于本发明的实施例的一个应用是分析来自扫描仪的光栅像素图像,并且尽可能多地提取高层信息。根据该信息,可以生成页的高层描述。为了该目的,系统可以被设计为通过以硬件执行像素分析,更快地运行。然而,从下面描述中将明了,该系统也可以完全以软件实现。另外,虽然下面描述了特定输出格式PDF,可以对系统进行修改,以便利用其他页描述格式作为输出格式。One application for embodiments of the present invention is to analyze raster pixel images from a scanner and extract as much high-level information as possible. From this information, a high-level description of the page can be generated. For this purpose, the system can be designed to run faster by performing pixel analysis in hardware. However, as will be apparent from the description below, the system may also be implemented entirely in software. Also, while the specific output format PDF is described below, the system can be modified to utilize other page description formats as output formats.
图8是根据本发明的实施例的用于生成扫描输入文档810的压缩或紧凑表示850的系统800的高层方框图。系统800包括前端模块820、中间模块830和后端模块830。前端模块820是优选地以硬件实现的基于贴片的前端,但是可以是ASIC和由嵌入式处理器执行的软件的组合。贴片光栅顺序是一种基于贴片的处理方法,其中每次一个地从上到下和从左到右处理贴片。该模块对输入图像执行颜色分段,并且将贴片的背景图像提供给后端模块840(由箭头指示)。前端模块820以整页的分辨率(例如,300dpi)执行颜色连通组分分析。前端模块820还从数字图像810生成连通组分(CC),并且将CC提供给执行布局分析的中间模块。该模块可以完全以软件实现。前端模块820向模块840提供下采样图像。将中间模块830的输出提供给后端模块840,其执行基于贴片的修复,并且生成数字图像810的紧凑表示。与前端模块820类似,后端模块840可以至少部分地以带有在嵌入式处理器和ASIC上执行的软件的硬件实现。该紧凑输出可以包括较低空间分辨率的数字图像和较高分辨率的前景位图。FIG. 8 is a high-level block diagram of a system 800 for generating a compressed or compact representation 850 of a scanned input document 810 according to an embodiment of the invention. The system 800 includes a front-end module 820 , a middle module 830 and a back-end module 830 . Front-end module 820 is a patch-based front-end, preferably implemented in hardware, but could be a combination of an ASIC and software executed by an embedded processor. Tile raster order is a tile-based processing method in which tiles are processed one at a time, top-to-bottom and left-to-right. This module performs color segmentation on the input image and provides the background image of the tile to the backend module 840 (indicated by the arrow). The front-end module 820 performs color connected component analysis at full page resolution (eg, 300dpi). The front-end module 820 also generates connected components (CCs) from the digital image 810 and provides the CCs to intermediate modules that perform layout analysis. This module can be fully implemented in software. Front-end module 820 provides the downsampled image to module 840 . The output of the intermediate module 830 is provided to a backend module 840 which performs patch-based inpainting and generates a compact representation of the digital image 810 . Like the front-end module 820, the back-end module 840 may be implemented at least in part in hardware with software executing on an embedded processor and an ASIC. This compact output can include a lower spatial resolution digital image and a higher resolution foreground bitmap.
前端模块820执行涉及检查每个像素的所有分析工作,并且形成语义上相关的像素的区域的颜色CC。来自前端模块820的输出是关于页上的所有颜色CC的信息。用于每个CC的信息包括边界框、平均颜色、接触列表和像素数目。当为了最佳性能以硬件实现算法时,算法应当是在带宽方面有效的。对于图像处理任务,算法不随机访问整个扫描页。而是,算法每次工作于小贴片上。The front-end module 820 performs all the analytical work involved in examining each pixel and forming color CCs for regions of semantically related pixels. The output from the front end module 820 is information about all color CCs on the page. Information for each CC includes bounding box, mean color, contact list and number of pixels. When implementing an algorithm in hardware for best performance, the algorithm should be bandwidth efficient. For image processing tasks, the algorithm does not randomly access the entire scanned page. Instead, the algorithm works on small patches at a time.
2颜色分段2 color segments
图2更详细地示出了图1的步骤110。图2中所示的处理工作于输入图像的贴片上,这些贴片的大小,例如,可以是32×32像素。可以按光栅顺序处理贴片-即,从左到右,和从上到下。第一个被处理的贴片是输入图像的左上角的贴片,并且最后处理的贴片是输入图像的右下角的贴片。出于效率的目的进行这种贴片划分。FIG. 2 shows step 110 of FIG. 1 in more detail. The processing shown in Fig. 2 works on tiles of the input image, the size of these tiles may be, for example, 32x32 pixels. Tiles can be processed in raster order - ie, left to right, and top to bottom. The first tile to be processed is the top-left tile of the input image, and the last tile processed is the bottom-right tile of the input image. This tile division is done for efficiency purposes.
在步骤210,获得将处理的输入图像的下一个贴片。可以使用指针信息有效地访问每个贴片。参考图3更详细地描述步骤210。步骤220、230、240和250的处理局限于当前贴片的像素。在步骤220,可选择地对当前贴片执行去半色调处理。例如,以该方法处理的扫描输入文档可能包含来自打印处理的半色调。半色调可以使得难以进行后续分析,并且可能不能很好地压缩。因此,可以使用本步骤220检测和去除半色调。仅作为例子,可以进行下面的去半色调处理。去半色调处理可以工作于16×16的贴片上。可以将每个32×32RGB像素的输入贴片划分为4个16×16的贴片,并且分别处理每一个。半色调检测可以每次工作于一个颜色通道(即,R、G和B)上。如果在任何通道中检测到半色调,可以在所有通道上执行半色调去除。为了检测半色调,将贴片中的每个像素量化为4个等级。输入颜色通道值的范围是0-255。该4个等级是范围0-63、64-127、128-191和192-255。可以测量彼此邻接的像素间的等级变化数。可以水平地和垂直地进行这种测量。At
由于半色调通常为小点,该检测要求改变数是大的,并且水平改变数小于垂直改变数。这防止将由文本字符的边缘引起的等级的改变检测为半色调。可以指定用于检测的阈值。如果在16×16的贴片中检测到半色调,可以使用,例如,空间模糊去除半色调。半色调检测器还可以使用来自以前分析过的贴片的信息。例如,如果接触当前贴片的贴片中具有在贴片中检测到的半色调,当前贴片可能也包含半色调。当使用该信息时,可以调整阈值,以便放松半色调检测要求或使其更严格。Since halftones are usually small dots, this detection requires that the number of changes be large, and that the number of horizontal changes be smaller than the number of vertical changes. This prevents a change in level caused by the edge of a text character from being detected as a halftone. A threshold for detection can be specified. If halftones are detected in a 16x16 tile, the halftones can be removed using, for example, spatial blurring. Halftone detectors can also use information from previously analyzed tiles. For example, if a tile touching the current tile has halftones detected in the tile, the current tile may also contain halftones. When using this information, the threshold can be adjusted to relax the halftone detection requirement or to make it more stringent.
在步骤230,如果当前贴片的颜色空间已经不在YUV颜色空间中,则执行转换,以便将贴片内的像素转换到YUV颜色空间。因此,该步骤可选择地取决于输入图像的颜色空间。虽然在该实施例中使用YUV颜色空间,可以采用其他颜色空间而不脱离本发明的范围和精神。用于从RGB转换到YUV的转换公式可以与,例如,独立JPEG组(IJG)JPEG库中所使用的相同。In
在步骤240,对当前贴片执行形成连通组分(CC)的颜色分段,并且计算关于贴片中的CC的紧凑信息和统计量。颜色CC包括一个或多个语义上相关的跨一个或多个贴片的斑点。例如,语义上相关的斑点可以有类似的颜色。斑点是单个贴片中具有类似着色特性的一组连通的像素。斑点划分是一种颜色分段并且形成连通组分表示的处理。每个CC具有如下的统计量:像素大小、平均颜色、二值掩蔽、斑点边界长度和边界框。图10更详细地示出了该步骤。In
在步骤250,对当前贴片下采样,以便形成背景图像的相应部分。例如,可以使用库式滤波器在两个维度上以2∶1下采样,但是可以采用其他方法而不脱离本发明的范围和精神。在决定步骤260,进行检查,以便确定是否剩余任何更多的贴片要处理。如果步骤260的结果为否(即,已经处理了图像中的所有贴片),处理终止。然而,如果步骤260的结果为是,处理在步骤210处继续。At
2.1获得输入图像的下一个贴片2.1 Get the next patch of the input image
图3详细示出了图2的步骤210。处理210工作在输入数字图像的带上。图像的带是若干连贯的图像行。每个带的高度可以与贴片的高度相同。因此,例如,图像的第一个32行形成一个带,并且在该情况下,图像的第一个带;下面的32个图像行形成下一个带,并且依此类推。每个带的宽度可以是输入图像的宽度。在决定步骤310,进行检查以便确定是否需要读入另一个带。当已经处理了当前带的所有贴片时,需要读入另一个带。另一次需要读入另一个带是第一次执行步骤310,这是因为此时还未读带。如果步骤310的结果为是,处理在步骤320继续。否则,处理在步骤340处继续。FIG. 3 shows step 210 of FIG. 2 in detail.
在步骤320,从输入图像读取数据的下一个带,例如,从盘读入存储器内的缓冲器。存储器缓冲器可以布置为在连续的存储器位置包含带的每行像素。另外,保持每行的开始的存储器位置的记录:即,每个带行的指针。在步骤330,初始化确定要访问的贴片-当前贴片-的变量tx(即,设置为0)。在步骤340,将行指针信息更新为指向当前贴片。调用图3的处理210的其他处理通过引用在步骤340更新的行指针信息获得新的贴片。行指针信息可以包括每个贴片行的指针。因此,每个贴片可以有32个行指针。给定行指针指向给定行中贴片的第一个像素的存储器位置。通过沿着指向每个相应带线的开始的指针之外的(贴片宽度*tx)个存储器位置步进每个行指针,可以更新行指针信息。在步骤350,将变量tx增加1,从而下一次调用图3的处理210时,将另一个贴片输入处理。At
2.2形成贴片的颜色分段2.2 Form the color segmentation of the patch
颜色斑点划分是一种以贴片光栅顺序工作的图像分段算法。斑点是单个贴片内的相同量化标签的像素的连通组。每个斑点具有如下的统计量:像素大小、平均颜色、二值掩蔽、斑点边界长度和边界框。其目的是将文档图像分段为一组非重叠的连通组分,其中每个连通组分包含连通的语义相关的像素的集合,例如,特定文本字母内的像素集合形成一个连通组分,围绕着该文本的图像的一部分内的像素形成另一个连通组分等。Color blob division is an image segmentation algorithm that works in tile raster order. A blob is a connected group of pixels of the same quantized label within a single patch. Each blob has the following statistics: pixel size, mean color, binary mask, blob boundary length, and bounding box. Its purpose is to segment a document image into a set of non-overlapping connected components, where each connected component contains a connected set of semantically related pixels, e.g., a set of pixels within a particular text letter forms a connected component, around Pixels within a portion of the image that bears the text form another connected component, and so on.
图28是具有4个模块2810、2820、2830和2840的颜色斑点划分系统2800的方框图。颜色量化模块2810接收输入的颜色贴片(例如,对于RGB颜色空间,24位的像素值),确定贴片中主颜色的数目,并且根据主颜色量化贴片。将主颜色和量化的贴片作为输入提供给连通组分和斑点统计模块2820,其在单个光栅轮中对量化的贴片执行8路连通组分分析。在同一个光栅轮中,收集斑点统计量,诸如像素数、平均颜色、斑点边界长度和边界框信息。将斑点和统计量作为输入提供给贴片内合并模块2830,其通过基于颜色、大小和边界统计量合并斑点,减少贴片内的假的和小的斑点的数目。将来自该模块2830的结果斑点和统计量作为输入提供给贴片间合并模块2840,其根据斑点统计量,将当前贴片内的斑点与相邻贴片(左边和上边)内的接触斑点分组在一起,以便形成作为系统2800的输出的连通组分。将参考图10的处理进一步对其进行描述。FIG. 28 is a block diagram of a color
图10是用于将图像分段为颜色CC的步骤240的详细流程图。作为分段处理的一部分,计算描述CC的紧凑信息和统计量。通过接收来自步骤230的像素的输入贴片开始分段处理。在决定块1005,进行检查以便确定该贴片是否是平坦的。如果贴片是平坦的,处理在步骤1040处继续到贴片间合并阶段。否则,处理在步骤1010处继续。在步骤1010,对贴片执行颜色量化。颜色量化使用三个主要步骤找到贴片内的主颜色,而不考虑像素几何形状:1)颜色减少,2)贴片分类,和3)寻找主颜色和量化。通过颜色直方图方法确定输入贴片内的主颜色。通过根据主颜色量化输入贴片像素,创建颜色标签的量化贴片。主颜色是被人类观看者察觉为是贴片内的视觉上明显的颜色。该算法适合于硬件(HW)实现,并且以软件(SW)实现也相当快。Figure 10 is a detailed flowchart of
步骤1020在单个光栅轮中对量化的贴片执行8路连通组分分析以便形成斑点。在步骤1030,执行贴片内合并处理,以便通过基于颜色、大小和边界信息合并斑点,减少贴片内假的和小的斑点的数目。在步骤1040,执行贴片间的合并。将量化的贴片中识别出的斑点与当前贴片左边和上边的两个以前处理的贴片内所识别出的斑点进行比较,以便合并为颜色CC。从而,一个颜色CC包括跨越一个或多个贴片的一个或多个类似着色的斑点。这样,除了边界信息之外,颜色CC具有针对斑点的相同类型的上述统计量。
在步骤1050,将当前贴片内的斑点和这些斑点形成的颜色CC存储在紧凑的贴片状态数据结构中。这个贴片状态不包含像素数据。贴片状态仅包含将新创建的斑点合并到已有的颜色CC内所需的信息。可以用高的存储器效率执行贴片间合并处理1040,因为在分段处理的任意阶段,对于与当前贴片合并来说,仅需两个或更少的贴片状态。另外,步骤1050更新每个颜色CC的接触列表。接触列表描述哪些连通组分彼此邻接。在前端中作为颜色CC分析的一部分产生该接触列表。图2中的步骤240产生接触列表。然后处理终止。At
2.2.1颜色量化2.2.1 Color Quantization
颜色量化的目的是将整个颜色输入减少到缩减的颜色集合,以便为连通组分产生做准备。为了找到主颜色,对每个输入像素进行一次检查,并且产生直方图。本发明的实施例采用这样的直方图,其使用亮度作为第一个维度,并且组合两个色度分量作为第二个维度。这与根据三个颜色分量的轴将库划分为三维的传统的颜色直方图不相类似。本发明的实施例产生有助于更容易地找到好的主颜色的紧凑的直方图。根据直方图的特性,将贴片分类为三类-平坦的、两级的、多颜色的。根据贴片分类产生用于贴片的调色板。在产生调色板后,根据像素映射到的调色板颜色,给每个像素分配量化标签。该方法设计为用于高速处理和低的存储器需求。平坦的贴片仅具有一个量化标签。两级贴片具有两个量化标签。多颜色贴片具有多至4个量化标签。The purpose of color quantization is to reduce the entire color input to a reduced set of colors in preparation for connected component generation. To find the dominant color, each input pixel is checked once and a histogram is generated. Embodiments of the present invention employ a histogram that uses luma as the first dimension and combines the two chroma components as the second dimension. This is dissimilar to a traditional color histogram that divides the library into three dimensions based on axes of three color components. Embodiments of the present invention produce compact histograms that help find good dominant colors more easily. According to the characteristics of the histogram, the patches are classified into three categories - flat, two-level, multicolor. Generates palettes for tiles based on tile classification. After the palette is generated, each pixel is assigned a quantization label based on the palette color to which the pixel is mapped. The method is designed for high speed processing and low memory requirements. Flat patches have only one quantized label. Two-level patches have two quantized labels. Multicolor tiles have up to 4 quantization labels.
图10的颜色量化步骤1010在图11中进一步展开。此处给出图11中的每个步骤的简要描述,并且跟着给出每个步骤的详细描述。在步骤1110,使用第一调色板同时形成2D直方图和索引图。图40提供了步骤1110的进一步的细节。在步骤1120,根据2D直方图的统计量执行贴片分类。图47提供了步骤1120的进一步细节。在步骤1130,基于贴片分类形成第二调色板。这涉及压缩第一调色板。图41提供了步骤1130的进一步细节。在步骤1140,将像素与第二调色板相关联。将索引图重新映射到第二调色板颜色中的一个,以便产生具有量化标签的量化贴片。然后,处理终止。The
2.2.1.1形成2D直方图和第一调色板2.2.1.1 Form 2D histogram and first palette
图40更详细地示出了图11的步骤1110的处理。通过预定的映射方法,以像素光栅顺序在一轮中将输入的整个颜色贴片量化为索引图。图29(a)示出了输入的原始贴片的例子,和产生的结果索引图。映射可以配置为针对组织为8个亮度带和4个颜色列的32个颜色库。每个颜色库可以具有颜色累积器、像素计数器和注册ID,以放置在该库内的第一个像素的YUV值设置该注册ID。可以根据2D直方图的状态改变预定的映射方法。结果,没有针对每个库的预定颜色且像素颜色顺序可以影响第一调色板的组成。最后每个非空库的平均颜色构成第一调色板。图29(c)示出了产生的调色板,其中上部和下部的灰部分代表空库。FIG. 40 shows the processing of
在步骤4010,从贴片获得具有颜色值(YUV)的像素。在步骤4015中执行预定的映射,以便将像素映射到亮度带和颜色库(即,bin_mapped)。预定的映射可以如下:At
band=Y>>5,且band=Y>>5, and
column=(|U-REF_U|+|V-REF_V|)*NORMALISING_FACTOR[band].column=(|U-REF_U|+|V-REF_V|)*NORMALISING_FACTOR[band].
灰色的色度值可以用于REF_U和REF_V:(即,对于8位的RGB输入数据,REF_U=128,REF_V=128)。使用选择的REF_U和REF_V预先计算用于每个带的NORMALISING_FACTOR,以便将每个带标准化到RGB颜色空间中的4个库。可以使用表1的伪码产生NORMALISING_FACTOR。A chromaticity value of gray can be used for REF_U and REF_V: (ie, for 8-bit RGB input data, REF_U=128, REF_V=128). Precompute the NORMALISING_FACTOR for each band using REF_U and REF_V chosen to normalize each band to 4 bins in the RGB color space. The NORMALISING_FACTOR can be generated using the pseudocode of Table 1.
表1Table 1
Set max_dist[0:7]to 0Set max_dist[0:7]to 0
for each r in 0to 255for each r in 0 to 255
{{
for each g in 0 to 255for each g in 0 to 255
{{
for each b in 0 to 255for each b in 0 to 255
{{
c=RGB2YUV(r,g,b);c = RGB2YUV(r, g, b);
band=c.y>>5;Band=c.y>>5;
dist=|c.u-128|+|c.v-128|;|c.u-128|+|c.v-128|;
if(dist>max_dist[band])If(dist>max_dist[band])
max_dist[band]=dist;max_dist[band]=dist;
}}
}}
}}
for each band in 0 to 7for each band in 0 to 7
NORMALISING_FAGTOR[band]=1/max_dist[band]*4NORMALISING_FAGTOR[band]=1/max_dist[band]*4
步骤4020到4025执行用于两级贴片轮廓增强的可选择的“带抗混叠”。在步骤4020,如果允许进行“带抗混叠”,并且映射带和上边的带或下边的带之间的亮度差未超过指定的阈值(例如,16),在步骤4025执行“带抗混叠”。否则处理在步骤4035处继续。
在步骤4025,执行带抗混叠。尝试找到上边带或下边带内的接近的非空库。候选库是以band-1或band+1映射的一个。在下面两个条件的任意一个中,以候选库替换映射库(bin_mapped):In
1候选库非空,并且其注册ID(Y)距离小于16,并且bin_mapped为空。1 Candidate library is not empty, and its registration ID (Y) distance is less than 16, and bin_mapped is empty.
2候选库和bin_mapped两个不为空,并且Y比bin_mapped的注册ID(Y)更接近候选库的注册ID(Y)。2 The candidate library and bin_mapped are not empty, and Y is closer to the registration ID (Y) of the candidate library than the registration ID (Y) of bin_mapped.
步骤4035到4055执行“库抗混叠”处理。步骤4035检查映射库(bin_mapped)是否为空。如果映射库不为空,步骤4040如下检查映射误差:
max(|U-registration ID(U)|,|V-registration ID(V)|)<MAX_BIN_ERROR[band],max(|U-registration ID(U)|, |V-registration ID(V)|)<MAX_BIN_ERROR[band],
其中MAX_BIN_ERROR[band]是上面用于产生标准化因子的伪码中定义的每个带内的max_dist的八分之一。where MAX_BIN_ERROR[band] is one-eighth of the max_dist within each band defined in the pseudocode above for generating normalization factors.
如果步骤4035返回假(否),处理在步骤4040处继续。否则,处理在步骤4045处继续。在决定步骤4040中,进行检查以便确定映射误差是否超过了指定的阈值,该阈值是针对该带的最大库误差。If
如果映射误差在阈值内,处理在步骤4060处继续。否则,执行步骤4055以寻找更接近的库。在步骤4055,从列0开始搜索,并且向前移动到映射带中的列3。当满足下面的条件中的任何一个时终止搜索:If the mapping error is within the threshold, processing continues at
1发现空库,和1 finds an empty library, and
2发现具有在允许的阈值内的映射误差的库2 Discover libraries with mapping errors within allowed thresholds
如果步骤4055的搜索在条件1上终止,将(YUV)值注册在空库内,并且该空库替代bin_mapped。如果两个条件都失败了,以具有最小映射误差的库替代bin_mapped。然后处理在步骤4060处继续。If the search of
在步骤4035的测试之后,如果映射库为空,处理在步骤4045处继续。在决定步骤4045中,进行检查以便确定已经发现了同一个带内的接近的非空库。步骤4045从列0到3搜索,试图寻找满足以前定义的映射误差阈值的非空库。如果找到这种库,在步骤4052中以空库替代bin_mapped,并且然后处理在步骤4052处继续。否则如果步骤4052返回假(否),以步骤4050中的颜色(YUV)值记录bin_mapped,并且处理在步骤4060处继续。After the test of
在步骤4060,将像素颜色(YUV)累积在映射库(bin_mapped)内,并且递增bin_mapped内的像素计数。在步骤4065,为当前像素记录bin_mapped的位置。在步骤4070,进行检查以便确定贴片中是否存留有更多的像素。如果结果为是,处理在步骤4010处继续。否则,它在步骤4075处继续,其中每个非空库的像素计数除以其累积的颜色。每个非空库的平均颜色形成第一调色板。然后处理终止。At
2.2.1.2分析直方图和对图像分类2.2.1.2 Analyzing histograms and classifying images
贴片分类是发现贴片内的主颜色的方法。基于调色板内的分布和颜色变化,将贴片分类为3组:平坦的、两级的和多颜色的。平坦的贴片具有对于人类的眼睛来说在视觉上不变的颜色,并且通常形成2D直方图内的一个簇。平坦调色板具有多至3个颜色,并且颜色变化小。两级贴片具有两个独特的颜色,并且通常在2D直方图内垂直排成列。两级调色板具有跨越少数亮度带的颜色,但是每个亮度带内的颜色变化是小的。多颜色贴片通常在2D直方图内的大量库上展开。多颜色调色板包括前两个测试失败的贴片。Tile classification is a method of discovering the dominant colors within a tile. Based on the distribution and color variation within the palette, patches were classified into 3 groups: flat, two-level, and multicolor. Flat patches have visually invariant colors to the human eye and generally form a cluster within the 2D histogram. A flat palette has up to 3 colors with little color variation. Two-level tiles have two unique colors and are usually arranged vertically within a 2D histogram. A two-level palette has colors that span a few lightness bands, but the color variation within each lightness band is small. Multi-color patches are usually unrolled over a large library within a 2D histogram. The multi-color palette includes patches that failed the first two tests.
图11的步骤1120分析2D直方图中的库分布和颜色特性,并且从而对贴片进行分类。将贴片分为3组:平坦的、两级的和多颜色的。图47更详细地示出了步骤1120的处理。在步骤4710中,执行平坦贴片测试。如果结果为是,在步骤4712中将贴片分类为是平坦的。否则,在步骤4720执行第二个测试,以便确定贴片是否是两级的。如果两级测试的结果为是,在步骤4722将贴片分类为是两级的。否则,在步骤4724将贴片分类为多颜色的。下面更详细地描述步骤4710和4720。
关于步骤4710,第一LumRange定义为库不为全空的最高和最低亮度带间的范围。对于通过平坦测试的贴片,贴片必须满足下面所有3个条件:Regarding
1 非空库的数目<=31 The number of non-empty libraries <= 3
2 LumRange<=2;和2 LumRange <= 2; and
3 FlatColourVariance<FLAT_COLOUR_VARIANCE3 FlatColourVariance<FLAT_COLOUR_VARIANCE
其中FlatColourVariance定义为最大库和其余库之间的像素计数加权曼哈顿距离的总和。阈值参数可以是FLAT_COLOUR_VARIANCE=15。where FlatColourVariance is defined as the sum of pixel-count-weighted Manhattan distances between the largest bin and the remaining bins. The threshold parameter may be FLAT_COLOUR_VARIANCE=15.
关于对于通过两级测试的贴片的步骤4720,贴片必须满足下面所有3个条件:Regarding
1 非空库的数目<=BILEVEL_MAX_BIN_CNT1 The number of non-empty libraries <= BILEVEL_MAX_BIN_CNT
2 LumRange>2;和2 LumRange > 2; and
3 MaxColourVariance<BILEVEL_COLOUR_VARIANCE3 MaxColourVariance<BILEVEL_COLOUR_VARIANCE
MaxColourVariance定义为max(ColourVariance[band]),其中ColourVariance[band]是带内的最大库和其余库之间的像素计数加权的曼哈顿距离的总和。参数值可以是BILEVEL_MAX_BIN_CNT=16,和BILEVEL_COLOUR_VARIANCE=40。MaxColourVariance is defined as max(ColourVariance[band]), where ColourVariance[band] is the sum of the pixel count-weighted Manhattan distances between the largest bin within a band and the remaining bins. The parameter values may be BILEVEL_MAX_BIN_CNT=16, and BILEVEL_COLOUR_VARIANCE=40.
2.2.1.3形成第二调色板2.2.1.3 Form the second palette
图41更详细地示出了图11的步骤1130的处理。步骤4110测试贴片是否是平坦的。如果贴片是,则步骤4120形成平坦颜色。这可以通过计算非空库的加权平均实现。如果步骤4110中的测试结果为否,处理在步骤4130处继续,测试贴片是否被分类为是两级的。如果测试结果为是,处理移动到步骤4140。在步骤4010,产生两种颜色的调色板。否则,如果步骤4130返回假,处理在步骤4150处继续。在步骤4150中,产生多颜色调色板。FIG. 41 shows the processing of
2.2.1.3.1产生两颜色调色板2.2.1.3.1 Generate a two-color palette
图42提供了步骤4140中产生用于两级贴片的两种颜色调色板的进一步的细节。目的是形成对比颜色表示该图像。由于打印中的半色调和配准(registration)误差,表示原始的两种对比颜色的颜色一般会受到污染。结果,前景和背景区域的平均颜色不是原始图像的良好表示。排除过渡区域内的颜色使得图像看上去更锐化了。FIG. 42 provides further details of the two-color palette generated in
在步骤4210中,选择最暗的和最亮的颜色,以便形成主颜色的初始调色板。在步骤4220中,使用6个被最多填充的库产生库列表。根据像素计数从该调色板中找到顶部的6个库。步骤4230到步骤4270顺序地处理列表中的颜色。在步骤4230中,从列表中获得下一个库颜色C。决定步骤4240测试颜色C是否已经包括在初始调色板内,或该颜色是否离两个极端太远。如果结果为是,忽略该颜色,并且处理返回步骤4230,取得下一个库颜色进行处理。如果步骤4240的结果为否,处理在步骤4250处继续。决定步骤4250测试该颜色是否适合于合并到初始调色板。适合于合并的颜色是位于初始调色板的任何颜色附近的颜色。如果步骤4250的测试结果为是,基于与加权的像素计数的较近的曼哈顿颜色距离,将该颜色合并到初始调色板颜色中的一个。将C的像素计数增加到要合并到的调色板颜色的像素计数。处理在步骤4270处继续。如果步骤4250的结果为否,忽略该颜色,并且处理进到步骤4270,以便检查是否有任何未处理的颜色。如果步骤4270的测试返回是,处理返回步骤4230。否则,处理终止。In
2.2.1.3.2产生多颜色调色板2.2.1.3.2 Generating a multi-color palette
图43展开了图41的步骤4150,其产生用于多颜色贴片的多颜色调色板。在步骤4310中,选择最暗和最亮的颜色形成作为初始主颜色的初始调色板。在步骤4320中,如果下面两个条件为真,则将第三个颜色添加到调色板:Figure 43 expands on
1 LumRange>THIRD_COLOUR_MIN_LD;和1 LumRange > THIRD_COLOUR_MIN_LD; and
2 LargestVar>THIRD_COLOUR_MIN_VAR2 LargestVar > THIRD_COLOUR_MIN_VAR
LargestVar定义为距离最暗和最亮亮度带之间的库中的最亮和最暗颜色的均值颜色的最大曼哈顿颜色距离。如果上面的测试为真,将产生LargestVar的颜色添加为第三个初始调色板颜色。阈值可以是THIRD_COLOUR_MIN_LD=4和THIRD_COLOUR_MIN_VAR=40。LargestVar is defined as the maximum Manhattan color distance from the mean color of the brightest and darkest colors in the library between the darkest and brightest luminance bands. If the above test is true, add the color resulting in LargestVar as the third initial palette color. The thresholds may be THIRD_COLOUR_MIN_LD=4 and THIRD_COLOUR_MIN_VAR=40.
在步骤4330中,将顶部(即,被最多填充的)6个库添加到一个库列表。步骤4340到4395顺序地处理该列表的颜色。在步骤4340中,从该列表获得下一个库颜色C。步骤4350测试该颜色是否已经包含在初始调色板内。如果结果为是,忽略该颜色,并且处理返回步骤4340。否则,步骤4360尝试将该颜色合并到调色板颜色中的一个。将该颜色合并到调色板中具有最近的曼哈顿颜色距离的颜色,如果该距离在BIN_MERGE_THRESHOLD1内(其中阈值可以是BIN_MERGE_THRESHOLD1=10)的话。如果步骤4360的尝试成功了,处理在步骤4395处继续检查是否有更多的颜色要处理。否则,处理进入步骤4370。In step 4330, the top (ie, most populated) 6 bins are added to a bin list. Steps 4340 through 4395 sequentially process the colors of the list. In step 4340, the next library color C is obtained from the list. Step 4350 tests whether the color is already contained in the initial palette. If yes, the color is ignored and processing returns to step 4340. Otherwise, step 4360 attempts to merge the color into one of the palette colors. Merge the color into the palette with the closest Manhattan color distance if that distance is within BIN_MERGE_THRESHOLD1 (where the threshold may be BIN_MERGE_THRESHOLD1=10). If the attempt at step 4360 was successful, processing continues at step 4395 to check if there are more colors to process. Otherwise, the process proceeds to step 4370.
步骤4370测试是否可将另一种颜色添加到调色板。如果步骤4370返回真(是),处理在步骤4380处继续。如果下面的伪码内的测试为真,在步骤4380中添加额外颜色。Step 4370 tests whether another color can be added to the palette. If step 4370 returns true (Yes), processing continues at step 4380. Additional colors are added in step 4380 if the test in the pseudo-code below is true.
Number_palette_colours<MAX_NUM_PALETTE_COLOURSNumber_palette_colours<MAX_NUM_PALETTE_COLOURS
&&&&
((
minDist>BIN_MERGE_THRESHOLD2 minDist > BIN_MERGE_THRESHOLD2
‖" "
((
minDist>BIN_MERGE_THRESHOLD3 minDist > BIN_MERGE_THRESHOLD3
&&&&
(minDist*pCnt)>BIN_NEW_MIN (minDist*pCnt)>BIN_NEW_MIN
&&&&
pixel_count_closest_palette_colour>BFN_DONT_TOUCH_CNT pixel_count_closest_palette_colour > BFN_DONT_TOUCH_CNT
))
))
MinDist是C到调色板颜色的最近曼哈顿颜色距离。PCnt是C的像素计数。Pixel_count_closest_palette_colour是产生minDist的调色板颜色的像素计数。阈值可以是MAX_NUM_PALETTE_COLOURS=4,BIN_MERGE_THRESHOLD2=70,BIN_MERGE_THRESHOLD3=40,BIN_NEW_MIN=4000和BIN_DONT_TOUCH_CNT=150。MinDist is the nearest Manhattan color distance from C to the palette color. PCnt is the pixel count of C. Pixel_count_closest_palette_colour is the pixel count of the palette color that yields minDist. The thresholds may be MAX_NUM_PALETTE_COLOURS=4, BIN_MERGE_THRESHOLD2=70, BIN_MERGE_THRESHOLD3=40, BIN_NEW_MIN=4000 and BIN_DONT_TOUCH_CNT=150.
从步骤4380,处理在步骤4395处继续。From step 4380, processing continues at step 4395.
如果步骤4370中的测试为假,处理在步骤4390处继续。在步骤4390中,将库颜色C与具有最近的曼哈顿颜色距离的调色板颜色合并。以加权的像素计数合并颜色,并且将C的像素计数增加到要合并到的调色板颜色的像素计数。处理在步骤4395处继续。如果在步骤4395没有更多的颜色要处理,处理终止。If the test in step 4370 is false, processing continues at step 4390. In step 4390, the library color C is merged with the palette color with the closest Manhattan color distance. Colors are merged at a weighted pixel count, and the pixel count of C is increased to the pixel count of the palette color to be merged into. Processing continues at step 4395. If there are no more colors to process at step 4395, processing terminates.
2.2.1.4将像素与第二调色板相关联2.2.1.4 Associating pixels with a second palette
一旦发现了主颜色,将贴片内的像素量化为主颜色之一。与用于连通组分分析的主颜色列表一起产生量化映射。针对每个组的量化处理如下:Once the dominant color is found, the pixels within the tile are quantized to one of the dominant colors. Produces quantization maps together with primary color lists for connected components analysis. The quantization process for each group is as follows:
1)平坦的-不量化;1) flat - not quantized;
2)两级的-将调色板重新映射到两个主颜色之一,或寻找一个阈值将原始像素二值化;和2) two-level - remap the palette to one of the two primary colors, or find a threshold to binarize the original pixels; and
3)多颜色的-将调色板重新映射到主颜色之一。3) Multicolored - Remaps the palette to one of the main colors.
二值化产生更锐化的轮廓,但是由于二值化需要找到合适的阈值,所以要花费较长时间。寻找阈值的步骤是:1)在亮度通道上执行一阶导数,2)识别边缘像素,和3)使用来自边缘像素的平均亮度值作为阈值。边缘像素是这样的像素,其中它们周围的3×3一阶导数输出全部在预定阈值以上。Binarization produces sharper contours, but takes longer since binarization needs to find an appropriate threshold. The steps to find the threshold are: 1) perform a first derivative on the luminance channel, 2) identify edge pixels, and 3) use the average luminance value from the edge pixels as the threshold. Edge pixels are pixels where the 3x3 first derivative outputs around them are all above a predetermined threshold.
图44提供了图11的步骤1140的进一步的细节。步骤4410测试贴片是否被分类为是两级的。如果测试结果为是,处理在步骤4420处继续。在步骤4430中,映射多颜色的贴片。否则,处理在步骤4430处继续。在步骤4420映射两级贴片。FIG. 44 provides further details of
2.2.1.4.1映射两级贴片2.2.1.4.1 Mapping two-level patches
图45提供了步骤4420的进一步的细节。整个处理4420以量化误差检查将所有非空库映射到第二调色板内的两种颜色。步骤4510到步骤4570对每个非空库执行量化和误差检查。如果没发现大的量化误差,处理在库量化之后,将所有像素重新映射到第二调色板。如果发现了大的量化误差,将贴片重新分类为是多颜色的,并且使其进行多颜色贴片映射。下面解释映射两级贴片的细节。FIG. 45 provides further details of
步骤4510确定是否需要轮廓增强,并且为进行轮廓增强的库选择用于量化的优选的极端颜色。如果两个调色板颜色的一个像素计数超出了另一个4倍,则需要轮廓增强。设第二调色板内的两个颜色具有第一颜色C1和像素计数P1以及第二颜色C2和像素计数P2。如果(P1/P2)或(P2/P1)大于5,则需要轮廓增强,并且将OUTLINE_ENHANCE设置为真。优选的极端颜色可以是具有较小像素计数的颜色。
步骤4515得到具有像素计数pCnt的下一个非空库。步骤4520计算两个颜色的量化误差。计算到两个调色板颜色的曼哈顿距离(D1和D2),并且将较小的一个定义为minDist。MinDist是量化误差。然后处理在决定步骤4525继续,检查量化误差是否太大。下面的伪码定义量化误差太大时的条件:
(minDist*pCnt)>BIN_NEW_BILEVEL_THRESHOLD(minDist*pCnt)>BIN_NEW_BILEVEL_THRESHOLD
‖‖
((
minDist>BIN_NEW_BILEVEL_COLOUR_DIFFminDist > BIN_NEW_BILEVEL_COLOUR_DIFF
&&&&
pCnt>BIN_MERGE_BILEVEL_CNT_MINpCnt>BIN_MERGE_BILEVEL_CNT_MIN
))
其中阈值可以是BIN_NEW_BILEVEL_THRESGOLD=6000,BIN_NEW_BILEVEL_COLOUR_DIFF=50和BIN_MERGE_BILEVEL_CNT_MIN=100。Wherein the thresholds may be BIN_NEW_BILEVEL_THRESGOLD=6000, BIN_NEW_BILEVEL_COLOUR_DIFF=50 and BIN_MERGE_BILEVEL_CNT_MIN=100.
如果步骤4525处的测试为真,处理在步骤4540处继续。在步骤4540中,将当前库添加到额外主颜色列表。然后处理在步骤4570处继续。如果步骤4525中的测试为假,处理在决定步骤4530处继续,检查额外主颜色列表是否为空。如果该列表不空(否),处理转换到步骤4570,查看是否有更多的非空库要处理。否则,如果步骤4530的结果为是,处理在步骤4545处继续,确定是否需要轮廓增强,以及两个距离是否接近。以下面的伪码给出测试条件:If the test at
OUTLINE_ENHANCEOUTLINE_ENHANCE
&&&&
abs(D1-D2)<BILEVEL_THRESHOLD_MARGINabs(D1-D2)<BILEVEL_THRESHOLD_MARGIN
其中阈值可以是BILEVEL_THRESHOLD_MARGIN=16。The threshold may be BILEVEL_THRESHOLD_MARGIN=16.
如果步骤4545中的测试为真,处理在步骤4550处继续。在步骤4550中,将库量化为优选的颜色。处理在步骤4570处继续。否则,处理在步骤4555处继续,基于D1和D2将库量化为较接近的颜色。然后处理在步骤4570处继续,检查是否有更多的非空库要处理。如果有更多的非空库,处理返回步骤4515。如果没有更多的非空库,处理在步骤4560处继续,检查额外主颜色列表是否为空。其确定在库量化处理过程中是否有大的量化误差。如果该列表为空,处理在步骤4575处继续,并且根据步骤4550或4555中的库映射(视合适情况而定),将所有像素重新映射到两个调色板颜色之一。然后,处理终止。如果在步骤4560中列表非空,处理在步骤4565处继续,将一个额外的颜色添加到调色板。将额外主颜色列表内的具有最高像素计数的库选择为第三调色板颜色。在步骤4430中,将贴片重新映射为多颜色贴片。然后处理终止。If the test in
2.2.1.4.2映射多级贴片2.2.1.4.2 Mapping multi-level tiles
在图46上展开图44的步骤4430。步骤4610得到下一个非空库。步骤4620将库量化为调色板颜色之一。这可以基于最近的曼哈顿颜色距离进行。步骤4630检查是否有更多的非空库要处理。如果步骤4630中的测试为真,处理在步骤4610处继续。如果步骤4630中的测试返回否,处理在步骤4640处继续。在步骤4640中,根据步骤4620中的库映射,将所有像素重新映射到调色板颜色之一。然后,处理终止。
2.2.2颜色CC分析和斑点统计量2.2.2 Color CC Analysis and Blob Statistics
图10的本处理1020采用来自以前步骤的量化贴片,并且形成斑点。每个斑点具有下面的统计量:边界框、大小、均值颜色、位掩蔽、斑点边界长度。以快速和有效的方式在单个光栅轮中形成斑点。以光栅顺序,将属于相同颜色类的量化贴片中的相邻像素分为一组,以便形成“串”。在每串(段)的结尾,将串与上一行上的接触段(在8路连通性方面)比较,以便进行生长或合并。如果将被给予斑点标签的串触及相同类的斑点,则发生生长。相反,当相同类的两个斑点相接触时发生合并。如果不可能出现生长,则形成新的斑点。在每串的结尾处更新斑点统计量。The
图36是使用示例的6×6的输入贴片3610的颜色斑点划分处理3600的图示。对具有若干颜色的输入贴片3610应用颜色量化,以便产生量化贴片3620。量化贴片3620包含相应于主颜色类的类标签。在该情况下,贴片中有两个主颜色,因此给出类标签0和1。在一行中,通过将相同类的相邻标签分组在一起以便形成串,开始连通组分分析。例如,在图36中,第一行的前4个“0”形成一个串,并且接下来的两个“1”形成另一个串。通过将来自量化贴片3620中的连续行的相同类的串结合在一起形成斑点。贴片3630示出了贴片内的串。当将当前段与上行中的接触段进行比较时,有三种可能的行为:FIG. 36 is an illustration of a color blob partitioning process 3600 using an example 6×6 input tile 3610 . Color quantization is applied to an input tile 3610 having several colors to produce a quantized tile 3620 . Quantization tile 3620 contains class labels corresponding to primary color classes. In this case there are two dominant colors in the patch, so
1通过将当前段添加到已有斑点,生长该斑点;1 grow the blob by adding the current segment to the existing blob;
2通过将两个斑点的统计量统一为一个合并两个斑点;2 Merge two blobs by unifying their statistics into one;
3通过使用当前段初始化斑点形成新斑点。3 Form a new blob by initializing the blob with the current segment.
贴片3640示出了得到的斑点,斑点0和斑点1,其中斑点0具有外部边界框,并且斑点1具有内部边界框。在形成串和斑点的同时累积斑点的统计量。所以,在该处理阶段的结尾,每个斑点具有针对斑点0和斑点1的图36中所示的所有统计量。位掩蔽3650和3660是代表性的。实际上不在该阶段形成示例性的位掩蔽3650和3660。将量化的颜色用作斑点的均值颜色。可替换地,可以通过累积斑点中的实际像素值而不是量化的颜色确定斑点的均值颜色。这给出更准确的均值颜色。再次地,斑点统计量可以包括大小、均值颜色、边界长度、边界框和位掩蔽。Tile 3640 shows the resulting blobs,
图9更详细地示出了图10的步骤1020,步骤1020采用循环结构从量化贴片中形成斑点,从上到下地一次处理一个贴片行。在步骤910中,获得当前贴片行以便进行处理。在步骤920中,形成相同量化标签的像素的连续的段。这可以通过记录其开始和结束位置进行。该段是当前段Sc。开始位置是前一个段的结束位置右边的像素。在新贴片行的情况下,开始位置是该行的第一个像素。结束位置是在从左到右在当前贴片行上一个像素接一个像素地检查量化标签的过程中,在量化标签发生变化之前的最后的像素。在当前贴片行在检测到变化之前结束的情况下,结束位置是该行的最后的像素。出于稍后的重新估计斑点的颜色的目的,在从其开始到结束位置的检查过程中,累积原始的全色贴片内的段内的每个像素的YUV值。可替换地,可以使用该段的量化颜色,而不进行颜色重新估计。FIG. 9 shows step 1020 of FIG. 10 in more detail.
在步骤930中,使用该段形成新斑点或生长已有斑点。图12提供了该步骤的进一步的细节。在决定步骤940中,进行检查以便确定该行中是否剩余有未处理的像素。如果决定步骤940返回真(是),处理返回步骤920。直到处理了当前贴片行内的所有像素,这才发生。如果步骤940返回假(否),处理在步骤950处继续。在决定步骤950中,进行检查以便确定是否剩余未处理的行。如果步骤950返回真(是),处理在步骤910处继续。直到没有更多的未处理的贴片行,这才发生。如果步骤950返回假(否),处理终止;量化贴片内的每个像素已经分配到斑点。In
2.2.2.1形成斑点2.2.2.1 Formation of spots
图12更详细地示出了步骤930。图12的处理采用图9的步骤920中识别出的当前段Sc作为输入。在步骤1205中,将变量k初始化为1。该值引用与当前段Sc连接的当前贴片行之上的贴片行的第k个段Sk。优选地,连接是8路连接。在步骤1210中,执行Sk和Sc间的比较。在决定步骤1215中,进行检查以便确定Sc和Sk是否是相同的类。如果两个段Sk和Sc具有相同的量化标签,处理从步骤1215移动到步骤1220。否则,处理在决定方框1235处继续。在决定步骤1220中,进行检查以便确定Sk是否是相同类的第一个相连段。因此,如果Sk是具有与Sc相同的量化标签的第一个相连段,处理在步骤1225处继续。否则,处理在步骤1230处继续。在步骤1225中,生长当前斑点。因此,基于第k个段所属的斑点生长当前段。生长斑点涉及更新斑点的大小、边界信息、边界框和累积的YUV值。然后,处理在步骤1235处继续。相反,如果决定步骤1220返回假,这指示着已经给当前段分配了斑点标签,即,斑点[i],并且与另一个斑点,即,斑点[j]接触,并且处理在步骤1230处继续。如果这两个斑点具有不同的斑点标签,即,i≠j,则在步骤1230将这两个斑点合并在一起。图15中给出了步骤1230中的斑点合并处理。然后,处理在步骤1235处继续。Figure 12 shows step 930 in more detail. The process of FIG. 12 takes as input the current segment Sc identified in
从步骤1215、1225和1230,处理继续到决定框1235。在决定框1235中,进行检查以便确定是否已经处理了最后的相连段。如果第k个段不是与当前段相连的最后的段,处理在步骤1240处继续,并且将k增加1。然后,处理返回步骤1210处理下一个相连的段。否则,如果决定步骤1235返回真,处理移动到决定框1245。在决定框1245中,进行检查以便确定是否没有相连的段是相同的类(即,与当前段相同的量化标签)。如果步骤1245返回真(是),处理在步骤1250处继续。在步骤1250中,使用当前段形成新斑点。形成新斑点涉及给当前段分配新的斑点标签,将斑点数目增加1,并且使用当前段的信息初始化斑点统计量。在步骤1250后处理终止。类似地,如果决定框1245返回假(否),处理终止。From
2.2.2.2斑点合并的例子2.2.2.2 Example of blob merging
图15示出了斑点合并1500的例子。段1510和1520都属于斑点[i],其与属于斑点[j]的段1530相连。示出了段1520的当前开始和当前结尾,同样示出了段1530的上开始和上结尾。图15中还示出了段1520和1530的重叠。斑点合并处理1500涉及结合两个斑点的统计量,将一个斑点标签映射到另一个,并且将斑点数目减少1。例如,在图15中,将标签j映射到标签i。使用下面的伪码指令组合斑点统计量:An example of blob merging 1500 is shown in FIG. 15 .
83 blob[i].boundingBox=combine(blob[i].botmdingBox,blob[j].boundingBox)83 blob[i].boundingBox=combine(blob[i].botmdingBox,blob[j].boundingBox)
83 blob[i].size+=blob[j].size83 blob[i].size+=blob[j].size
83 blob[i].tileBorderPixelCount+=blob[j].tileBorderPixelCount83 blob[i].tileBorderPixelCount+=blob[j].tileBorderPixelCount
83 blob[i].horizontalEdges+=blob[j].horizontalEdges-2 overlap83 blob[i].horizontalEdges+=blob[j].horizontalEdges-2 overlap
83 blob[i].vertiealEdges+=blob[j].verticalEdges83 blob[i].verticalEdges+=blob[j].verticalEdges
83 blob[i].YUV+=blob[j].YUV83 blob[i].YUV+=blob[j].YUV
2.2.3贴片内合并2.2.3 Intra-patch merging
一旦通过连通组分分析形成了贴片的斑点,下一个阶段是使用颜色、大小和斑点边界长度统计量将语义相关的斑点合并在一起。从这个阶段起,斑点仅可以被合并而不是分段。因此,贴片从被过度分段变为更接近正确的分段等级。图37和38中示出了贴片内合并的一个例子,其中使用斑点统计量评估是否合并斑点的给定部分。在例子中,有4个量化颜色,并且连通组分分析返回10个斑点。图37示出了在左边进行合并3710之前的斑点。这些斑点中的许多是由于残留的半色调图案和两个不同颜色区域间的颜色“渗色”的影响。在应用贴片内合并之后,具有相对长的边界长度的小斑点被合并到具有较小颜色特征的较大的斑点内。在例子3700中示出了合并3720之后的斑点。图38包含原始贴片3810与合并的斑点3820的比较3800。Once the patches of blobs have been formed by connected component analysis, the next stage is to merge semantically related blobs together using color, size, and blob boundary length statistics. From this stage, blobs can only be merged and not segmented. Thus, the tile goes from being over-segmented to being closer to the correct segmentation level. An example of intra-tile merging is shown in Figures 37 and 38, where blob statistics are used to evaluate whether to merge a given portion of blobs. In the example, there are 4 quantized colors, and the connected components analysis returns 10 blobs. Figure 37 shows the blob before merging 3710 on the left. Many of these spots are due to residual halftone patterns and the effects of color "bleeding" between areas of two different colors. After applying intra-tile merging, small blobs with relatively long boundary lengths are merged into larger blobs with smaller color features. The blob after merging 3720 is shown in example 3700 . FIG. 38 contains a
量化和斑点形成处理通常创建许多小的不需要的或不正确的斑点。它们是由输入图像内的噪声斑点、剩余半色调而导致的小斑点,或由较大的连通组分的边缘处的渗色影响而导致的细的、大高宽比的斑点的形式。通过将这些斑点与不正确的斑点触及的颜色接近的斑点相合并,可以去除不正确的斑点。Quantization and speckling processes often create many small unwanted or incorrect spots. They are in the form of small blobs caused by noise blobs within the input image, leftover halftones, or thin, high aspect ratio blobs caused by bleeding effects at the edges of larger connected components. Incorrect blobs can be removed by merging these blobs with blobs of a similar color touched by the incorrect blob.
通过限制由分段和连通组分处理产生的贴片内的斑点的数目,可以改进速度和存储器使用率。如果贴片内有太多斑点,可以通过合并一些类似着色的斑点减小其数目,即使这些斑点不相接触。这产生了具有独立的不连通部分,但是被作为单个元素对待的斑点。由于这仅发生在被在稍后的步骤中丢弃的具有大量小的噪声元素的贴片内,所以质量不会受到影响。Speed and memory usage can be improved by limiting the number of blobs within a tile resulting from segmentation and connected component processing. If there are too many blobs within a patch, the number can be reduced by merging similarly colored blobs, even if they are not touching. This produces blobs that have independent disconnected parts, but are treated as a single element. Since this only happens within tiles with a large number of small noise elements that are discarded in a later step, the quality is not affected.
图19详细地示出了图10的步骤1030。在步骤1905中,获得当前贴片内的斑点。在步骤1910中,检查斑点的周长与面积的比,以便确定其是否是高的。如果发现该参数高于阈值等级,处理在步骤1915处继续。否则,如果步骤1910返回假,处理在步骤1930处继续。在步骤1915中,检查斑点内触及贴片边缘的像素的比。如果该比值在阈值之上,处理在步骤1920处继续。在步骤1920中,将斑点标记为“强制合并贴片间”。设置用于该斑点的强制合并标志,其使得该斑点在图14的步骤1420中更可能与相邻贴片内的连通组分合并。在步骤1920之后,处理在步骤1930处继续。如果在步骤1915中斑点的贴片边缘比低于阈值,处理在步骤1925处继续。在步骤1925中,将斑点与具有最接近的颜色的相邻斑点合并。找到触及当前斑点的所有斑点以及它们的颜色与当前斑点的距离。然后将当前斑点与相邻的颜色最接近的斑点合并。然后,处理在步骤1930处继续。在决定步骤1930中,进行检查以便确定贴片中是否有更多的斑点要处理。如果步骤1930返回真(是),处理返回步骤1905。否则,处理在步骤1935处继续。FIG. 19 shows step 1030 of FIG. 10 in detail. In
在决定步骤1935中,检查贴片内的斑点的当前数目,以便确定是否太多。如果发现斑点数目高于预定限制,处理在步骤1940处继续。否则,处理结束。在步骤1940中,将未触及贴片边缘的相同量化颜色类的斑点合并在一起。为每个量化颜色类进行该处理。不合并触及贴片边缘的斑点,因为这些斑点可以形成大得多的CC的一部分,并且合并它们可能对质量具有有害的影响。在步骤1945中,再次检查贴片内的斑点的当前数目,以便确定贴片内是否仍然有太多斑点。如果现在发现该数目在预定限制之下(否),处理终止。否则,处理在步骤1950处继续。在步骤1950中,合并触及贴片边缘的各个颜色的斑点。步骤1950执行类似于步骤1940的处理,但是将触及贴片边缘的斑点考虑在内,以便将斑点数目减少到限制之下。然后,处理终止。In
2.2.4贴片间合并2.2.4 Merging between patches
图13详细地示出了步骤1040的贴片间合并处理。不以特定的顺序为当前贴片的左和上边界中的每一个重复该处理。下面的描述适合于沿着当前贴片的左或上边界进行合并。例如,32×32的贴片具有与其相邻贴片状态的32像素的边界,并且每个像素具有斑点标签。相邻贴片状态也具有32像素的边界,并且每个像素具有CC标签。所以,对于沿着边界的每个像素阶跃,存在当前贴片内的斑点标签,以及相邻贴片状态内的相应的CC标签。通过获得用于下一个贴片边界像素的当前贴片内的斑点标签和其相邻贴片状态内的CC标签,在步骤1310中处理沿着公共边界开始。FIG. 13 shows the inter-tile merging process of
在决定框1320中执行测试,以便随着处理沿着边界移动,检测CC标签、斑点标签或最后的像素的改变。如果当前像素是最后的边界像素,决定框1320返回是。如果步骤1320返回假(否),处理在步骤1380处继续。否则,如果步骤1320返回真(是),处理在步骤1330处继续。步骤1330检查可用作合并候选的斑点数目和CC数目。在决定步骤1340中,进行检查以便确定是否满足候选计数条件。如果斑点和CC的合并候选计数满足如图17中所示的并且在下面描述的预定条件1700,决定框1340返回是。A test is performed in decision block 1320 to detect changes in the CC label, blob label, or last pixel as processing moves along the boundary. If the current pixel is the last boundary pixel, decision block 1320 returns yes. If step 1320 returns false (No), processing continues at step 1380 . Otherwise, if step 1320 returns true (Yes), processing continues at step 1330 . Step 1330 checks the number of blobs and the number of CCs that can be used as merge candidates. In decision step 1340, a check is made to determine if the candidate count condition is met. Decision block 1340 returns yes if the merge candidate counts of the blob and CC satisfy the
如果图13的决定步骤1340返回假(否),处理在步骤1370处继续。否则,处理在步骤1350处继续。步骤1350基于颜色距离度量识别合并候选中用于合并的最佳的斑点和CC对。设(Ycc,Ucc,Vcc)和(Yblob,Ublob,Vblob)是CC和斑点候选对的YUV颜色值,由下式给出颜色平方距离sd:If decision step 1340 of FIG. 13 returns false (No), processing continues at step 1370 . Otherwise, processing continues at step 1350 . Step 1350 identifies the best blob and CC pair for merging among the merging candidates based on the color distance metric. Let (Y cc , U cc , V cc ) and (Y blob , U blob , V blob ) be the YUV color values of CC and blob candidate pairs, the color squared distance sd is given by:
sd=Wy(Ycc-Yblob)2+Wu(Ucc-Ublob)2+Wv(Vcc-Vblob)2,sd=W y (Y cc -Y blob ) 2 +W u (U cc -U blob ) 2 +W v (V cc -V blob ) 2 ,
其中Wy,Wy,Wy分别是Y,U和V通道的权重。权重Wy,Wy,Wy可以分别设置为0.6,0.2和0.2。最佳斑点和CC对是具有最小平方距离值的一个。where W y , W y , W y are the weights of the Y, U and V channels, respectively. The weights W y , W y , W y can be set to 0.6, 0.2 and 0.2, respectively. The best blob and CC pair is the one with the smallest squared distance value.
由步骤1360处理这个最佳斑点和CC对,步骤1360执行各种合并操作,并且参考图14更详细地描述。在步骤1360之后,或从决定框1340的否之后,处理在步骤1370处继续,其中更新斑点和CC候选计数。图48示出了对所有操作条件进行更新之前和之后的CC和斑点候选计数。如果发现已经改变了CC和斑点标签两者,将CC和斑点候选计数为在合并之前所有可能的计数组合设置为1(更新前的“x”表示“无关紧要的”)。如果仅发现改变了一个标签,并且贴片边界的任意一侧上有两个候选,根据合并两个候选中的哪一个,计数可以设置为0或1。如果选择第二个斑点进行合并(见图49中的4920),则在合并已经发生之后,将两个候选计数设置为0。然而,如果选择第一个斑点或没有选择斑点进行合并,则将两个计数设置为1。在需要候选计数更新的其余情况中,将已经改变的标签的计数增加1,而未改变的标签的计数设置为1。This best blob and CC pair is processed by
在步骤1320或1370之后,处理在步骤1380处继续。在决定框1380中,进行检查以便确定当前像素是否是最后的边界像素。如果步骤1380返回假(否),处理在步骤1310中移动到下一个像素位置。否则,处理终止。After step 1320 or 1370, processing continues at step 1380. In decision block 1380, a check is made to determine if the current pixel is the last boundary pixel. If step 1380 returns false (No), processing moves to the next pixel location in step 1310 . Otherwise, processing terminates.
2.2.4.1贴片间合并条件2.2.4.1 Merging conditions between patches
根据图17,除CC和斑点标签两者同时改变(在该情况下,对于合并来说,每侧上有一个候选就足够了)时之外,当在一侧上有两个候选,并且在另一侧上有一个时通常发生合并。这使得避免了将一侧上的两个相邻候选合并到另一侧上的相同候选。图17示出了可以执行贴片间合并操作的条件。如果当前CC和斑点计数是(1,1),则为了发生合并,两个标签必须同时改变。然而,如果当前CC和斑点计数是(1,2)或(2,1),则任何标签的改变足以发生合并。情况(2,2)从不会出现。According to Figure 17, except when both the CC and the blob label change at the same time (in which case one candidate on each side is sufficient for merging), when there are two candidates on one side, and on Merging usually occurs when there is one on the other side. This avoids merging two adjacent candidates on one side into the same candidate on the other side. FIG. 17 shows conditions under which an inter-tile merge operation can be performed. If the current CC and blob count are (1, 1), then both labels must change at the same time for the merge to occur. However, if the current CC and blob count are (1,2) or (2,1), any label change is sufficient for merging to occur. Case (2, 2) never arises.
2.2.4.2贴片间合并的例子2.2.4.2 Example of merging between tiles
图49提供了在斑点和CC候选间进行合并的示例。存在3种情况:i)在4910中,每侧仅有一个候选;ii)4920具有一个CC和两个斑点,和iii)4930具有两个CC和一个斑点。在情况4920中,CC候选与两个斑点候选相连。假设两个斑点的颜色与CC接近,但是仅有一个可以与CC合并。如果按每次每侧一个候选地从上到下的顺序执行合并,则顶部的斑点与CC合并,剩下底部的斑点不合并。当底部斑点对于合并可能是更好的候选时,这可能不能产生最理想的结果。因此,对于类似于4920和4930中的情况,一侧上需要两个候选。情况4910在每侧上仅需要一个候选,因为对于4连通,没有可替换的合并组合。Figure 49 provides an example of merging between blobs and CC candidates. There are 3 cases: i) in 4910, there is only one candidate per side; ii) 4920 has one CC and two spots, and iii) 4930 has two CCs and one spot. In case 4920, the CC candidate is concatenated with two blob candidates. Suppose two spots are close in color to CC, but only one can be merged with CC. If merging is performed one candidate per side from top to bottom at a time, the top blobs are merged with the CC, leaving the bottom blobs unmerged. This may not yield the most desirable results when the bottom blob might be a better candidate for merging. Thus, for cases like in 4920 and 4930, two candidates are required on one side. Case 4910 requires only one candidate on each side, since for 4-connectivity there is no alternative merge combination.
通过以贴片光栅顺序的贴片间合并斑点,形成跨越多于一个贴片的颜色相连的组分。如图39的例子3900中所示,这被以光栅顺序执行,其中贴片3910内的斑点与分别位于贴片3910左边和上边的两个相邻贴片3930和3920的任意一个内的斑点合并。Color-connected components spanning more than one tile are formed by inter-tile merging blobs in tile raster order. As shown in example 3900 of FIG. 39, this is performed in raster order, where blobs within
每个CC存储在保持关于其边界框、均值颜色、以像素衡量的尺寸和接触的CC的信息的数据结构内。在贴片间合并处理中,给当前贴片内的每个斑点分配CC标签,并且使用斑点统计量更新相应的CC数据结构。在当前贴片1630和沿着如图16中所示的当前贴片1630的左和上边界的两个相邻贴片1610,1620内的状态1612、1622间执行合并。当前贴片1630是一块已经被处理以便形成斑点的像素数据,并且具有用于沿着公共边界的像素的斑点标签1634。相反,以前的贴片状态1612、1614不包含像素数据,仅有斑点统计量和与这些斑点链接的连通组分信息。贴片状态1612、1622是压缩的数据结构,其包含关于沿着该贴片1610、1620中的边界的斑点的信息,以及指向这些斑点所属的CC1614、1624的指针。具体地,有两种贴片状态:左贴片状态1612和上贴片状态1622;每一个具有用于分别与它们右边和下边的贴片合并的,沿着它们与当前贴片1630的公共边界的每个像素的CC标签信息1614、1624。以前的贴片中的斑点现在是连通组分的一部分。Each CC is stored within a data structure that holds information about its bounding box, mean color, size in pixels, and CCs it touches. In the inter-tile merging process, a CC label is assigned to each blob within the current tile, and the corresponding CC data structure is updated with blob statistics. Merging is performed between the
2.2.4.3处理最佳斑点和CC对2.2.4.3 Dealing with Optimal Spots and CC Pairs
图14更详细地示出了图13的步骤1360,其中采用在步骤1350识别出的最佳斑点和CC对作为输入。处理在步骤1410中开始。在决定步骤1410中,进行检查以便确定该斑点是否已经与另一个CC合并。如果在决定框1410中识别出的斑点已经被分配了CC标签,处理在步骤1440处继续。否则,处理移动到决定框1420。在决定框1420中,将在步骤1330中计算的识别出的斑点和CC对间的颜色距离与颜色合并阈值比较。该阈值可以是450。如果设置了强制合并标志,则阈值可以是900。如果颜色距离小于阈值,处理在步骤1430处继续。否则,处理在步骤1460处继续。在步骤1430中,将该斑点和识别出的CC合并在一起。这可以通过使用斑点的统计量更新CC的统计量,并且将CC的标签分配给斑点进行。然后处理终止。在步骤1460中,为当前斑点形成新的CC。然后处理终止。FIG. 14 shows step 1360 of FIG. 13 in more detail, taking as input the best blob and CC pair identified at step 1350 . Processing begins in
在决定框1440中,将识别出的CC和识别出的斑点所属的CC间的颜色距离与用于合并的颜色阈值进行比较。如果两个CC间的颜色距离在阈值之下,处理在步骤1450处继续。在步骤1450中,将这些CC映射在一起。这可以通过将它们的统计量组合在一起,并且设置将CC链接在一起的“映射到”指针进行。然后处理终止。类似地,如果步骤1440返回假(否),处理终止。In
2.2.4.4 CC映射结果2.2.4.4 CC mapping results
图18是根据图14的步骤1450的CC映射处理的结果的图示。该图示出了作为合并CC1805、1810、1820、1830的结果,可以形成CC的链表1850。在该图示中,CC(k)1830具有指向NULL 1840的映射到指针1832,其指示它从未被合并到另一个CC内,因此它称为根CC。另外,通过在若干合并上累积各个CC统计量,确定根CC的统计量1834。例如,CC 1830的最终统计量是在将CC(h)1805,CC(i)1810,CC(j)1820和CC(k)1830被合并在一起之前,这些CC的组合统计量。合并这些统计量的次序不重要。在图18中,CC(i)1810指向CC(j)1820,CC(h)1805和CC(j)1820指向CC(k)1830。FIG. 18 is an illustration of the results of the CC mapping process according to step 1450 of FIG. 14 . The figure shows that as a result of merging
2.2.5后合并处理2.2.5 Post-merge processing
图50是图10的后合并处理步骤1050的流程图。在步骤5010,为当前贴片中的在与图14的步骤1460相同的处理中的每个未合并的斑点形成新CC。在步骤5020中,使用斑点标签输出用于存储斑点的形状和外观的二值图像。对于具有n个斑点的贴片,仅需要输出n-1个二值图像,因为第n个二值图像被隐含地存储为去掉n-1个区域后剩余的区域。因此,包含单个斑点的平坦贴片不需要存储任何二值图像。在可替换的实施例中,可以将二值图像存储在单个压缩的数据结构诸如索引图内,其中每个像素位置具有一个斑点索引,并且被使用log2(n)位表示。例如,如果每个贴片的最大斑点数目是16,使用4位的数字编码每个像素位置处的斑点索引。在另一个可替换的实施例中,可以使用1位位图存储用于两级贴片的二值图像。在步骤5030中,更新当前贴片内的每个CC的接触列表。这可以通过识别出该贴片内所有相邻的CC进行。在步骤5040中,输出贴片状态。颜色分段处理将斑点和CC信息存储在用于与下一个输入贴片合并的压缩的贴片状态数据结构内。如上所述,有两个贴片状态,其中左边的贴片状态具有沿着右贴片边界的CC标签信息,上部的贴片状态具有沿着下贴片边界的CC标签信息。FIG. 50 is a flowchart of the
2.3分段例子2.3 Segmentation example
图25(a)示出了一个简单的例子,其示出了基于多于2个量化等级分段的优点。背景2510是黑色的,并且其上放置了一个白色的三角形2520和灰色的单词“text”的字母2530。如图25(b)和(c)所示,该图像的二值分段典型地导致文本2530与背景2510或三角形2520的合并。这些分段中的任何一个都不能用于文档布局分析以便选择文本区域-文本特征已经丢失了。Figure 25(a) shows a simple example showing the advantages of segmentation based on more than 2 quantization levels. The background 2510 is black and on it is placed a white triangle 2520 and the letters 2530 of the word "text" in gray. Binary segmentation of the image typically results in merging of text 2530 with background 2510 or triangles 2520 as shown in Figures 25(b) and (c). Neither of these segments can be used for document layout analysis for selecting text regions - the text features are lost.
同时,存在允许简化连通组分的布局分析的二值分段的某些特征。考虑具有连通外边界和由4路连通形成的CC的页的情况。在该情况下,除了页的边缘在,在边界处接触另一个CC的每个CC或是被该CC包含,或是包含该CC,并且一个CC仅可以由一个其他CC包含。可以产生一个明确的包含分层结构,并且以树结构表示。树中的每个连续的层包含与前面相反极性的CC,并且每个分支由一组共享唯一父亲的CC构成。由于该分层结构可用于选择作为将被分组在一起的候选的CC的子集(共享唯一父亲的那些),这种分层结构可用于对CC分组。由于一次需要考虑少数的CC,这在处理速度方面是有益的,并且在准确性方面也是有益的。来自页的不同区域的CC可以在树的不同分支上,并且不被分组在一起。另外,可以在树的顶部开始CC的处理,并且可以为低于某种分类(例如,文本)的CC的树分支终止,以便进一步改进处理时间。At the same time, there are certain features of binary segmentation that allow simplifying the layout analysis of connected components. Consider the case of pages with connected outer boundaries and CCs formed by 4-way connectivity. In this case, except at the edge of the page, every CC that touches another CC at the boundary is either contained by that CC or contains that CC, and a CC can only be contained by one other CC. An explicit containment hierarchy can be generated and represented in a tree structure. Each successive level in the tree contains CCs of the opposite polarity to the preceding, and each branch consists of a set of CCs that share a unique parent. This hierarchy can be used to group CCs since it can be used to select a subset of CCs (those that share a single parent) that are candidates to be grouped together. This is beneficial in terms of processing speed and also in terms of accuracy since a small number of CCs need to be considered at a time. CCs from different regions of the page can be on different branches of the tree and are not grouped together. Additionally, the processing of CCs can be started at the top of the tree, and tree branches can be terminated for CCs below a certain category (eg, text) to further improve processing time.
如果分段不是二值的,一般地不能有帮助地产生明确的分层结构。考虑图25(a)中的字母“e”。该字母的外边界与黑色背景和白色三角形两者接触,从而背景或三角形可以被认为是该CC的父亲。三角形的情况甚至更复杂,因为其外边界接触背景和所有字母。尽管有这种不确定性,可以实现在对对象进行分组时使用分层结构的益处。这是通过定义在边界相接触的两个CC间的父亲-孩子关系特性,而不需要针对每个孩子的唯一的父亲完成的。例如,如果这些CC在边界处接触,并且一个CC(父亲)的边界框完全包围第二个(孩子)的边界框,则可以在两个CC间定义父亲-孩子关系。为图25(a)中所示的例子使用这个定义,三角形和单词“text”的所有字母都定义为背景CC的孩子。If the segments are not binary, a clear hierarchical structure cannot generally be helpfully generated. Consider the letter "e" in Figure 25(a). The outer border of the letter touches both the black background and the white triangle, so either the background or the triangle can be considered the parent of the CC. The case of a triangle is even more complicated, since its outer border touches the background and all letters. Despite this uncertainty, the benefits of using a hierarchy when grouping objects can be realized. This is done by defining the parent-child relationship properties between two CCs that touch at the boundary, without requiring a unique parent for each child. For example, a parent-child relationship can be defined between two CCs if these CCs touch at the border and the bounding box of one CC (parent) completely surrounds the bounding box of the second (child). Using this definition for the example shown in Figure 25(a), the triangle and all letters of the word "text" are defined as children of the background CC.
3布局分析3 layout analysis
布局分析是该系统的一部分,其中识别页的前景内容。中间(布局分析)模块采用来自前端模块的一列连通组分和“接触列表”作为输入。布局分析的输出基本上是关于扫描图像中哪些连通组分表示前景内容(即,文本、表、黑点(bullet point))的决定。布局分析基于颜色分段,而不是二值图像。在可能发现的前景对象的分类方面这具有若干益处,但是不存在类似对于二值图像来说存在的清楚的包含分层结构。出于效率,布局分析仅使用边界框和少许其它一般的连通组分的统计量,以便作为其分组的基础。布局分析不访问原始像素数据或甚至是位级别的分段。Layout analysis is part of this system, in which the foreground content of a page is identified. The middle (layout analysis) module takes as input a list of connected components and a "contact list" from the front-end module. The output of layout analysis is basically a decision about which connected components in the scanned image represent foreground content (ie, text, tables, bullet points). Layout analysis is based on color segments rather than binary images. This has several benefits in terms of classification of foreground objects that may be found, but there is no clear containment hierarchy like that exists for binary images. For efficiency, layout analysis uses only bounding boxes and a few other general connected component statistics as the basis for its grouping. Layout analysis does not access raw pixel data or even bit-level segmentation.
布局分析的主要步骤是:基于接触列表形成包含分层结构,基于CC的边界框和颜色对它们分组,和测试这些组,以便确定CC是否类似文本的行那样被很好地对齐了。使用接触列表提供用于CC的分层结构,它是两级包含分层结构的多颜色等同物。给定的CC可以被认为是它的接触列表元素的一个子集的父亲。具体地,给定CC可以是该给定CC所接触的,并且其边界框完全包含在父亲CC的边界框内的那些CC的父亲。The main steps of layout analysis are: forming containment hierarchies based on contact lists, grouping CCs based on their bounding boxes and colors, and testing these groups to determine whether CCs are well aligned like lines of text. Hierarchies are provided for CC using contact lists, which are the multi-color equivalent of a two-level containment hierarchy. A given CC can be thought of as the parent of a subset of its contact list elements. Specifically, a given CC may be the parent of those CCs that are touched by the given CC and whose bounding boxes are completely contained within the parent CC's bounding box.
图4详细地示出了图1的步骤120,其采用压缩CC信息、统计量和由步骤110产生的“接触列表”信息作为输入。接触列表描述哪些CC彼此邻接-即,哪些CC共享边界。图4的处理不访问输入图像。该信息的一部分是输入图像中的所有CC的列表。FIG. 4 shows step 120 of FIG. 1 in detail, which takes as input compressed CC information, statistics, and "contact list" information produced by
在步骤410中,基于CC的统计量对其分类,从而从CC列表形成颜色包含分层结构。颜色包含分层结构是这样的结构,其中每个节点是一个CC。父亲节点具有父亲节点接触到的、并且其边界框完全包含在父亲CC的边界框内的CC作为其孩子。孩子节点可以有多于一个父亲节点。分析可以仅基于边界框大小和形状。具有宽度和高度均小于1/100英寸(例如300dpi分辨率的3个像素)的CC被认为是噪声,并且被去除。具有宽度或高度在1英寸以上,或宽度和高度均在8/15英寸以上的连通组分被分类为图像。任何其它的东西被分类为潜在的文本。可替换的实施例可以包括涉及其它文档布局特征的分类(诸如表、连通组分中的像素数目),并且可以使用其它值。In
在步骤420中,将潜在文本CC分组在一起,表示文本区域。CC通常被与附近的CC分组在一起,并且有效的分组算法通过在确定分组之前寻找相邻CC利用这个事实。前端中使用的高分辨率颜色分段方法可以找到在典型的扫描文档上进行分组时考虑的数千个同胞(sibling)。在这些情况下,使用简单的对式比较寻找相邻CC,一种O(N2)的方法,可能是慢的,并且必须使用确定邻居的更复杂的方法。可以在颜色包含分层结构的节点上执行三角测量。如果页上的CC的边界框的中心在平面上定义节点,有效的三角测量方法可用于此目的,诸如Delaunay三角测量。这些方法通常是O(NlogN)个处理。In
图53示出了平面内的一组节点的Delaunay三角测量(由虚线指出)以及Voronoi图(由实线指出)的示例5300。Voronoi图是页到区域的一种分段,所述区域到给定的点比到其它点更近。Delaunay三角测量是两重Voronoi图,该图可以通过将共享Voronoi图中的边界的点连接在一起产生。在这个三角测量中,平面内随机放置的点中的平面内的典型的点具有大约5个连接到它的点。可以将这些点考虑为分组阶段中好的邻居候选。Figure 53 shows an example 5300 of a Delaunay triangulation (indicated by dashed lines) and a Voronoi diagram (indicated by solid lines) of a set of nodes in a plane. A Voronoi diagram is a segmentation of pages into regions that are closer to a given point than to other points. A Delaunay triangulation is a dual Voronoi diagram that can be produced by connecting together points that share a boundary in the Voronoi diagram. In this triangulation, a typical point in the plane has about 5 points connected to it among randomly placed points in the plane. These points can be considered as good neighbor candidates in the grouping phase.
三角测量的输出使其适合于作为形成CC分组的方法。基于在Delaunay三角测量内相邻的那些CC的边界框的对式比较,将这些CC分组在一起。然后在这种初始分组之后在相邻CC对上进行后续轮,以便将这些组结合在一起,或将未分组的CC放置在已有组内。处理还可以在单个轮中通过该数据寻找不同类型的分组(文本、表等)。文本CC的组一般有如下特征:类似的颜色;边界框的类似大小;沿着水平或垂直轴粗略对齐(取决于文本对齐);和相对于CC的大小沿着对齐轴靠近在一起。The output of triangulation makes it suitable as a method of forming CC groupings. These CCs are grouped together based on a pairwise comparison of the bounding boxes of those CCs that are adjacent within the Delaunay triangulation. Subsequent rounds are then performed on adjacent CC pairs after this initial grouping in order to join the groups together, or to place ungrouped CCs within existing groups. Processing can also look for different types of groupings (text, table, etc.) through the data in a single round. Groups of text CCs generally have the following characteristics: similar colors; similar sizes of bounding boxes; roughly aligned along the horizontal or vertical axis (depending on text alignment); and close together along the alignment axis relative to the size of the CCs.
在步骤430中,检查或验证CC组,以便确定哪些CC组是文本字符。由处理器存储关于组内容和在分组阶段产生的合并的信息。可以将关于各个单独分组的信息存储在一个数据结构内,该数据结构包括颜色、边界框和组的内容。在分组阶段中当组的内容改变时,更新这些结构。在可替换的实施例中,将分组标记包括在CC数据结构内,并且可以从CC数据重构数据,诸如组颜色和边界框。在步骤430中,对文本字符CC进行对齐测试作为额外的检查,以便确保CC是文本。In
形成的组一般包括全部文本,但是可能还包括不希望被分类为文本的图像部分。为了减轻这个问题,检查这些组,以便查看组中的连通组分是否类似于文本以整齐的行(或列)排列,或类似于噪声或图像的类似着色的区域往往表现出的那样随机排列。The formed group typically includes the entire text, but may also include parts of the image that are not desired to be classified as text. To mitigate this problem, the groups are examined to see whether the connected components in the group are arranged in neat rows (or columns) like text, or randomly like noise or similarly colored regions of images tend to exhibit.
这主要可以通过形成边界框的边的4个直方图,每侧(即,左、上、右或下边)一个来完成。它们中的一个应当是满库,文本的基线在那里,并且其它位置为空。为了对其进行检查,可以找到这些直方图库的平方和,并且与预期值比较。如果发现4个直方图库的任意一个比随机布置的边界框所预期的高得多,可认为该组是文本。使用所有4个边界框的边,从而允许以侧面或从上向下方式扫描的页或以列而非行进行排列的文本。This can primarily be done by forming 4 histograms of the sides of the bounding box, one for each side (ie left, top, right or bottom). One of them should be full, the baseline of the text is there, and the other is empty. To check this, the sum of squares of these histogram bins can be found and compared to the expected value. If any of the 4 histogram bins were found to be much higher than expected from a randomly placed bounding box, the group was considered to be text. Uses all 4 sides of the bounding box, allowing pages to be scanned sideways or top-to-bottom, or text arranged in columns rather than rows.
3.1对CC分组3.1 Grouping CCs
图20详细地示出图4的步骤420,其对由步骤110分段的一组CC分组。通过获得根CC,处理在步骤2010处开始。根CC是在颜色分段阶段未被合并到其它CC内的一个。在步骤2020中,找到根CC的孩子CC。形成该根CC的孩子CC列表。孩子可以定义为其边界框完全包含在当前根CC的边界框内,并且在边界接触该边界框的CC。参考图22更详细地描述步骤2020。FIG. 20 shows step 420 of FIG. 4 in detail, grouping the set of CCs segmented by
从步骤2020,处理移动到步骤2030。在步骤2030中,对当前CC的孩子进行邻居分析。对于每个孩子,找到在某个定义的路径上接近的一组相邻的CC。这可以通过例如找到每个孩子CC的边界框的中心的Delaunay三角测量实现。三角测量中的边表示相邻CC间的连通。可替换的方法可以使用边界框数据和该CC列表的颜色信息的不同元素定义接近。在步骤2040中,使用邻居数据执行初始分组。该处理步骤2040形成相同孩子列表内的类似属性(例如,几何形状和颜色)的对象组,以便确定文档布局的特征。图23更详细地描述步骤2040。From
在决定步骤2050中,进行检查以便确定是否剩余有更多的根CC要处理。如果有更多的根CC,处理返回步骤2010,并且获得下一个根CC,并且随后进行处理。否则,分组阶段(420)终止。In
3.1.1寻找父亲CC的孩子3.1.1 Finding the child of the father CC
图22示出了寻找孩子CC和形成给定父亲CC的孩子列表的步骤2020。在步骤2210中,从父亲CC的接触CC列表中获得接触CC。在步骤2220中,找到根CC。这可以使用与该接触CC相关联的来自颜色分段阶段的任意合并信息完成。在决定步骤2230中,检查来自图4的步骤410的CC分类,以便确定CC是否满足类测试(即,将该CC存储在孩子列表内是否适合)。可以存储除了噪声大小CC之外的所有CC,但是可替换的实施例可以存储类的其它组合,例如,仅存储潜在的文本。如果接触CC的类是适合的,处理在步骤2240处继续。否则,处理在步骤2260处继续。在决定步骤2240中,检测CC与父亲CC的包含性。包含测试涉及检查该CC的边界框是否完全被父亲CC的边界框覆盖,但是可替换的方法也是可行的。如果满足包含测试,处理在步骤2260处继续。在步骤2250中可以将该CC包括在孩子列表内,然后处理继续到步骤2260。在孩子列表中任何CC仅应出现一次,从而步骤2250包括检查该CC未包括在列表内。可以使用散列表这些检查。然后处理在决定步骤2260处继续。步骤2260测试父亲的接触列表中是否有任意更多的元素。如果是的,处理在步骤2210处继续。否则,完成了当前CC的孩子列表,并且处理终止。Figure 22 shows the
3.1.2初始分组3.1.2 Initial grouping
图23示出了使用来自图20的步骤2030的邻居数据执行的初始分组的图20的步骤2040。该方法可以设计为仅分组文本,但是在可替换的实施例中,该方法还可以对其它文档对象(诸如表)分类或分组。初始分组可以是两轮处理,其中第一轮将CC结合到组内,并且第二轮将组结合在一起成为更大的组。在步骤2305中,将计数器PASS设置为1。在步骤2310中,获得孩子。在步骤2320中,获得孩子的邻居。在步骤2330中,进行检查以便确定孩子和邻居是否满足分组测试。使用一系列测试对这些对象进行分组测试。测试可以基于初始分类、几何形状和颜色,并且可以在第一和第二轮中不同。下面描述的图26提供了对文档的一个区域的提取,以该文档解释步骤2320的分组测试。FIG. 23 shows step 2040 of FIG. 20 performing initial grouping using neighbor data from
参考图23,如果满足分组测试,在步骤2340将孩子和邻居CC分组在一起。然后,处理在步骤2350处继续。否则,如果步骤2330返回假,处理在步骤2350处继续。如果两个CC被分组在一起,将每一个标记为属于相同组。CC被标记的组取决于这两个CC的在先分组。如果还未对任何一个分组,则形成包含两个CC的新分组。如果仅仅已经对两个中的一个进行了分组,将另一个CC包括在该组内。如果已经对两个进行了分组,并且组是相同的,不采取行动。最后,如果两个已经被分组,并且组不同,将两个组合并为单个组,并且将另一个组标记为空。Referring to Figure 23, if the grouping test is met, at
在步骤2350,进行检查以便确定是否有当前CC的更多的邻居。如果存在更多邻居,处理在步骤2320处继续。否则,处理在步骤2360处继续。在步骤2360中,处理检查父亲CC更多的孩子。如果存在更多的孩子,处理在步骤2310处继续。否则,处理在步骤2370处继续。在步骤2370中,进行测试以便确定是否完成了两轮(PASS>1?)。如果是这种情况,处理终止。如果仅完成了第一轮,处理在步骤2380处继续,并且递增计数器PASS。处理在步骤2390处继续。在步骤2390中,处理返回孩子列表的开始。然后处理返回步骤2310,并且开始第二轮。At
在可替换的实施例中,处理2040在三角测量数据的边而不是孩子CC和邻居对中循环。由于每个邻居对仅被考虑一次,这稍微更有效。In an alternative embodiment, the
3.1.2.1对两个CC的分组测试3.1.2.1 Group testing of two CCs
为了示出对两个相邻CC的优选分组测试,图26示出了从刚好包含两个对象-字母“g”和“h”的文档区域中的简单提取。虚线表示每个字母的边界框的坐标。第i个CC的左和右x坐标以及上和下y坐标分别位于xi 1,xi r,yi t,yi b。下标1和2分别指示第一个和第二个CC(即,字母“g”和“h”)。本实施例的处理还使用YUV空间内的每个CC的颜色[yi,ui,Vi]和边界框的宽度wi和高度hi。To illustrate a preferred grouping test for two adjacent CCs, Figure 26 shows a simple extraction from a region of the document containing exactly two objects - the letters "g" and "h". Dashed lines indicate the coordinates of the bounding box of each letter. The left and right x coordinates and the upper and lower y coordinates of the i-th CC are located at x i 1 , x i r , y i t , and y i b , respectively.
将两个CC的水平重叠距离定义为由两个CC覆盖的水平部分的长度,或如果CC不重叠则为0。类似地定义垂直重叠距离dyov,并且如图26中所示。在图26中水平重叠是0,并且从而未标注。重叠距离可以表示如下:Define the horizontal overlap distance of two CCs as the length of the horizontal portion covered by the two CCs, or 0 if the CCs do not overlap. The vertical overlap distance dy ov is similarly defined and shown in FIG. 26 . The horizontal overlap is 0 in FIG. 26 and is thus not labeled. The overlap distance can be expressed as follows:
两个CC的水平内部距离定义为一个CC的左边和另一个的右边间的最短距离,或如果水平重叠距离不为0则其为0。使用CC的上和下边以相同的方式定义垂直内部距离。图26中示出了水平距离dxin,而对于这个例子,垂直内部距离为0,并且未被表示。这些内部距离可以表示如下:The horizontal internal distance of two CCs is defined as the shortest distance between the left side of one CC and the right side of the other, or 0 if the horizontal overlap distance is not 0. Define the vertical interior distance in the same way using the top and bottom sides of the CC. The horizontal distance dx in is shown in Figure 26, while for this example the vertical inner distance is 0 and is not represented. These internal distances can be expressed as follows:
在第一轮中,如果两个相邻CC满足基于颜色、大小和对齐的3个条件的要求,这些CC被作为文本分组在一起。如果:In the first round, if two adjacent CCs meet the requirements of 3 conditions based on color, size and alignment, these CCs are grouped together as text. if:
则满足颜色条件,其中阈值参数可以是Tc=500。Then the color condition is satisfied, wherein the threshold parameter may be Tc=500.
如果:if:
则满足大小测试,其中wmin是两个CC的最小宽度,wmax是最大宽度,hmin是最小高度,并且hmax是最大高度。阈值参数可以是TR=0.55。Then the size test is satisfied, where w min is the minimum width of the two CCs, w max is the maximum width, h min is the minimum height, and h max is the maximum height. The threshold parameter may be T R =0.55.
如果满足下面条件中的任意一个,则满足对齐条件:An alignment condition is met if any of the following conditions are met:
[(dxov>0)and(dyin/max(wmin,hmin)<Ts)],[(dx ov >0) and (dy in /max(w min , h min )<T s )],
或or
[(dyov>0)and(dxin/max(wmin,hmin)<Ts)], (5)[(dy ov >0) and (dx in /max(w min , h min )<T s )], (5)
阈值参数可以是Ts=0.65。The threshold parameter may be T s =0.65.
第二轮使用基于组而不是单个CC的参数。可以使用每组元素的均值颜色[Yi,Ui,Vi]、宽度Wi和高度Hi。对于未分组的CC的情况,将这些值设置为单独CC的颜色、宽度和高度参数。测试还使用被考虑的CC的中心间的距离D,其定义如下:The second round uses parameters based on groups rather than individual CCs. The mean color [Y i , U i , V i ], width W i and height Hi of each group of elements may be used. For the case of ungrouped CCs, set these values as the individual CC's color, width and height parameters. The test also uses the distance D between the centers of the CCs under consideration, which is defined as follows:
如同对于第一轮,如果满足一系列条件则将组合并。这些条件涉及颜色类似性Tc,大小TR和间隔To,并且由下式描述:As for the first round, groups are merged if a series of conditions are met. These conditions relate to color similarity T c , size T R and spacing T o , and are described by:
max(Wmin/Wmax,Hmin/Hmax)>TR,max(W min /W max , H min /H max )>T R ,
(7) (7)
min(Wmin/Wmax,Hmin/Hmax)>TR2,min(W min /W max , H min /H max )>T R2 ,
D/max(Wmin,Hmin)<TD,D/max(W min , H min )<T D ,
其中,参数值可以是:如果任意组包含3个或更少元素,TCg=500,TR=0.55,TR2=0.3,和TD=1,以及否则TCg=100,TR=0.55,TR2=0.3,和TD=2。在第二分组阶段中不使用对齐测试。where the parameter values may be: T Cg =500, T R =0.55, T R2 =0.3, and T D =1 if any group contains 3 or less elements, and T Cg =100, T R =0.55 otherwise , T R2 =0.3, and T D =2. Alignment testing is not used in the second grouping stage.
阈值可以取决于被进行分组测试的CC的特征,例如,每个CC的像素计数。The threshold may depend on the characteristics of the CCs being group tested, for example, the pixel count per CC.
3.2检查组3.2 Inspection Team
图21详细地示出了用于检查组的图4的步骤430。该处理确定每个组是否由文本构成。主要基于是否发现组内的对象在行或列上对齐进行该决定。假设组是文本,对其进行文本类属性测试,并且如果组未通过这些测试,则将其否决。FIG. 21 shows in
在步骤2110中,获得在步骤420中形成的下一个组。在步骤2120中,估计组内的文本字符的大小。估计的大小基于单个字符的长度的统计量。这些长度可以定义为对象的边界框的宽度和高度的最大值。该测量对倾斜和页上文本的对齐相当不灵敏,并且对于给定大小的典型字体内的字符集足够统一。在可替换的实施例中,可以使用边界框面积、像素计数和/或笔划宽度作为长度的测量值。可以形成字符长度的直方图,并且估计的大小可以基于与具有多于库内元素阈值数目的直方图库相关联的最大长度。使用的阈值最少为3个对象,并且至少为组内对象数目的15%。如果不存在这样的库,不返回估计。In
在决定步骤2125中,进行检查以便确定是否找到字符的大小。如果未能找到适合的字符大小,拒绝该组,并且处理在步骤2160处继续。否则,处理在步骤2130处继续。In
步骤2130处理组内的CC和该组的边界框内包含的其它适合的、但是仍未分配到任何组的CC。该处理2130对于增加可能被初始分组错过的文本,以及小对象诸如可能从基于分类的初始分组中被忽略的标点符号是有益的。仅有与该组中的对象共享父亲、并且颜色足够类似的对象可以被添加到该组。如果满足下面的条件,则满足组和包含的CC的颜色类似性条件:Step 2130 processes the CCs within the group and other CCs contained within the group's bounding box that are suitable but are not yet assigned to any group. This process 2130 is beneficial for augmenting text that might have been missed by the initial grouping, as well as small objects such as punctuation that might have been omitted from the initial grouping based on classification. Only objects that share a parent with objects in the group and are sufficiently similar in color can be added to the group. The color similarity condition for a group and contained CCs is met if the following conditions are met:
其中[Y,U,V]是组的颜色,[y,u,v]是CC的颜色。参数值可以是TCg2 = 500。where [Y, U, V] is the color of the group and [y, u, v] is the color of the CC. The parameter value may be T Cg2 =500.
可替换地,可以在步骤2130中应用几何测试,并且可以放松CC的边界框完全包含在组的边界框内的要求,从而将接近组的对象结合到组内。步骤2130的其它可替换方案可以合并某些对象以便形成字符。这旨在针对具有复杂字符的脚本,诸如中文的,这些字符可能被分段为多于一个单独对象,并且有益于改进后面处理中的对齐测试的准确性。仅在两个对象的边界框重叠时,可以合并两个对象。如果合并将创建大于1.6的纵横比,或创建比步骤2120中估计的字符大小大的合并对象,则限制不发生合并。Alternatively, a geometric test may be applied in step 2130, and the requirement that the CC's bounding box is completely contained within the group's bounding box may be relaxed, thereby incorporating objects close to the group into the group. Other alternatives to step 2130 may combine certain objects to form characters. This is aimed at scripts with complex characters, such as Chinese, which may be segmented into more than one single object, and is useful for improving the accuracy of alignment tests in later processing. Two objects can be merged only if their bounding boxes overlap. If the merging would create an aspect ratio greater than 1.6, or create a merging object larger than the character size estimated in
在步骤2150中,检查组内的对象的对齐。该测试将文本组与其它组区分开,并且在下面更详细地描述。在该步骤后,在步骤2160执行测试,以便确定是否有更多组要处理。如果有更多的组,处理返回步骤2110。否则,处理430结束。In
3.2.1检查对齐3.2.1 Check alignment
图24更详细地给出了图21的步骤2150。在该步骤过程中,将组内的CC的子集定义为字符。字符可以是具有大于2120中估计的字符大小的一半,并且小于该大小的两倍的大小的那些对象。FIG. 24 shows step 2150 of FIG. 21 in more detail. During this step, a subset of CCs within a group is defined as characters. Characters may be those objects having a size greater than half the size of the character estimated in 2120, and less than twice that size.
基于关于组元素的一系列参数的直方图分析,步骤2430到2450进行组的接受测试。这些参数是每个字符的边界框的左、上、下和右边。使用多个参数允许识别出不同对齐的文本,由于文本在页上的对齐取决于许多因素,诸如语言和文本在页上的倾斜。可替换地,可以使用水平和垂直边界框参数的各种组合识别更宽范围的文本对齐。
在步骤2430中,为下一个参数的组元素值形成直方图。可以根据组字符的大小缩放直方图内的库的大小。可以为上和下边界框边使用组内字符的平均高度的1/5的值(取整),并且可以为左和右边界框边使用组内字符的平均宽度的1/5(也取整)。设置直方图内的库范围,从而所有数据被包括在该范围的每一端处的非空库内。可以将直方图覆盖的最低值设置为组内的参数的最低值。In
决定步骤2440测试直方图内的值是否很好地对齐,形成离散的簇(cluster)(理想地表示不同行文本的基线),而不是随机散开。步骤2440测试组内的字符数N是否大于具有优选值T=7的阈值T。
对于小组(N<T),步骤2440检查3个参数AL1,AL2和OV。AL1是直方图内的最大库的计数。AL2是直方图内第二大库的计数。OV是组内重叠字符的最大子集的大小。表2内的伪码描述了用于该组的测试。如果伪码返回Y,则组通过对齐测试,并且如果伪码返回N,测试失败。For small groups (N<T),
表2Table 2
IF(N<3)IF(N<3)
return Nreturn N
ELSE IF(N=3)ELSE IF(N=3)
IF(AL1>=2 and OV>=3)IF(AL1>=2 and OV>=3)
return Yreturn Y
ENDEND
ELSE IF(N=4)ELSE IF(N=4)
IF(AL1>=2 and OV>=4 and AL2>=2)IF(AL1>=2 and OV>=4 and AL2>=2)
return Yreturn Y
ELSE IF(AL1>=3 and OV>=4)ELSE IF(AL1>=3 and OV>=4)
return Yreturn Y
ENDEND
ELSE IF(N=5)ELSE IF(N=5)
IF(AL1>=4 and OV>=5)IF(AL1>=4 and OV>=5)
return Yreturn Y
ENDEND
ELSE IF(N=6)ELSE IF (N=6)
IF(AL1>=2 and OV>=6 and AL2>=2)IF(AL1>=2 and OV>=6 and AL2>=2)
return Yreturn Y
ELSE IF (AL1>=4 and OV>=6)ELSE IF (AL1>=4 and OV>=6)
return Yreturn Y
ELSE IF(AL1>=3 and AL2>=3)ELSE IF(AL1>=3 and AL2>=3)
return Yreturn Y
ENDEND
ELSE IF(N=7)ELSE IF (N=7)
IF(AL1>=2 and OV>=6 and AL2>=2)IF(AL1>=2 and OV>=6 and AL2>=2)
return Yreturn Y
ELSE IF(AL1>=6)ELSE IF(AL1>=6)
return Yreturn Y
ELSE IF(AL1>=3 and OV>=4 and AL2>=3) ELSE IF(AL1>=3 and OV>=4 and AL2>=3)
returnYreturnY
ENDEND
ENDEND
return Nreturn N
对于大组,进行直方图库的平方和与组内随机布置的CC的预期值的比较的测试。下面给出用于该测试的等式:For large groups, a test was performed comparing the sum of squares of the histogram bin to the expected value of CCs randomly placed within the group. The equations used for this test are given below:
其中m是直方图库的总数,n是字符总数,并且hi是直方图第i个库的填充率。该等式右手侧的项是期望值(均值)的2倍,并且对于足够大的m和n,近似为这样的值,对于该值存在随机布置的字符被接受的0.1%的可能性。在下面描述的图27中示出了该处理的例子。where m is the total number of histogram bins, n is the total number of characters, and hi is the fill rate of the ith bin of the histogram. The term on the right-hand side of this equation is 2 times the expected value (mean), and for sufficiently large m and n, approximately the value for which there is a 0.1% probability that a randomly placed character will be accepted. An example of this processing is shown in FIG. 27 described below.
参考图24,如果根据该测试接受组,并且处理在步骤2470处继续。在步骤2470中,保持该组,对齐检查结束。然后处理终止。否则,如果步骤2440返回假(否),处理在步骤2450处继续。决定步骤2450检查是否有更多参数要测试。如果有,处理针对下一个参数在步骤2430处继续。否则,如果已经测试了所有参数,处理在步骤2480处继续,其中拒绝该组。然后处理终止。Referring to FIG. 24, if the group is accepted according to this test, and processing continues at
前面的描述公开了一次基于一个参数的测试和拒绝那些所有测试都失败了的组。然而,以本公开来看,本领域的技术人员将明了,可以实践针对不同参数的组合测试的可替换的方法,而不脱离本发明的范围和精神,诸如接受差不多足够好地在两个不同但类似的参数(诸如,边界框的上和下边)中对齐的组,或基于许多参数创建该组的整体评分。The preceding description exposes a test based on one parameter and rejects groups where all tests fail. However, in view of this disclosure, it will be apparent to those skilled in the art that alternative approaches to combined testing of different parameters can be practiced without departing from the scope and spirit of the invention, such as accepting that almost But align groups in similar parameters (such as top and bottom bounding box), or create an overall score for the group based on many parameters.
3.2.2对齐例子3.2.2 Alignment example
图27(b)示出了可能从图2700中分段的部分得出的不规则布置的边界框2710、2712、2714,...的选择。图27(b)示出了随机布置的连通组分。图27(a)示出了该图的边界框的底的值的直方图2720。图27(d)示出了具有很好地对齐的连通组分的文本组的边界框2740、2742在页上的布置,并且图27(c)示出了相应的直方图2730。如图所示,文本的直方图2730具有少许大的值簇,而其它直方图2720具有更均匀散布的值。使用直方图库的值的平方和作为测量值,图27(a)给出了值19,并且图27(c)给出了值47。m=19并且n=13的上面的接受测试的右手侧的项是45。因此根据该测试,拒绝图27(b)的图像数据,而接受图27(d)的文本数据。可替换地,接受测试可以基于其它统计量,诸如明显大于平均计数的库内的值的总数。FIG. 27( b ) shows a selection of irregularly arranged bounding boxes 2710 , 2712 , 2714 , . Figure 27(b) shows connected components arranged randomly. Figure 27(a) shows a histogram 2720 of the values of the base of the bounding box of the graph. FIG. 27( d ) shows the arrangement on a page of the bounding boxes 2740 , 2742 of text groups with well-aligned connected components, and FIG. 27( c ) shows the corresponding histogram 2730 . As shown, the histogram 2730 for text has a few large clusters of values, while the other histograms 2720 have more evenly spread values. Using the sum of squares of the values of the histogram bin as measurements, Figure 27(a) gives a value of 19, and Figure 27(c) gives a value of 47. The entry on the right hand side of the test above for m=19 and n=13 is 45. According to this test, therefore, the image data of Fig. 27(b) is rejected and the text data of Fig. 27(d) is accepted. Alternatively, the acceptance test may be based on other statistics, such as the total number of values within the library that are significantly greater than the average count.
4产生压缩的输出图像4 produces a compressed output image
通过以估计的背景颜色在前景区域上绘制,后端模块使用修复处理,使得背景图像更可压缩。优选地在较低分辨率(例如,150dpi)的背景图像上执行修复。修复算法是单轮的、基于贴片的算法,并且努力增强可压缩性。通过从左边和右边的周围像素进行内插,而不是使用大区域的一个平均颜色,选择每个像素的颜色。The backend module uses an inpainting process to make the background image more compressible by painting over the foreground region with the estimated background color. Inpainting is preferably performed on a lower resolution (eg 150dpi) background image. The inpainting algorithm is a single-round, patch-based algorithm and strives to enhance compressibility. The color of each pixel is chosen by interpolating from surrounding pixels on the left and right, rather than using one average color over a large area.
该算法执行下面的步骤:1)组合用于所有前景组分的掩蔽,形成用于该贴片的一个前景掩蔽;2)扩大该掩蔽,从而修复围绕该前景组分的小的附加区域;3)在贴片上以光栅顺序:The algorithm performs the following steps: 1) Combine the masks for all foreground components to form one foreground mask for the tile; 2) Dilate the mask, thereby inpainting a small additional area around the foreground component; 3) ) in raster order on the tile:
a如果像素未被掩蔽,更新贴片的活性;和a If the pixel is not masked, update the activity of the patch; and
b如果像素被掩蔽了,以从左边和右边最近的未掩蔽像素的颜色内插出的颜色绘制像素;b if the pixel is masked, draw the pixel with the color interpolated from the colors of the nearest unmasked pixels on the left and right;
4)如果未掩蔽区域的活性低于某个阈值,以未掩蔽像素的均值颜色绘制整个贴片(这给出了以ZLib压缩JPEG的改进的压缩);和5)如果掩蔽了整个贴片,以前面的贴片的均值颜色绘制整个贴片。上面的步骤2)消除了渗色影响,并且改进了压缩,并且锐化了输出质量。4) if the activity of the unmasked region is below a certain threshold, paint the whole patch with the mean color of the unmasked pixels (this gives improved compression with ZLib compressed JPEGs); and 5) if the whole patch is masked, Paint the entire patch with the mean color of the previous patches. Step 2) above removes bleeding effects and improves compression and sharpens output quality.
图5详细地示出了图1的步骤130。步骤510到540使用类似于在图2和3中描述的基于贴片的处理系统。在步骤510中,获得将处理的下一个贴片。在步骤520中,在当前贴片上执行修复处理,以便去除图1的步骤120中识别出的任何前景CC,并且平坦化视觉上看上去接近平坦的任何贴片。在步骤530中,压缩当前贴片。这可以使用具有以2∶1水平和垂直子采样的2个色度通道的YCrCb颜色空间内的JPEG完成。每个缩小分辨率的背景图像的贴片包括16×16个像素,其意味着这些像素可以直接编码为针对Y通道的4个8×8像素的JPEG块,以及针对Cr和Cb通道中的每一个的1个8×8 JPEG块,而不需要贴片间所需的任何缓冲。FIG. 5 shows step 130 of FIG. 1 in detail.
在步骤540中,进行检查以便确定是否存在更多贴片要处理。如果有任何更多贴片要修复和压缩,处理返回步骤510。否则,处理在步骤550处继续。在步骤550中,压缩前景,这涉及压缩在步骤120中识别出的前景元素。根据颜色将前景元素分组,并且为每个类似着色的前景元素组创建完整输入分辨率的一个二值图像。然后可以将每个创建的图像编码为CCITT G4 Fax,如果图像足够大,使得编码产生输出文档中的压缩优势的话。In
在步骤560中,产生输出文档。以复合压缩格式存储压缩的背景和压缩的前景图像。该格式可以是,例如,PDF文档。还可以使用Flate(Zlib)压缩进一步压缩JPEG编码的背景图像。对于包含由步骤520和530产生的大量重复的平坦块的JPEG图像,这带来了明显的空间节省。可以写合成文档,合成文档包含Flate和JPEG压缩的背景图像,和包含大小、位置、顺序和-在二值前景图像的情况下-在页上绘制每个图像的颜色的细节的页描述。In
4.1修复贴片4.1 Repair patch
图30详细地示出了图5的步骤520。该处理修改下采样的背景图像,以便增加可压缩性,并且通过从背景中去除前景CC且以低可视活性平坦化图像贴片,增强前景CC的锐度。还修复前景CC周围的小区域,以便去除渗色影响,从而增强图像并且增加可压缩性。FIG. 30 shows step 520 of FIG. 5 in detail. This process modifies the downsampled background image in order to increase compressibility and enhances the sharpness of the foreground CCs by removing them from the background and flattening the image tiles with low visual activity. A small area around the foreground CC is also repaired in order to remove bleeding effects, thereby enhancing the image and increasing compressibility.
通过以通过在选择的前景CC的左边和右边的像素的颜色间进行内插估计的颜色绘制该前景CC,图30中示出的处理520从低分辨率的背景中去除选择的前景CC。这增加了可压缩性。还修复围绕着该CC的外部的小区域,以便进一步增加可压缩性,并且增强图像的外观。通过识别具有低可视活性的背景图像的贴片,并且将所有它们的像素设置为相同颜色,进一步增加可压缩性。The
图30的处理的输入是在图2的步骤250中创建的下采样的背景图像,来自图2的步骤240的CC列表,以及来自图1的步骤120的前景或背景选择信息。在步骤3010中,检查贴片以便查看贴片是否被标记为是“平坦的”。如果是,处理在贴片平坦化步骤3040处继续,否则,处理在步骤3020处继续,其中为贴片形成全分辨率前景位掩蔽。其具有在相应于前景CC的每个位置设置的位。在步骤3030中,通过以从该CC的左边和右边的像素的颜色内插出的颜色修复它们,去除背景图像中相应于前景CC的区域和围绕它们的小区域。还测量贴片内未修复像素的活性。在步骤3040中,如果发现贴片活性足够低,以致于在视觉上是平坦的,将整个贴片平坦化为恒定颜色。然后处理终止。The inputs to the process of FIG. 30 are the downsampled background image created in
4.1.1形成贴片前景位掩蔽4.1.1 Form patch foreground bit masking
图31更详细地示出了用于形成贴片前景位掩蔽的图30的步骤3020。在步骤3110中,创建用于贴片的初始位掩蔽。该位掩蔽的宽度与贴片相同,并且比贴片高一行。该位掩蔽的第一行被设置为与上面的贴片的最后一行相同,除非该贴片是文档中的第一个带,在这种情况下,将位掩蔽的第一行设置为空。这允许修复区域在具有与贴片边界对齐的底边的前景CC之下延伸。FIG. 31 shows step 3020 of FIG. 30 for forming the tile foreground bit mask in more detail. In
在步骤3120中,获得贴片内的下一个CC。在步骤3130中,检查该CC以便确定该CC是否是前景CC(在步骤120中)。如果CC是前景CC,处理在步骤3140处继续。否则,处理在步骤3150处继续。在步骤3140中,使用逐位OR操作,将相应于该CC和当前贴片的位掩蔽与步骤3110中创建的位掩蔽组合在一起。然后,处理在步骤3150处继续。在步骤3150中,进行检查以便确定当前贴片内是否有更多CC。如果是,处理返回步骤3120。否则,如果已经处理了贴片内的所有CC,步骤3150的结果是否,并且处理在步骤3160处继续。在步骤3160中,保存形成的掩蔽的最后一行,从而当处理页上直接在该贴片之下的贴片时,可以在步骤3110中使用其。图31的处理3020中创建的位掩蔽是输入图像的全分辨率。In
4.1.2修复像素和测量贴片活性4.1.2 Inpainting pixels and measuring patch activity
图32更详细地示出了图30的步骤3030。以光栅顺序检查每个像素,直到发现应修复的一个,其标记着要修复的一串像素的开始。检查后续像素,直到找到要修复的像素串的结尾。然后以在修复串的左边和右边的像素的颜色间线性内插的颜色给该像素水平串着色。FIG. 32 shows step 3030 of FIG. 30 in more detail. Each pixel is checked in raster order until one is found that should be inpainted, which marks the beginning of a string of pixels to inpaint. Subsequent pixels are checked until the end of the string of pixels to be repaired is found. This horizontal string of pixels is then colored with a color that is linearly interpolated between the colors of the pixels to the left and right of the repair string.
参考图32,步骤3210得到贴片的一行。在步骤3220中,根据行的累积像素活性找到下一个像素串的开始。以光栅顺序检查行内的每个像素,并且累积贴片内的像素活性,直到找到应修复的像素。通过累积像素值和这些像素值的平方,并且保持测量的像素的数目的计数,记录未修复的像素的像素活性。为了决定每个像素是否应修复,将像素位置和在图30的步骤3020中创建的贴片前景位掩蔽内的相应位置进行比较。由于贴片位掩蔽的分辨率是背景图像的两倍,掩蔽中有4个相应于背景图像内的一个像素覆盖的区域的像素。为了改进压缩和从前景CC中去除渗色边影响,修复前景CC的边周围的小附加区域。为了进行该处理,检查全分辨率前景掩蔽中的8个像素,以便决定是否修复当前背景像素。图34示出了背景图像贴片3430中的示例像素3410,以及检查的相应的全分辨率掩蔽像素3420。如果在掩蔽3440中设置了由3420指示的这些8个像素中的任何一个,即,相应于前景CC的位置,修复像素3410。由于将位掩蔽3440存储为位向量的矩阵,使用逐位AND运算可以快速检查这些8个像素3420。可替换地,这可以通过扩大位掩蔽3420内的设置像素,并且然后下采样来实现。记录每行上未修复的最后检查的像素的颜色,并且如果将修复下一个贴片的相同行的第一个像素,使其在处理右边下一个贴片时可用,从而将该颜色用作内插值。Referring to FIG. 32,
参考图32,在决定步骤3230中,进行检查以便确定在行结尾之前是否找到了要修复的任何像素。如果找到了要修复的像素,处理在步骤3250处继续。否则,处理在步骤3240处继续。在步骤3250中,以光栅顺序检查该行内剩余的像素,以便找到该行内应修复的像素串的结尾。此处使用与步骤3220中使用的测试相同的测试确定是否应修复像素。在步骤3260中,进行检查以便确定在行的结尾之前是否找到了要修复的像素串的结尾。如果在达到行的结尾之前找到了要修复的像素串的结尾,处理在步骤3270处继续。在步骤3270中,将要修复的像素串内的每个像素设置为在左边和右边的最近的未修复像素的颜色间线性内插的颜色。如果要修复的像素串延伸到贴片的左手边,在前一贴片中为相同行保存的最后的值被用作左内插值。然后,处理在步骤3220处继续,并且搜索要修复的下一个像素串。如果在步骤3250中达到了行的结尾而未找到不应修复的任何像素,决定步骤3260指挥处理在步骤3280继续。在步骤3280中,将要修复的像素串内的每个像素设置为左边最近的未修复的像素的值,其可以是在以前贴片内为相同行保存的值。在步骤3280之后,处理在步骤3240处继续。Referring to FIG. 32, in
在步骤3240中,进行检查以便确定贴片内是否有更多的行要处理。如果贴片内有更多的行,处理在步骤3210处继续。否则,如果没有更多的行,处理结束。In
4.1.3修复的例子4.1.3 Repair example
图35示出了某些一维修复的例子。给两个例子3510、3520提供之前和之后的曲线图,其中像素强度被绘制为位置X的函数。在第一个例子3510中,要修复的像素串完全在贴片内,并且以紧接着左边和右边的未修复像素的值之间线性内插的值代替修复像素的值(以对角影线指示)。修复区域稍微大于前景组分以便去除任何渗色影响;这被以扩大的掩蔽区域示出。通过在左边和右边的未掩蔽像素颜色间进行内插找到修复每个像素的颜色。在图25的第二个例子3520中,要修复的像素串越过了贴片边3530。当处理左手贴片时,以紧接着左边的未修复像素的值替代要修复的像素的值。当处理例子3520的右手贴片时,将修复像素的值设置为该行最后记录的未修复像素的值和右边第一个未修复像素的值间的线性内插的值。Figure 35 shows some examples of one-dimensional repairs. Two examples 3510, 3520 are provided with before and after graphs where pixel intensity is plotted as a function of position X. In the first example 3510, the string of pixels to be inpainted is entirely within the tile, and the value of the inpainted pixel is replaced by a value that is linearly interpolated between the values of the immediately left and right uninpainted pixels (diagonally hatched instruct). The inpainted area is slightly larger than the foreground component in order to remove any bleeding effects; this is shown as the enlarged masked area. Find and fix the color of each pixel by interpolating between the left and right unmasked pixel colors. In the second example 3520 of FIG. 25 , the pixel string to be repaired crosses the
4.1.4贴片平坦化4.1.4 Chip planarization
图33更详细地示出了图30的步骤3040。在决定步骤3310中,进行检查以便确定是否修复了所有像素。尤其是,检查在步骤3220中记录的被修复的像素的数目。如果修复了贴片内的所有像素,则认为贴片的可视区域是低的,并且处理在步骤3320处继续。在步骤3320中,将当前贴片内的所有像素着色(设置)为光栅顺序的以前贴片的均值颜色。当使用基于块的压缩技术例如JPEG时,这明显地增加了背景图像的可压缩性。然后处理终止。如果步骤3310确定不是修复了贴片内的所有像素,处理在决定步骤3330处继续。在步骤3330中,对照预定的阈值检查在步骤3220中测量的未修复像素的活性。如果发现活性小于该阈值,认为重构图像内的贴片的活性可视区域接近平坦。在该情况下,处理在步骤3340处继续,并且通过以贴片内未修复的可视像素的均值颜色绘制贴片内的所有像素,使得该贴片完全平坦。这显著提高了背景图像的可压缩性。在步骤3340后,或如果发现贴片内的活性高于阈值在步骤3330后,图33的处理结束。FIG. 33 shows step 3040 of FIG. 30 in more detail. In decision step 3310, a check is made to determine if all pixels have been repaired. In particular, the number of inpainted pixels recorded in
5硬件实施例5 hardware embodiment
图51示出了根据本发明的第二实施例的系统5100。系统5100以处理流水线的不同阶段间的高效的数据流为特征。该设计极大地减少了存储器带宽,并且能够使得系统5100快速运行。Figure 51 shows a
扫描仪通常按像素光栅顺序获得扫描数据。然后存储像素数据,并且通常为进一步的图像处理进行压缩。在传统的扫描到文档应用中,通常需要从存储设备中检索扫描数据,解压缩,并且然后保持在用于分段和布局分析的存储器内以便处理图像数据。这通常是高速扫描仪的情况,这只是因为分段处理不能跟上扫描仪一页接一页流式传输光栅数据的速度。Scanners typically acquire scan data in a raster order of pixels. The pixel data is then stored and usually compressed for further image processing. In traditional scan-to-document applications, scan data typically needs to be retrieved from a storage device, decompressed, and then held in memory for segmentation and layout analysis in order to process the image data. This is often the case with high-speed scanners, simply because segment processing cannot keep up with the speed at which the scanner streams the raster data page by page.
这不仅需要大的存储器缓冲器,而且由于每个图像像素必须至少被写和读两次,还需要高存储器带宽。首先必须由扫描仪将像素写到存储器,并且然后压缩器从存储器读取数据并且压缩数据。稍后,解压缩器必须读取压缩数据,并且将数据解压缩到存储器内。最后,图像处理器可给解压缩数据分段。对于每个原始的图像像素,至少有一次冗余的存储器读和写,更不要说压缩的数据。对于高分辨率扫描仪(例如,600dpi),这意味着超过200MB的额外数据。This not only requires a large memory buffer, but also requires high memory bandwidth since each image pixel must be written and read at least twice. First the pixels must be written to memory by the scanner, and then the compressor reads the data from memory and compresses the data. Later, the decompressor must read the compressed data and decompress the data into memory. Finally, the image processor can segment the decompressed data. For each raw image pixel, there is at least one redundant memory read and write, not to mention compressed data. For high resolution scanners (eg, 600dpi), this means over 200MB of additional data.
本发明的这个实施例采用高速、自动分段,其直接工作于来自扫描仪5105的实时的一页接一页流式传输的光栅数据。结果,完全消除了冗余的存储器读和写,并且还极大地减少了存储器缓冲器的大小。This embodiment of the invention employs high speed, automatic segmentation that works directly on the raster data streamed page by page from the
总线5110从扫描仪5105运送扫描的光栅数据,其被写到模块5115(线缓冲器)中。在该例子中,使用64线缓冲器,但是可以根据后面解释的带的高度使用其它的大小。线缓冲器5115存储由代码分段模块5125处理的数据带,同时收集新的进入的扫描数据带。模块5125通过总线5120从线缓冲器5115读取数据贴片,并且以贴片为基础将该数据颜色分段为连通组分。当模块5125结束一个数据带时,从线缓冲器5115读取新的数据带以便处理。然后旧的带缓冲器用于收集新的进入的光栅数据。由贴片的高度确定带的高度,并且线缓冲器5115需要两倍带的高度。在该实施例中,优选的贴片大小为32×32。
在这个实施例中以硬件实现模块5125,从而带的处理速度可以跟上扫描仪5105产生新数据带的速度。模块5125的输出是到布局分析模块5140的总线5135上的压缩的连通组分(CC),和到修复模块5150的总线5130上的未压缩或压缩的下采样图像。可以将总线5135上的数据写到存储器,直到产生了页的明显区域或CC的整个页。模块5140仅使用紧凑的连通组分数据执行布局分析。由于数据是紧凑的,执行布局分析所需的处理能力是小的。因此,布局分析模块可以实现为由嵌入式处理器实时执行的软件(SW)。布局分析模块的输出是在总线5145上提供的前景信息,其紧随来自总线5135的数据。也可以将总线5130上的数据写到存储器内,直到产生了页的明显区域或CC的整个页。
修复模块5150逐贴片地对下采样图像(通过总线5130提供)执行前景去除。该模块5150可以实现在运行布局分析软件5140的同一个嵌入式处理器上,或可替换地,它可以用硬件(HW)实现。模块5150然后在下采样的图像上以估计的背景颜色修复前景区域,以便产生背景图像。在总线5155上输出前景去除的背景图像,并且在总线5175上产生前景掩蔽。The
输出产生模块5160从在总线5175上产生的前景图像和在总线5155上产生的背景图像创建经布局分析的文档,诸如PDF文件。模块5160可以用运行在同一个嵌入式处理器上的软件实现。
在第二实施例中,模块5115和5125工作在实时的一页接一页的扫描数据上,并且产生页N上的紧凑的连通组分和下采样图像。模块5140、5150和5160使用由模块5125在页N-1上产生的数据顺序地工作。因此,系统可以实时地送出来自高速扫描仪的实时数据的经布局分析的文档。In a second embodiment,
可以用针对第一实施例中相应步骤所描述的方式实现模块5125、5140、5150和5160。
5.1颜色分段模块5.1 Color segmentation module
图52详细地示出模块5125。颜色分段为CC模块5125的组件包括去半色调模块5220,其按贴片顺序取得像素流,并且去除由于扫描使用打印的半色调系统(例如喷墨打印机)打印的材料引起的任何赝象。去半色调模块5220是前面描述的软件实施例的硬件实施例。该模块内部可以使用静态RAM和流水线处理以达到所需速度。Figure 52 shows
从去半色调模块5220输出的像素被传递到颜色转换模块5230,其将像素从输入颜色空间(通常为RGB)转换为YCbCr亮度/色度空间。该模块5230执行根据下面的公式对每个像素执行必要的乘法和加法:The pixels output from the
以按比例缩放的定点算术执行这些算术运算,以便减小复杂性并且增加模块5125的速度。将来自颜色转换模块5230的输出传递到两个模块,即下扫描模块5240和连通组分分析模块5260。下扫描模块5240执行一组4个(在2×2的方块内)或16个(在4×4的方块内)颜色的简单平均,以便形成每个输出像素。然后由硬件JPEG压缩器5250压缩从下扫描模块5240输出的像素。These arithmetic operations are performed in scaled fixed point arithmetic in order to reduce complexity and increase the speed of the
6计算机实现6 computer implementation
可以使用一个或多个通用计算机系统、打印设备和其它适合的计算设备实施根据本发明的实施例的方法。参考图1-52中的一个或多个描述的处理可以被实现为软件,诸如在计算机系统内执行的或被嵌入打印设备的应用程序。软件可以包括一个或多个计算机程序,包括应用程序、操作系统、过程、规则、数据结构和数据。指令可以被构成为一个或多个代码模块,每一个用于执行一个或多个特定任务。该软件可以存储在计算机可读介质(包括例如下面描述的一个或多个存储设备)内。计算机系统从计算机可读介质装入该软件,并且然后执行该软件。Methods according to embodiments of the invention may be implemented using one or more general purpose computer systems, printing devices, and other suitable computing devices. The processes described with reference to one or more of Figures 1-52 may be implemented as software, such as an application program executed within a computer system or embedded in a printing device. Software may include one or more computer programs, including applications, operating systems, procedures, rules, data structures, and data. Instructions may be structured as one or more code modules, each for performing one or more specific tasks. The software may be stored on a computer readable medium (including, for example, one or more storage devices described below). The computer system loads the software from the computer readable medium and then executes the software.
图7示出了可以实施本发明的实施例的计算机系统700的例子。其上记录有这种软件的计算机可读介质是一种计算机程序产品。该计算机程序产品在计算机系统内的使用可以实现用于实现一个或多个上述方法的有利的装置。FIG. 7 shows an example of a
在图7中,计算机系统700耦合到网络。操作者可以使用键盘730和/或指点设备诸如鼠标732(或例如触摸垫)给计算机750提供输入。计算机系统700可以具有任意数目的输出设备,包括线打印机、激光打印机、绘图机和连接到计算机的其它再现设备。计算机系统700可以通过通信接口764使用适合的通信通道740诸如调制解调器通信路径、路由器等连接到一个或多个其它计算机。计算机网络720可以包括例如局域网(LAN)、广域网(WAN)、内联网和/或互联网。计算机750可以包括处理单元766(例如,一个或多个中央处理单元)、存储器770(其可以包括随机访问存储器(RAM)、只读存储器(ROM)或两者的组合)、输入/输出(IO)接口772、图形界面760和一个或多个存储设备762。存储设备762可以包括下面的一个或多个:软盘、硬盘驱动器、磁光盘驱动器、CD-ROM、DVD、数据卡或存储器棒、闪速RAM设备、磁带或本领域的技术人员公知的其它若干非易失性存储设备。虽然在图7中存储设备被示出为直接连接到总线,可以通过任意适合的接口连接这种存储设备,诸如并行口、串行口、USB接口、火线(Firewire)接口、无线接口、PCMCIA槽等。出于本说明的目的,存储单元可以包括存储器770和存储设备762中的一个或多个(如图7中围绕着这些元件的虚线框指示的)。In FIG. 7, a
计算机750的每个组件典型地通过在图7中概括地示出的一个或多个总线780连接到一个或多个其它设备,这些总线又包括数据、地址和控制总线。虽然在图7中示出了单个总线780,本领域的技术人员应当理解,计算机、打印设备或其它电子计算设备可以有若干总线,包括处理器总线、存储器总线、图形卡总线和外围总线中的一个或多个。可以采用适合的桥来接口这些总线间的通信。虽然描述了使用CPU的系统,本领域的技术人员应当理解,可以使用能够处理数据并且执行操作的其它处理单元而不脱离本发明的范围和精神。Each component of
仅出于说明的目的提供计算机系统700,并且可以采用其它配置而不脱离本发明的范围和精神。可以实施这些实施例的计算机包括IBM PC/AT或兼容机、膝上/笔记本计算机、PC的Macintosh(TM)族之一、Sun Sparcstation(TM)、PDA、工作站等。上面仅是可以实施本发明的实施例的设备类型的例子。典型地,下面描述的这些实施例的处理被作为软件或记录在作为计算机可读介质的硬盘驱动器上的程序驻留,并且使用处理器读取和控制。可以使用半导体存储器实现程序、中间数据和从网络取得的任何数据的中间存储。
在某些情况下,可以提供编码在CD ROM或软盘上的程序,或可替换地,可以通过,例如,连接到计算机的调制解调器设备从网络读取该程序。另外,可以从其它计算机可读介质(包括磁带、ROM或集成电路、磁光盘、计算机和另一个设备间的无线或红外线传输通道、计算机可读卡(诸如PCMCIA卡)、以及互联网和内联网(包括电子邮件传输和记录在网站上的信息等))将软件装入计算机系统。前面仅是有关的计算机可读介质的例子。可以使用其它的计算机可读介质而不脱离本发明的范围和精神。In some cases, the program may be provided encoded on a CD ROM or floppy disk, or alternatively, the program may be read from a network by, for example, a modem device connected to the computer. Additionally, data may be read from other computer-readable media, including magnetic tape, ROM or integrated circuits, magneto-optical disks, wireless or infrared transmission channels between a computer and another device, computer-readable cards such as PCMCIA cards, and the Internet and intranets ( Including e-mail transmissions and information recorded on the website etc.)) The software is loaded into the computer system. The foregoing are merely examples of related computer-readable media. Other computer readable media may be used without departing from the scope and spirit of the invention.
7工业实用性7 Industrial Applicability
本发明的实施例适用于计算机和数据处理行业。前面仅描述了根据本发明的实施例用于处理和压缩数字图像的少量方法、装置和计算机程序产品,并且可以对其作出修改和/或改变而不脱离本发明的范围和精神,这些实施例是说明性的而不是限制性的。Embodiments of the invention are applicable to the computer and data processing industries. The foregoing has only described a small number of methods, apparatuses and computer program products for processing and compressing digital images according to embodiments of the present invention, and modifications and/or changes may be made thereto without departing from the scope and spirit of the present invention, these embodiments are illustrative rather than restrictive.
Claims (44)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2004242419A AU2004242419A1 (en) | 2004-12-21 | 2004-12-21 | Analysing digital image of a document page |
| AU2004242419 | 2004-12-21 | ||
| AU2004242421 | 2004-12-21 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101091186A true CN101091186A (en) | 2007-12-19 |
| CN100543766C CN100543766C (en) | 2009-09-23 |
Family
ID=36660018
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB200580043979XA Expired - Fee Related CN100543766C (en) | 2004-12-21 | 2005-12-20 | Image segmentation methods, compact representation production method, image analysis method and device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN100543766C (en) |
| AU (1) | AU2004242419A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103841295A (en) * | 2012-11-27 | 2014-06-04 | 京瓷办公信息系统株式会社 | Image processing apparatus |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180123612A1 (en) * | 2015-04-08 | 2018-05-03 | Siemens Aktiengesellschaft | Data reduction method and apparatus |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0811946A3 (en) * | 1994-04-15 | 1998-01-14 | Canon Kabushiki Kaisha | Image pre-processor for character recognition system |
| JPH1051642A (en) * | 1996-07-31 | 1998-02-20 | Fuji Xerox Co Ltd | Image processor |
| JP4170441B2 (en) * | 1997-11-28 | 2008-10-22 | 富士通株式会社 | Document image inclination detection apparatus and storage medium for document image inclination detection program |
| US7164797B2 (en) * | 2002-04-25 | 2007-01-16 | Microsoft Corporation | Clustering |
| EP1388815A3 (en) * | 2002-04-25 | 2005-11-16 | Microsoft Corporation | Segmented layered image system |
| US7110596B2 (en) * | 2002-04-25 | 2006-09-19 | Microsoft Corporation | System and method facilitating document image compression utilizing a mask |
-
2004
- 2004-12-21 AU AU2004242419A patent/AU2004242419A1/en not_active Abandoned
-
2005
- 2005-12-20 CN CNB200580043979XA patent/CN100543766C/en not_active Expired - Fee Related
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103841295A (en) * | 2012-11-27 | 2014-06-04 | 京瓷办公信息系统株式会社 | Image processing apparatus |
| CN103841295B (en) * | 2012-11-27 | 2016-08-24 | 京瓷办公信息系统株式会社 | Image processing apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2004242419A1 (en) | 2006-07-06 |
| CN100543766C (en) | 2009-09-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP5008572B2 (en) | Image processing method, image processing apparatus, and computer-readable medium | |
| Akçay et al. | Automatic detection of geospatial objects using multiple hierarchical segmentations | |
| US8155437B2 (en) | Perceptually lossless color compression | |
| US9384409B1 (en) | Word segmentation for document image using recursive segmentation | |
| JP3353968B2 (en) | Image processing device | |
| US7596265B2 (en) | Segmenting pixels in an image based on orientation-dependent adaptive thresholds | |
| Chen et al. | Decompose algorithm for thresholding degraded historical document images | |
| US5835638A (en) | Method and apparatus for comparing symbols extracted from binary images of text using topology preserved dilated representations of the symbols | |
| CN110263610A (en) | A kind of degeneration file and picture binary coding method and system based on deep learning | |
| CN118229588B (en) | Warehouse-in and warehouse-out information acquisition method and system | |
| JP2000132690A (en) | Image processing method and image processor using image division by making token | |
| Shafait et al. | Pixel-accurate representation and evaluation of page segmentation in document images | |
| CA2267828A1 (en) | Multiple size reductions for image segmentation | |
| Haneda et al. | Text segmentation for MRC document compression | |
| Senturk et al. | Seam carving based image retargeting: A survey | |
| Agam et al. | Degraded document image enhancement | |
| CN101091186A (en) | Segment and generate compact representations for digital images | |
| Fateh et al. | Color Reduction in Hand-drawn Persian Carpet Cartoons before Discretization using image segmentation and finding edgy regions | |
| Bal et al. | Interactive degraded document enhancement and ground truth generation | |
| CN109800758A (en) | A kind of natural scene character detecting method of maximum region detection | |
| CN114549649A (en) | Feature matching-based rapid identification method for scanned map point symbols | |
| Konya et al. | Adaptive methods for robust document image understanding | |
| AU2004242421A1 (en) | Segmenting digital image and producing compact representation | |
| AU2005211628A1 (en) | Processing digital image | |
| Mignotte | Nonparametric multiscale energy-based mode and its application in some imagery problems |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20090923 Termination date: 20191220 |
|
| CF01 | Termination of patent right due to non-payment of annual fee |