JP2000207495A

JP2000207495A - Character recognizing device, character learning device and computer readable recording medium

Info

Publication number: JP2000207495A
Application number: JP11007621A
Authority: JP
Inventors: Takafumi Koshinaka; 孝文越仲
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-01-14
Filing date: 1999-01-14
Publication date: 2000-07-28
Anticipated expiration: 2019-01-14
Also published as: JP3180792B2

Abstract

PROBLEM TO BE SOLVED: To read character strings robustly in the presence of touching characters, continuous (cursive) writing, and the like. SOLUTION: A character segmentation and feature extraction means 12 detects segmentation position candidates in a character string image. A character string reading means 13 extracts character pattern candidates from the character string image based on the segmentation position candidates and evaluates the validity of every conceivable reading result using a character appearance probability calculating means 14. The means 14 receives from the means 13 a character pattern candidate, its character code and character state, and the character code and character state of the character pattern candidate located immediately before it. It evaluates the validity of the shape connection with the preceding candidate using a character state transition probability, evaluates the validity of the candidate belonging to a given category using a character template, and calculates the likelihood (score) that the candidate belongs to a given state and a given character category. The means 13 searches for and outputs the segmentation and recognition result for which the recognition score of the entire character string is maximized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the field of optical character recognition, in which characters written on paper or the like are captured by an optical sensor and read. In particular, it relates to character recognition technology for recognizing character strings in which multiple characters are arranged, such as words and sentences.

[0002]

2. Description of the Related Art

Conventionally, this type of character recognition reads a character string by combining character segmentation, which identifies the boundaries between characters in the string, with character recognition, which reads each segmented character.

[0003] An example of the prior art is described in Su Liang et al., "Segmentation of Touching Characters in Printed Document Recognition", Pattern Recognition, Vol. 27, No. 6, pp. 825-840, 1994. In the method described in this document, character boundary candidates are extracted using the shape of a projection histogram and information derived from it, and every portion of the character string lying between any two character boundaries is extracted as a character pattern candidate (character segmentation). Next, character recognition is performed on all character pattern candidates, and a recognition result and its plausibility (score) are calculated for each. Finally, character pattern candidates are selected so that the score is maximized when they are concatenated into a character string, which simultaneously determines the segmentation positions considered correct.
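The three steps above (extract every span between two boundary candidates, score each span on its own, keep the best-scoring chain) can be sketched as a small dynamic program. This is an illustrative sketch only and not the cited paper's implementation: the function name `best_segmentation`, the `score_segment` callback, the `max_span` limit, and the toy scores are all assumptions made for the example.

```python
def best_segmentation(cuts, score_segment, max_span=3):
    """Choose cut positions and labels that maximize the summed segment score.

    cuts: sorted boundary candidates, including both ends of the string image.
    score_segment(left, right): returns (label, log_score) for one candidate,
        scored independently of its neighbours (the conventional approach).
    max_span: hypothetical limit on how many gaps one character may cover.
    """
    n = len(cuts)
    best = [float("-inf")] * n   # best[j]: best score for the image up to cuts[j]
    back = [None] * n            # back[j]: (previous cut index, label of last segment)
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            label, score = score_segment(cuts[i], cuts[j])
            if best[i] + score > best[j]:
                best[j] = best[i] + score
                back[j] = (i, label)
    labels, bounds, j = [], [cuts[-1]], n - 1
    while j > 0:                 # walk the back pointers from the right end
        i, label = back[j]
        labels.append(label)
        bounds.append(cuts[i])
        j = i
    return labels[::-1], bounds[::-1], best[-1]


# Toy scores (invented): the span 0..20 reads better as one "d" than as "a" + "b".
TOY = {(0, 10): ("a", -1.0), (10, 20): ("b", -1.0), (20, 30): ("c", -1.0),
       (0, 20): ("d", -0.5), (10, 30): ("e", -5.0), (0, 30): ("f", -4.0)}
labels, bounds, total = best_segmentation([0, 10, 20, 30], lambda l, r: TOY[(l, r)])
```

Note that each segment score here depends only on the segment itself, which is exactly the independence that the later paragraphs identify as the weakness of the conventional approach.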

[0004] Several other methods have been proposed, but most of them differ only in the information used for character segmentation, or perform character recognition exhaustively at every part of the character string without segmentation in order to determine the optimal segmentation positions, or differ only in the features extracted from the character image or in the method of classifying characters. The example above targets printed characters, but the same holds for methods that target handwritten characters: in most cases, character segmentation and character recognition are built as separate modules, and the character string is read by combining them.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the conventional technology, character segmentation and character recognition are used as separate processes. Once a partial image regarded as one character (a character pattern candidate) has been cut out, that character is therefore recognized independently, regardless of the characters before and after it.

[0006] FIG. 8 is a block diagram showing the functional configuration of an example of the conventional technology. This conventional example has: image storage means 41, which stores an input character string image; character segmentation and feature extraction means 42, which detects candidates for boundaries between adjacent characters in the character string image received from the image storage means 41 as segmentation position candidates, and performs feature extraction that converts the character string image into a smaller number of quantities (features) useful for classification; character string reading means 43, which performs character recognition on each character pattern candidate obtained by dividing the character string image at selected segmentation position candidates, calculates the recognition result for the entire character string and a recognition score representing the certainty of that result, and outputs the segmentation and recognition result with the maximum recognition score as the reading result of the character string; character appearance probability calculating means 44, which, at the request of the character string reading means 43, calculates the probability that a given character pattern candidate appears under a given character category (character code); and character template storage means 45, which stores the numerical values (character templates) that the character appearance probability calculating means 44 needs in order to calculate how close a given character pattern candidate is to a given character category. In addition, character pattern storage means 46 and character code storage means 47 are provided as the interface through which the character string reading means 43 passes the character pattern candidate to be recognized and the character code to the character appearance probability calculating means 44.

[0007] The character string reading means 43 takes the partial images obtained by cutting the character string at selected segmentation positions as character pattern candidates, uses the character appearance probability calculating means 44 to obtain the closeness between those candidates and the character templates under every character category, and determines the segmentation positions and the character code string so that the closeness between the individual character pattern candidates and the character templates is highest over the entire character string. In this conventional technology, the character templates are learned by character learning means 49 using the individual character data stored in learning character data storage means 48.

[0008] However, in a handwritten character string, and especially in one written continuously such as a cursive English string, each character changes shape depending on how it connects to the preceding and following characters. When a fixed recognition process is applied while ignoring the shapes of the neighboring characters, as in the conventional technology, the deformation caused by continuous writing cannot be handled, and misrecognition often results.

[0009] For example, in a character string written continuously in cursive, the pen is at the bottom when the writer finishes the letter "a" but at the top when the writer finishes the letter "o". The same character therefore changes shape depending on whether it is written after "a" or after "o" (FIG. 2). This is a deformation specific to character strings that cannot occur in isolated characters. Conventional character recognition, which is built to recognize isolated characters, cannot handle such deformation, and it is a frequent cause of misrecognition.

[0010] The conventional technology has problems beyond the recognition of continuous writing. Among Arabic numerals there are pairs, such as "1" and "7" or "4" and "9", that can be told apart when written by the same writer but become indistinguishable when characters from different writers are compared. In FIG. 3, for example, the "17" at the upper left and the "17" at the upper right are character strings written by different writers. Within one writer, "1" and "7" are easy to distinguish, but the two characters enclosed in rectangles, namely the first writer's "7" and the second writer's "1", are similar in shape and tend to be hard to tell apart. Likewise, for the "49" at the lower left and the "49" at the lower right of FIG. 3, the "4" and "9" of the same writer are easy to distinguish, but the first writer's "4" and the second writer's "9" are hard to distinguish when compared in isolation. Here too, recognizing a character on its own without looking at the shapes of neighboring characters leads to misrecognition.

[0011] Attempts to solve the problem described above, in which characters are deformed depending on their neighbors, have not been entirely absent. However, they suffer from problems in recognition accuracy, processing speed, and the like, and none has become a practical method.

[0012] For example, one conceivable method treats two adjacent characters as a single pattern, builds a dictionary by learning as many templates as the square of the number of character types, and recognizes the target character string in two-character units. However, a two-character sequence varies extremely widely as a pattern, so an enormous amount of training data is required. Because the number of templates needed is also the square of the number of character types, the shortage of training data becomes severe. Even if a large amount of training data could be obtained, classifying two-character patterns, which deform far more diversely than single-character patterns, into a number of classes equal to the square of the number of character types is inherently harder, so a loss of recognition accuracy is unavoidable. The method of treating two characters as one pattern and preparing templates numbering the square of the number of character types is therefore not practical.

[0013] The prior art also includes methods that try to absorb the dependence between characters by recognizing a character string word by word, without decomposing it into lower-level elements such as characters. Because the variation in word patterns is enormous, such methods have problems equal to or greater than those of the adjacent two-character pattern method described above. Moreover, since they handle images that are larger than single characters, their processing efficiency is also poor.

[0014] There are other conventional techniques that take the relationship between adjacent characters into account, but they consider only the sequence of character types and not the deformation of patterns. One example is described in Kundu et al., "Recognition of handwritten word: first and second order hidden Markov model based approach", Pattern Recognition, Vol. 22, No. 3, pp. 283-297, 1989. In the method described in this document, statistics on how often each pair of character types occurs adjacently are extracted in advance from text such as dictionaries, newspapers, and magazines, and the results are used in character string recognition. That is, the output of character recognition is adjusted so that two-character pairs that frequently occur adjacently appear more readily. A method that uses frequency information on pairs of character types (character codes) in this way is called a bigram, and many other techniques using it have been reported. As noted at the outset, however, a bigram considers only the adjacency of character codes, and it is distinct from a technique that considers the adjacency of characters while taking the deformation of character patterns into account.
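To make the bigram idea concrete, the sketch below counts adjacent character-code pairs in a word list and uses the resulting log probabilities to adjust a recognizer's score for a candidate reading. It is a minimal illustration under stated assumptions, not the cited paper's model: the names `train_bigram` and `rescore`, the add-alpha smoothing, the `weight` parameter, and the four-word corpus are all invented for the example.

```python
import math
from collections import Counter


def train_bigram(words, alpha=1.0):
    """Return log P(next code | previous code) estimated from adjacent pairs."""
    pair_counts, first_counts, alphabet = Counter(), Counter(), set()
    for word in words:
        alphabet.update(word)
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    vocab = len(alphabet)

    def log_prob(a, b):
        # Add-alpha smoothing (an assumption) keeps unseen pairs finite.
        return math.log((pair_counts[(a, b)] + alpha) / (first_counts[a] + alpha * vocab))

    return log_prob


def rescore(candidate, shape_log_score, log_prob, weight=1.0):
    """Combine a shape-based score with the code-level bigram score."""
    lm = sum(log_prob(a, b) for a, b in zip(candidate, candidate[1:]))
    return shape_log_score + weight * lm


log_prob = train_bigram(["the", "then", "that", "this"])
score_the = rescore("the", -2.0, log_prob)   # frequent pairs: favoured
score_tbe = rescore("tbe", -1.5, log_prob)   # better shape score, unseen pairs
```

The model only ever sees character codes, so it cannot tell whether the shape of a character is consistent with the shape of its neighbor, which is the distinction the paragraph draws.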

[0015] Thus, while techniques that recognize character strings using adjacency at the character code level are widely used, no practical technique has been established that uses adjacency at the character pattern level, separately from the character code level, because of the difficulty of handling character patterns, which are high-dimensional information.

[0016] An object of the present invention is therefore to provide a character recognition device that is little affected by the character deformation caused by dependence between adjacent characters, that is, one that is robust against touching characters, continuous writing, and individual differences in letterforms among writers, and that can operate at a practical processing speed.

[0017]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the character recognition device according to the present invention assumes, for each character category, several discrete character states that represent types of character pattern deformation, and considers a state network that connects those character states. One character being immediately followed by another corresponds to one state transition on the network. Each time a state transition moves to a new character state, one character pattern is generated, and one character string is regarded as being observed by passing through as many character states as there are characters. The transition from one character state to another is specified as the probability of moving from a given character state to a given character state. Each character state is associated, for each character category, with a representative pattern (character template) that represents the deformed character pattern, and the generation of a character pattern from a character state is specified by a probability density function based on the representative pattern. Using these character state transitions and representative patterns, the recognition result for the entire character string is computed while taking into account the likelihood of every character pattern candidate extracted from the input character string image and of the adjacency relationships among them.

Specifically, the character recognition device of the present invention has: image storage means, which stores an input character string image; character segmentation and feature extraction means, which detects candidates for boundaries between adjacent characters in the character string image received from the image storage means as segmentation position candidates, and performs feature extraction that converts the character string image into a smaller number of quantities (features) useful for classification; character string reading means, which performs character recognition on each character pattern candidate obtained by dividing the character string image at selected segmentation position candidates, and outputs the segmentation and the character code string that are optimal for the character string as a whole as the reading result; character appearance probability calculating means, which receives from the character string reading means a character pattern candidate, a character code, a character state (an index representing the type of deformation of the character), and the character code and character state of the character pattern candidate located immediately before the given one, and calculates the probability that the given character pattern candidate appears under the given character code and character state; character state transition probability storage means, which stores the numerical values (state transition probabilities) needed to compute the part of the character appearance probability that depends on the character state; and character template storage means, which stores the numerical values (character templates) needed to compute the part of the character appearance probability that depends on the character pattern.
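The character appearance probability described above has two parts: one that depends on the character state (a transition probability from the preceding character's code and state) and one that depends on the character pattern (a density around the template). The sketch below combines them in the log domain. It is an illustration under assumptions, not the patent's formula: the isotropic Gaussian density, the shared variance `var`, the two-dimensional features, and the names `char_log_prob`, `TRANS`, and `TEMPLATES` are all hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of an isotropic Gaussian (assumed form of the template density)."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def char_log_prob(pattern, code, state, prev_code, prev_state, trans, templates, var=1.0):
    """log P(state transition) + log p(pattern | code, state)."""
    log_t = math.log(trans[(prev_code, prev_state)][(code, state)])  # state-dependent part
    log_e = log_gaussian(pattern, templates[(code, state)], var)     # pattern-dependent part
    return log_t + log_e


# Toy model (invented): "n" has state 0 (entered from a low stroke, as after "a")
# and state 1 (entered from a high stroke, as after "o").
TEMPLATES = {("n", 0): [0.0, 1.0], ("n", 1): [1.0, 1.0]}
TRANS = {("a", 0): {("n", 0): 0.9, ("n", 1): 0.1},
         ("o", 0): {("n", 0): 0.1, ("n", 1): 0.9}}

pattern = [0.9, 1.0]  # feature vector of an "n" written right after an "o"
after_o_state1 = char_log_prob(pattern, "n", 1, "o", 0, TRANS, TEMPLATES)
after_o_state0 = char_log_prob(pattern, "n", 0, "o", 0, TRANS, TEMPLATES)
```

In this toy setting both parts agree: after "o", the high-entry state is both the more probable transition and the closer template, so it receives the higher score.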

[0018] In the character learning device of the present invention, the probabilities that define state transitions on the above network (hereinafter, character state transition probabilities) and the parameters that determine the probability density functions defining the output of character patterns (hereinafter, character templates) are acquired automatically by learning from character string images to which correct character code strings have been attached. As long as the correct character code string is attached, the character string images used for learning need no further information such as the boundaries between characters; individual character patterns are cut out automatically in the course of learning, and learning proceeds.

Specifically, the character learning device of the present invention has: learning character string data storage means, which stores the character string data used to estimate the optimal character state transition probabilities and the optimal character templates from given character string images and their correct character code strings; character segmentation and feature extraction means, which detects candidates for boundaries between adjacent characters in the character string image received from the learning character string data storage means as segmentation position candidates, and performs feature extraction that converts the character string image into a smaller number of quantities (features) useful for classification; character appearance probability calculating means, which receives a character pattern candidate, a character code, a character state (an index representing the type of deformation of the character), and the character code and character state of the character pattern candidate located immediately before the given one, and calculates the probability that the given character pattern candidate appears under the given character code and character state; character boundary determining means, which estimates the character boundaries in the character string image using the correct character code string attached to the image and the character appearance probability calculating means; character pattern storage means, which stores the character pattern candidate that the character boundary determining means passes when it requests the character appearance probability calculating means to calculate a character appearance probability; two character code storage means, which store the character code in the correct character code string that corresponds to the character pattern candidate and the character code immediately before it; two character state storage means, which store the character state that corresponds to the character pattern candidate and the character state that corresponds to the immediately preceding character pattern candidate; character state transition probability storage means, which stores the numerical values (state transition probabilities) needed to compute the part of the character appearance probability that depends on the character state; character template storage means, which stores the numerical values (character templates) needed to compute the part of the character appearance probability that depends on the character pattern; and character learning means, which updates the character state transition probabilities stored in the character state transition probability storage means and the character templates stored in the character template storage means, using the individual character patterns cut out by the character boundary determining means and their order.
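The role of the character boundary determining means, choosing cut positions when the correct character code string is known but the boundaries are not, can be sketched as a constrained dynamic program. This is one plausible reading and not the patent's stated algorithm: the function `align_to_codes`, its callbacks, the squared-distance emission score, and the toy features are all assumptions. The sketch also assumes there are at least as many gaps between cut candidates as characters.

```python
import math


def align_to_codes(cuts, features, codes, states, log_emit, log_trans, log_init):
    """Return (boundaries, states, log score) for the known code string `codes`."""
    n, K, NEG = len(cuts), len(codes), float("-inf")
    # best[k][j][s]: best score with characters 0..k covering cuts[0]..cuts[j],
    # character k in state s; back[k][j][s]: (previous cut index, previous state).
    best = [[{s: NEG for s in states} for _ in range(n)] for _ in range(K)]
    back = [[{s: None for s in states} for _ in range(n)] for _ in range(K)]
    for k in range(K):
        for j in range(k + 1, n):
            for i in ([0] if k == 0 else range(k, j)):
                x = features(cuts[i], cuts[j])
                for s in states:
                    e = log_emit(x, (codes[k], s))
                    if k == 0:
                        cands = [(log_init((codes[0], s)) + e, None)]
                    else:
                        cands = [(best[k - 1][i][p] + e
                                  + log_trans((codes[k - 1], p), (codes[k], s)), (i, p))
                                 for p in states if best[k - 1][i][p] > NEG]
                    for score, bp in cands:
                        if score > best[k][j][s]:
                            best[k][j][s], back[k][j][s] = score, bp
    s = max(states, key=lambda t: best[K - 1][n - 1][t])
    score = best[K - 1][n - 1][s]
    bounds, chosen, j = [cuts[-1]], [s], n - 1
    for k in range(K - 1, 0, -1):        # follow back pointers character by character
        i, p = back[k][j][s]
        bounds.append(cuts[i])
        chosen.append(p)
        j, s = i, p
    bounds.append(cuts[0])
    return bounds[::-1], chosen[::-1], score


# Toy data (invented): the "o" is over-segmented at cut 1; "n" spans cuts 2..3.
FEATS = {(0, 1): 1.0, (0, 2): 0.0, (0, 3): 9.0, (1, 2): 1.0, (1, 3): 9.0, (2, 3): 3.0}
TEMPL = {("o", 0): 0.0, ("o", 1): 0.5, ("n", 0): 2.0, ("n", 1): 3.0}
bounds, chosen, score = align_to_codes(
    [0, 1, 2, 3], lambda l, r: FEATS[(l, r)], "on", [0, 1],
    log_emit=lambda x, lab: -0.5 * (x - TEMPL[lab]) ** 2,   # unnormalized, equal variance
    log_trans=lambda prev, lab: math.log(0.5),
    log_init=lambda lab: math.log(0.5))
```

The boundaries and states returned by such an alignment are what a learning step would then use to re-estimate the transition probabilities and templates.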

[0019]

EMBODIMENTS OF THE INVENTION

Next, a first embodiment of the present invention is described in detail with reference to the drawings.

[0020] FIG. 1 is a block diagram showing an embodiment of the present invention. This embodiment has: image storage means 11, which captures an input character string image with an optical sensor and stores it; character segmentation and feature extraction means 12, which detects candidates for boundaries between adjacent characters in the character string image received from the image storage means 11 as segmentation position candidates, and performs feature extraction that converts the character string image into a smaller number of quantities (features) useful for classification; character string reading means 13, which performs character recognition on the character pattern candidates obtained by dividing the character string image at selected segmentation position candidates, calculates a recognition score for the character string as a whole, and outputs the segmentation and character code string that yield the highest recognition score as the reading result of the character string; character appearance probability calculating means 14, which, at the request of the character string reading means 13, receives from the character string reading means 13 a character pattern candidate, the character code (main character code) and character state (main character state) corresponding to it, and the character code (sub character code) and character state (sub character state) corresponding to the character pattern candidate located immediately before it, and calculates the probability that the character pattern candidate appears; character state transition probability storage means 15, which stores the numerical values (character state transition probabilities) that the character appearance probability calculating means 14 needs to compute the part of the character appearance probability that depends on the connection between characters, that is, on the transition between character states; character template storage means 16, which stores the numerical values (character templates) that the character appearance probability calculating means 14 needs to compute the part of the character appearance probability that depends on the given character pattern itself; and character pattern storage means 30, main character code storage means 31, sub character code storage means 32, main character state storage means 33, and sub character state storage means 34, which store the character pattern, character codes, and character states that the character string reading means 13 passes when it requests the character appearance probability calculating means 14 to calculate a character appearance probability.
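The search performed by the character string reading means 13 can be pictured as a Viterbi-style search in which each hypothesis ends at a cut candidate with a particular (character code, character state) label, and the score of extending it uses the preceding label. The sketch below is a minimal illustration under assumptions and not the patent's procedure: the function `viterbi_read`, its callbacks, the `max_span` limit, the squared-distance emission score, and all toy numbers are invented.

```python
import math


def viterbi_read(cuts, features, labels, log_emit, log_trans, log_init, max_span=2):
    """Jointly choose cuts and (code, state) labels maximizing the total log score."""
    n = len(cuts)
    best = [dict() for _ in range(n)]  # best[j][label] = (score, (prev cut index, prev label))
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            x = features(cuts[i], cuts[j])
            for lab in labels:
                e = log_emit(x, lab)
                if i == 0:   # first character of the string: no predecessor
                    cands = [(log_init(lab) + e, None)]
                else:        # extend every hypothesis that ends at cut i
                    cands = [(s + log_trans(p, lab) + e, (i, p))
                             for p, (s, _) in best[i].items()]
                for score, bp in cands:
                    if lab not in best[j] or score > best[j][lab][0]:
                        best[j][lab] = (score, bp)
    lab, (score, bp) = max(best[n - 1].items(), key=lambda kv: kv[1][0])
    path, bounds = [lab], [cuts[-1]]
    while bp is not None:    # follow back pointers to the start of the string
        i, p = bp
        path.append(p)
        bounds.append(cuts[i])
        bp = best[i][p][1]
    bounds.append(cuts[0])
    return path[::-1], bounds[::-1], score


# Toy model (invented): the second segment's feature 2.5 lies midway between the
# two "n" templates, so only the transition from "o" decides its state.
LABELS = [("o", 0), ("n", 0), ("n", 1)]
TEMPL = {("o", 0): 0.0, ("n", 0): 2.0, ("n", 1): 3.0}
FEATS = {(0, 1): 0.1, (1, 2): 2.5, (0, 2): 10.0}


def log_trans(prev, lab):
    if prev == ("o", 0):
        return math.log({("n", 1): 0.8, ("n", 0): 0.1, ("o", 0): 0.1}[lab])
    return math.log(1.0 / 3.0)


path, bounds, score = viterbi_read(
    [0, 1, 2], lambda l, r: FEATS[(l, r)], LABELS,
    log_emit=lambda x, lab: -0.5 * (x - TEMPL[lab]) ** 2,   # unnormalized, equal variance
    log_trans=log_trans,
    log_init=lambda lab: math.log(1.0 / 3.0))
codes = "".join(code for code, _state in path)
```

Keeping one hypothesis per (cut, label) pair is a design choice of this sketch; it keeps the cost polynomial in the number of cuts and labels rather than enumerating every segmentation.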

【００２１】さらにこの実施例は、最適な文字状態遷移
確率および文字テンプレートを、与えられた文字列デー
タから推定する際に用いる学習用の文字列データを格納
する学習文字列データ格納手段21と、学習文字列データ
格納手段21より受け取った文字列画像より切り出し位置
候補を検出し、また文字列画像を識別に有用なより少数
の量（特徴) に変換する特徴抽出を行う文字切り出し・
特徴抽出手段22と、文字列画像に付与された正解文字コ
ード列と文字出現確率計算手段14を用いて、文字列画像
中の切り出し位置候補から切り出し位置を選択する文字
境界決定手段23と、文字境界決定手段23が文字出現確率
計算手段14に文字の出現確率の計算を要求する際に渡す
文字パタン候補、文字コードおよび文字状態を格納する
文字パタン記憶手段35、主文字コード記憶手段36、副文
字コード記憶手段37、主文字状態記憶手段38および副文
字状態記憶手段39と、文字境界決定手段23によって切り
出された個々の文字パタンとその並び順を用いて、文字
状態遷移確率格納手段15に格納されている文字状態遷移
確率および文字テンプレート格納手段16に格納されてい
る文字テンプレートを更新する文字学習手段24とを有す
る。Further, this embodiment has: a learning character string data storage means 21 that stores the learning character string data used to estimate the optimum character state transition probabilities and character templates from given character string data; a character cutout / feature extraction means 22 that detects cutout position candidates in the character string image received from the learning character string data storage means 21 and performs feature extraction that converts the character string image into a smaller number of quantities (features) useful for identification; a character boundary determination means 23 that selects cutout positions from the cutout position candidates in the character string image, using the correct character code string attached to the character string image and the character appearance probability calculation means 14; a character pattern storage means 35, main character code storage means 36, sub character code storage means 37, main character state storage means 38, and sub character state storage means 39, which store the character pattern candidate, character codes, and character states passed when the character boundary determination means 23 requests the character appearance probability calculation means 14 to calculate a character appearance probability; and a character learning means 24 that updates the character state transition probabilities stored in the character state transition probability storage means 15 and the character templates stored in the character template storage means 16, using the individual character patterns cut out by the character boundary determination means 23 and their order.

【００２２】各々の手段はそれぞれ計算機上に記憶され
たプログラムとして動作させることにより実現可能であ
る。Each means can be realized by operating as a program stored on a computer.

【００２３】画像記憶手段11、文字切り出し・特徴抽出
手段12、文字列読み取り手段13、文字出現確率計算手段
14、文字状態遷移確率格納手段15、文字テンプレート格
納手段16、各記憶手段30〜34で文字列認識装置１が構成
される（図１の点線枠) 。また、学習文字列データ格納
手段21、文字切り出し・特徴抽出手段22、文字境界決定
手段23、文字学習手段24、各記憶手段35〜39で、文字列
学習装置２が構成される（図１の破線枠) 。なお、文字
列認識装置１内の文字切り出し・特徴抽出手段12と文字
列学習装置２内の文字切り出し・特徴抽出手段22は同一
の機能を備える。また文字列認識装置１と文字列学習装
置２は通常は同時に使われないので、文字列認識装置１
と文字列学習装置２で１つの文字切り出し・特徴抽出手
段を共有するような構成でもよい。The image storage means 11, character cutout / feature extraction means 12, character string reading means 13, character appearance probability calculation means 14, character state transition probability storage means 15, character template storage means 16, and storage means 30 to 34 constitute the character string recognition device 1 (dotted frame in FIG. 1). The learning character string data storage means 21, character cutout / feature extraction means 22, character boundary determination means 23, character learning means 24, and storage means 35 to 39 constitute the character string learning device 2 (dashed frame in FIG. 1). The character cutout / feature extraction means 12 in the character string recognition device 1 and the character cutout / feature extraction means 22 in the character string learning device 2 have the same function. Since the character string recognition device 1 and the character string learning device 2 are not normally used at the same time, the two devices may share a single character cutout / feature extraction means.

【００２４】なお、文字列の認識や学習を行う場合に
は、入力画像に対して前処理を行うのが一般的であり、
前処理としては、多値画像をより扱いやすい２値画像に
変換する２値化処理、文字の大きさやストロークの間
隔、傾き等を整形する正規化処理、画像中の細かな汚れ
やかすれを除くノイズ除去処理などが考えられる。ここ
では図示していないが、これらの前処理を、必要に応じ
て文字切り出し・特徴抽出手段12や文字切り出し・特徴
抽出手段22、文字列読み取り手段13、文字境界決定手段
23、文字出現確率計算手段14の内部等に導入してよい。
また、これらの前処理と文字切り出し、特徴抽出は、前
後関係を問わずあらゆる順序で適用することができる。When character strings are recognized or learned, preprocessing is generally applied to the input image. Possible preprocessing includes binarization, which converts a multi-valued image into a more manageable binary image; normalization, which adjusts character size, stroke spacing, slant, and the like; and noise removal, which removes fine stains and blurring from the image. Although not shown in the figure, such preprocessing may be introduced as needed inside the character cutout / feature extraction means 12, the character cutout / feature extraction means 22, the character string reading means 13, the character boundary determination means 23, the character appearance probability calculation means 14, and so on. The preprocessing, character cutout, and feature extraction may be applied in any order.

【００２５】以下、本実施の形態による本発明の動作に
ついて段階を追って説明する。まず、文字列認識装置１
の動作について、図５の流れ図を参照しながら説明す
る。The operation of the present invention according to this embodiment is described step by step below. First, the operation of the character string recognition device 1 is described with reference to the flowchart of FIG. 5.

【００２６】読み取り対象の画像はスキャナ等によって
光学的に入力され、画像記憶手段11に格納され、さらに
文字切り出し・特徴抽出手段12へ送られる。図５の流れ
図の画像読み込み100 がこれに相当する。文字切り出し
・特徴抽出手段12は、文字列画像に２値化処理、正規化
処理等適当な前処理を施した上で、文字列画像からいく
つかの切り出し位置（文字境界) 候補（これは横書き文
字列を扱う場合はｘ軸上の、つまり水平方向の座標とし
て表される）を検出し、その座標および文字列画像また
は文字列画像を特徴抽出処理により変換した特徴パタン
を文字列読み取り手段13へ送る。図５の流れ図の文字切
り出し・特徴抽出101がこれに相当する。The image to be read is input optically by a scanner or the like, stored in the image storage means 11, and then sent to the character cutout / feature extraction means 12. This corresponds to image reading 100 in the flowchart of FIG. 5. The character cutout / feature extraction means 12 applies suitable preprocessing such as binarization and normalization to the character string image, detects several cutout position (character boundary) candidates in it (for horizontally written strings these are expressed as coordinates on the x axis, that is, horizontal coordinates), and sends those coordinates, together with either the character string image or a feature pattern obtained by converting it through feature extraction, to the character string reading means 13. This corresponds to character cutout / feature extraction 101 in the flowchart of FIG. 5.

【００２７】文字切り出し・特徴抽出101 において、切
り出し位置候補の検出には、例えば図形的な情報を利用
する。図形的な情報としては、例えば文字列の (文字列
が横書きならば縦方向の、縦書きならば横方向の) 投影
ヒストグラムを計算し、度数があらかじめ設定したしき
い値よりも低い位置を切り出し位置候補とすればよい。
別の図形的な情報を用いた切り出し手段として、文字列
の輪郭線を追跡してその凹凸を計測し、凹みがしきい値
よりも大きくなる位置を切り出し位置候補として記憶す
るという方法も考えられる。また、複数の図形的特徴を
併用して切り出し位置候補を求める方法も可能である。In character cutout / feature extraction 101, cutout position candidates are detected using, for example, graphical information. For instance, a projection histogram of the character string is computed (projected in the vertical direction for horizontal writing, and in the horizontal direction for vertical writing), and positions where the frequency is lower than a preset threshold are taken as cutout position candidates. Another cutout method based on graphical information traces the contour of the character string, measures its concavities, and stores positions where the concavity exceeds a threshold as cutout position candidates. Several graphical features may also be combined to obtain the cutout position candidates.
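
As a minimal sketch of the projection-histogram method for a horizontally written string, the following Python counts the black pixels in each column and keeps the columns whose count falls below a threshold. The function name, the list-of-rows image representation (0 for white, 1 for black), and the sample threshold are assumptions made for illustration and do not appear in the disclosure.

```python
from typing import List


def projection_cut_candidates(image: List[List[int]], threshold: int) -> List[int]:
    """Return the x coordinates whose column ink count is below `threshold`.

    `image` is a binary image given as rows of 0 (white) / 1 (black) pixels.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of black pixels in each column.
    histogram = [sum(row[x] for row in image) for x in range(width)]
    return [x for x, count in enumerate(histogram) if count < threshold]


sample = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
print(projection_cut_candidates(sample, threshold=1))
```

A low threshold keeps only clear gaps, while a higher one also proposes thin connecting strokes as candidates, which matters for touching characters.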

【００２８】図形的情報を利用しないで切り出し位置候
補を作成することも可能である。図形的情報を利用しな
い場合は、文字列画像の開始位置の座標から終了位置の
座標までを等間隔に区切り、その区切り点をすべて切り
出し位置候補として記憶する。この場合はある程度多数
の（例えば想定される文字数の数倍程度の) 切り出し位
置候補を記憶する。Cutout position candidates can also be created without using graphical information. In that case, the span from the start coordinate to the end coordinate of the character string image is divided at equal intervals, and every division point is stored as a cutout position candidate. A fairly large number of candidates (for example, several times the expected number of characters) is stored in this case.
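
The equal-interval variant can be sketched in a few lines of Python. The function name and the rounding to integer pixel coordinates are assumptions of this sketch.

```python
def uniform_cut_candidates(start, end, count):
    """Return `count` equally spaced x coordinates strictly between start and end."""
    step = (end - start) / (count + 1)
    return [round(start + step * i) for i in range(1, count + 1)]


# For a five-character string, roughly three times as many candidates.
print(uniform_cut_candidates(0, 60, 14))
```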

【００２９】文字切り出し・特徴抽出101 における特徴
抽出処理には任意の方法が採用可能である。以下では、
特徴抽出の例として２値画像からの方向特徴の抽出処理
を挙げる。２値画像の各画素について、その画素を含む
水平方向(0°方向) の黒ランのラン長を画素値として
（ただし、注目している画素が白画素なら画素値は０と
する) 、多値画像を作る。これにより水平方向のストロ
ークのみを強調した方向画像ができる。鉛直方向(90°
方向)、斜め方向(45°、135°方向) についても同様
に、各方向を強調した方向画像が作れるので、計４枚の
画像ができる。図４は方向特徴の抽出例を示す図であ
り、“92383”と書かれた数字列画像から0°、45°、90
°、135°の４方向を強調した方向画像が抽出されてい
る。Any method may be used for the feature extraction in character cutout / feature extraction 101. As an example, the extraction of directional features from a binary image is described below. For each pixel of the binary image, a multi-valued image is created whose pixel value is the length of the horizontal (0° direction) black run containing that pixel (if the pixel of interest is white, the pixel value is 0). This yields a directional image that emphasizes only horizontal strokes. Directional images emphasizing the vertical direction (90° direction) and the diagonal directions (45° and 135° directions) can be created in the same way, giving four images in total. FIG. 4 shows an example of directional feature extraction, in which directional images emphasizing the four directions 0°, 45°, 90°, and 135° are extracted from a digit string image reading “92383”.
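
The run-length construction can be illustrated with a short Python sketch for the 0° and 90° directions; the diagonal directions would follow the same pattern along diagonals. The function names and the list-of-rows image representation are assumptions of this sketch.

```python
def horizontal_run_image(image):
    """Replace each black pixel by the length of the horizontal black run containing it."""
    out = []
    for row in image:
        out_row = [0] * len(row)  # white pixels keep the value 0
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                for i in range(start, x):
                    out_row[i] = x - start  # run length
            else:
                x += 1
        out.append(out_row)
    return out


def vertical_run_image(image):
    """Same as above for vertical runs, computed by transposing the image."""
    transposed = [list(col) for col in zip(*image)]
    return [list(col) for col in zip(*horizontal_run_image(transposed))]


sample = [
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
]
print(horizontal_run_image(sample))
print(vertical_run_image(sample))
```

In the sample, the top bar receives large values in the horizontal image and the stem in the second column receives large values in the vertical image, which is the stroke emphasis described above.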

【００３０】この後、必要ならばさらにパタンの圧縮を
行う。すなわち、それぞれの方向画像で、鉛直方向を適
当な数（例えば４、５等)に、また水平方向を画素単位
で小領域に分割し、各小領域の画素値を領域全体の画素
値の平均値あるいは最大値に置き換える。これによって
鉛直方向の画像サイズは数画素に圧縮される。After this, the pattern is compressed further if necessary. That is, each directional image is divided into small regions, into a suitable number (for example, 4 or 5) in the vertical direction and pixel by pixel in the horizontal direction, and the pixel values of each small region are replaced by the mean or the maximum of the pixel values over the region. This compresses the vertical image size to a few pixels.
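
A minimal Python sketch of this vertical compression follows. The function name, the way band boundaries are rounded, and the requirement that the image has at least as many rows as bands are assumptions of this sketch.

```python
from statistics import mean


def compress_vertical(image, bands, reduce=max):
    """Split the rows into `bands` horizontal bands and reduce each column per band.

    `reduce` is applied to the list of pixel values of one column inside one band,
    for example `max` or `statistics.mean`.
    """
    height, width = len(image), len(image[0])
    out = []
    for b in range(bands):
        rows = image[b * height // bands:(b + 1) * height // bands]
        out.append([reduce([row[x] for row in rows]) for x in range(width)])
    return out


sample = [[1, 0], [3, 0], [0, 2], [0, 4]]
print(compress_vertical(sample, bands=2))               # maximum per band
print(compress_vertical(sample, bands=2, reduce=mean))  # mean per band
```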

【００３１】文字列読み取り手段13は、入力画像にＴ個
の文字が含まれていると仮定して、切り出し位置候補か
ら(T-1) 個の切り出し位置の選び方および各々の選び方
でできるＴ個の文字パタン候補の属するカテゴリの可能
性を、すべての組合せについて調べ、最も認識得点（認
識結果の確からしさを示す尺度) の高い切り出し位置お
よび文字カテゴリ（文字コード) を選んだ場合の認識得
点を計算する。図５の流れ図の文字列出現確率計算102
がこれに相当する。ここで、Ｔの値としては、一般に
は、想定し得る幾つかの候補を考え、それぞれの文字数
について認識得点を計算して、一番得点の高いところを
選ぶようにする。但し、例えば海外の郵便で使われるpo
stal codeなどは、必ず５桁の数字と決まっているの
で、本発明をpostal codeを認識する装置など、入力画
像中に含まれる文字数が既知の或る値Ｄである装置に適
用する場合には、Ｔ＝Ｄに固定すれば良い。Assuming that the input image contains T characters, the character string reading means 13 examines every combination of the ways of choosing (T-1) cutout positions from the cutout position candidates and of the categories to which the T resulting character pattern candidates may belong, and calculates the recognition score obtained when the cutout positions and character categories (character codes) with the highest recognition score (a measure of the certainty of the recognition result) are chosen. This corresponds to character string appearance probability calculation 102 in the flowchart of FIG. 5. For the value of T, in general several plausible candidates are considered, the recognition score is calculated for each number of characters, and the one with the highest score is chosen. However, postal codes used in overseas mail, for example, always consist of five digits, so when the present invention is applied to a device in which the number of characters in the input image is a known value D, such as a postal code recognition device, T may simply be fixed at T = D.

【００３２】さらに、文字列読み取り手段13は、認識得
点が最高となった場合に選ばれるはずの（T-1)個の切り
出し位置とＴ個の文字カテゴリを求め、これと文字列出
現確率計算102 で得られた認識得点を併せて読み取り結
果として出力する。図５の流れ図の正解文字コード計算
・正解切り出し位置計算103 がこれに相当する。Further, the character string reading means 13 determines the (T-1) cutout positions and T character categories that are selected when the recognition score is highest, and outputs them together with the recognition score obtained in character string appearance probability calculation 102 as the reading result. This corresponds to correct character code calculation / correct cutout position calculation 103 in the flowchart of FIG. 5.

【００３３】この一連の動作について、以降でさらに詳
しく説明する。This series of operations will be described in more detail hereinafter.

【００３４】まず、文字列の認識得点の定義を示す。入
力画像の文字数をＴとして、切り出し位置候補の数を
（Ｓ−１）とする（Ｔ≦Ｓ）。入力画像またはそれを特
徴に変換した特徴パタンをすべての切り出し位置候補で
分割すると、Ｓ個の部分画像ができる。このＳ個の部分
画像を左から順にｘ₁ ，ｘ₂，…，ｘ_Sとする（１，
…，Ｓは部分画像のインデクス）。Ｓ個の部分画像ｘ
₁ ，ｘ₂，…，ｘ_Sの並びを任意にＴ個のグループに分
割した各々のグループを入力画像中に含まれるＴ個の文
字の各々に対応付けたとき、それぞれの文字の終端にく
る部分画像のインデクスをＳ₁，Ｓ_2,…，Ｓ_Tとする。
ただし、すべての部分画像が過不足なくいずれかの文字
に割り当てられると考えてＳ₁ ＜Ｓ₂ ＜…＜Ｓ_T ＝Ｓと
する。例えば、Ｔ＝５、Ｓ＝９とすると、ｘ₁ ，ｘ₂，ｘ_３，ｘ_４，ｘ_５，ｘ_６，ｘ_７，ｘ
_８，ｘ_９という９個の部分画像ができ、これを、グループ１ｘ₁ ，ｘ₂ グループ２ｘ_３グループ３ｘ_４，ｘ_５，ｘ_６グループ４ｘ_７，ｘ_８グループ５ｘ_９のようにＴ＝５個にグループ分けした場合、ｘ₁ ，ｘ₂
を連結したパタンが先頭の文字、ｘ_３が単独で２番目
の文字、ｘ_４，ｘ_５，ｘ_６を連結したパタンが３番
目の文字というように各部分画像が各文字に割り当てら
れる。そして、文字の終端にくる部分画像は、各グルー
プの右端に位置する部分画像（ｘ ₂，ｘ _３，ｘ_６，
ｘ_８，ｘ_９）となる。つまり、Ｓ₁＝２、Ｓ₂＝３、
Ｓ_３＝６、Ｓ_４＝８、Ｓ_５＝９である（実際の処理で
は、グループ分けは１種類だけしか考えないのではな
く、あらゆるグループ分けの組み合わせで認識得点を計
算してみて、一番認識得点の高いグループ分けを選び
出すことにより、正しいと思われる文字切り出しを求め
る）。また、第ｔ目の文字に対応する文字パタンＸ_t
は、いくつかの部分画像を連結したパタンとして、Ｘ_t
＝（ｘ_{s t - 1}＋１，…，ｘ_S _t）と表される（ただしＳ₀
＝０）。また、各文字の属するカテゴリをＷ₁，Ｗ
₂，…，Ｗ_T とする。このとき、認識得点Ａは〔数１〕
で定義される。First, the definition of the recognition score of a character string is given. Let T be the number of characters in the input image and (S-1) the number of cutout position candidates (T ≤ S). Dividing the input image, or the feature pattern obtained by converting it, at all cutout position candidates produces S partial images. Call these S partial images x_1, x_2, …, x_S in order from the left (1, …, S are the indices of the partial images). When the sequence of S partial images x_1, x_2, …, x_S is divided arbitrarily into T groups and each group is associated with one of the T characters contained in the input image, let S_1, S_2, …, S_T be the indices of the partial images at the end of each character. Since every partial image is assigned to exactly one character, S_1 < S_2 < … < S_T = S. For example, with T = 5 and S = 9 there are nine partial images x_1, x_2, …, x_9. Suppose they are divided into T = 5 groups as follows: group 1: x_1, x_2; group 2: x_3; group 3: x_4, x_5, x_6; group 4: x_7, x_8; group 5: x_9. Each partial image is then assigned to a character: the pattern joining x_1 and x_2 is the first character, x_3 alone is the second character, the pattern joining x_4, x_5, and x_6 is the third character, and so on. The partial images at the end of each character are those at the right end of each group (x_2, x_3, x_6, x_8, x_9), that is, S_1 = 2, S_2 = 3, S_3 = 6, S_4 = 8, S_5 = 9. (In actual processing, not just one grouping is considered; recognition scores are calculated for every possible grouping, and the grouping with the highest recognition score is selected to obtain the character cutout deemed correct.) The character pattern X_t corresponding to the t-th character is expressed as the pattern joining several partial images, X_t = (x_{S_{t-1}+1}, …, x_{S_t}) (where S_0 = 0). Let W_1, W_2, …, W_T be the categories to which the characters belong. The recognition score A is then defined by [Equation 1].
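
The grouping in the T = 5, S = 9 example can be reproduced with a short Python sketch. The function names are assumptions made for illustration; `ends` plays the role of S_1, …, S_T, and the second function enumerates every admissible grouping.

```python
from itertools import combinations


def group_partial_images(parts, ends):
    """Join partial images into character patterns X_t = (x_{S_{t-1}+1}, ..., x_{S_t})."""
    if list(ends) != sorted(set(ends)) or ends[0] < 1 or ends[-1] != len(parts):
        raise ValueError("ends must satisfy 1 <= S_1 < ... < S_T = S")
    patterns, prev = [], 0  # prev starts at S_0 = 0
    for end in ends:
        patterns.append(parts[prev:end])  # 1-based S_t equals the Python slice end
        prev = end
    return patterns


def all_groupings(S, T):
    """Yield every choice of end indices S_1 < ... < S_T = S."""
    for cuts in combinations(range(1, S), T - 1):
        yield list(cuts) + [S]


parts = [f"x{i}" for i in range(1, 10)]
print(group_partial_images(parts, [2, 3, 6, 8, 9]))
print(sum(1 for _ in all_groupings(9, 5)))  # number of groupings to be scored
```

The count grows combinatorially with S, which is why the recognition score is not evaluated grouping by grouping in practice.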

【００３５】[0035]

【数１】 (Equation 1)

【００３６】ここに、ｚ₁，…，ｚ_T は、それぞれ１，
…，Ｔ番目の文字に対応する文字状態を表す。π_{i k}は
１文字目がカテゴリｋに属しているという条件のもとで
文字状態がｉにある確率、ａ_{i j k l}は２文字目以降の
ある文字が文字カテゴリｌ（エル）に属し、かつその直
前の文字が文字カテゴリｋに属し、かつ直前の文字の時
点で文字状態がｉにあったという条件のもとで、現時点
の文字状態がｊにある確率を意味する。また、
μ_{i k}、Σ_{i k}は、文字状態ｉ、文字カテゴリｋのとき
の文字パタンの発生を特徴づけるパラメータで、それぞ
れ平均と共分散である。f( X ｜μ,Σ )は平均μ、共分
散Σの正規分布（ガウス分布) を表す。Here, z_1, …, z_T denote the character states corresponding to the 1st, …, T-th characters, respectively. π_{ik} is the probability that the character state is i given that the first character belongs to category k. a_{ijkl} is the probability that the current character state is j given that a character from the second onward belongs to character category l, that the immediately preceding character belongs to character category k, and that the character state at the preceding character was i. μ_{ik} and Σ_{ik} are parameters characterizing the generation of character patterns for character state i and character category k; they are the mean and the covariance, respectively. f(X | μ, Σ) denotes the normal (Gaussian) distribution with mean μ and covariance Σ.
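
The image of [Equation 1] is not reproduced in this text. From the symbol definitions above, one form consistent with them is sketched below in LaTeX. It is a reconstruction and therefore an assumption, in particular the placement of the sum over character states inside the maximization over cutout positions and categories, and it should be checked against the published drawing.

```latex
A = \max_{\substack{S_1 < \dots < S_{T-1} \\ W_1, \dots, W_T}}
    \sum_{z_1, \dots, z_T}
    \pi_{z_1 W_1}\, f\!\left(X_1 \mid \mu_{z_1 W_1}, \Sigma_{z_1 W_1}\right)
    \prod_{t=2}^{T} a_{z_{t-1} z_t W_{t-1} W_t}\,
    f\!\left(X_t \mid \mu_{z_t W_t}, \Sigma_{z_t W_t}\right)
```

Here the index order of a follows the definition of a_{ijkl}: previous state, current state, previous category, current category.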

【００３７】π_{i k}とａ_{i j k l}（ｉ，ｊ＝１，２，…，
Ｎ。ｋ，ｌ＝１，２，…，Ｃ。Ｎは文字状態の数、Ｃは
文字カテゴリの数) は、文字カテゴリｋ,ｌを条件とし
て文字状態の遷移を規定するパラメータで、文字パタン
間の隣接の尤もらしさを表すパラメータである。これら
を文字状態遷移確率と呼ぶ。これらは図１のブロック図
の文字状態遷移確率格納手段15に格納されている。一
方、μ_{i k}、Σ_{i k}は、状態ｉ、カテゴリｋを仮定したと
きに、与えられた文字パタンが現出する確率密度関数ｆ
を規定するので、一種のテンプレートと考えられる。こ
れらは図１のブロック図の文字テンプレート格納手段16
に格納されている。π_{ik} and a_{ijkl} (i, j = 1, 2, …, N; k, l = 1, 2, …, C; N is the number of character states and C the number of character categories) are parameters that govern the transitions of character states conditioned on the character categories k and l, and express the likelihood of adjacency between character patterns. They are called character state transition probabilities and are stored in the character state transition probability storage means 15 in the block diagram of FIG. 1. On the other hand, μ_{ik} and Σ_{ik} define the probability density function f with which a given character pattern appears when state i and category k are assumed, so they can be regarded as a kind of template. They are stored in the character template storage means 16 in the block diagram of FIG. 1.

【００３８】文字状態遷移確率ａ_{i j k l}は、文字テン
プレート (μ_{i k}，Σ_{i k} ) と (μ_{j l}, Σ_{j l}) の間
の遷移を規定している。ａ_{i j k l}の値が大きいほど、
文字テンプレート(μ_{i k}，Σ_{i k} ) に代表される文字
パタン候補と (μ_{j l}，Σ_{j l} )に代表される文字パタ
ン候補が隣接することの妥当性は高いことを意味する。
これら文字状態遷移確率や文字テンプレートのパラメー
タの値は、後述するように学習によって自動的に獲得
される。The character state transition probability a_{ijkl} governs the transition between the character templates (μ_{ik}, Σ_{ik}) and (μ_{jl}, Σ_{jl}). The larger the value of a_{ijkl}, the more plausible it is that a character pattern candidate represented by the character template (μ_{ik}, Σ_{ik}) and one represented by (μ_{jl}, Σ_{jl}) are adjacent. The values of these character state transition probabilities and character template parameters are acquired automatically by learning, as described later.

【００３９】文字出現確率計算手段14は、文字列読み取
り手段13の要求に応じて、また文字状態遷移確率格納手
段15および文字テンプレート格納手段16が有するパラメ
ータを参照しながら、与えられた文字パタン候補が出現
する確率を計算する。つまり、文字列読み取り手段13が
文字パタン候補Ｘ_t ＝（ｘ_{s t - 1}＋１，…，ｘ_{S t}）お
よびその文字パタン候補が属すると仮定する文字カテゴ
リｌと文字状態ｊ、その文字の直前の文字Ｘ_{t - 1} が属
すると仮定する文字カテゴリｋと文字状態ｉをそれぞれ
文字パタン記憶手段30、主文字コード記憶手段31、副文
字コード記憶手段32、主文字状態記憶手段33、副文字状
態記憶手段34に格納すると、文字出現確率計算14はそれ
らを読み出して、その文字パタン候補が出現する確率ａ
_{i j k l} f( X _t｜μ_{j l} ，Σ_{j l} )を返す。ただし、与
えられた文字パタン候補が文字列の先頭の文字である場
合には、π _{j l} f ( X _t ｜μ_{j l} ，Σ_{j l} ) を返す。At the request of the character string reading means 13, the character appearance probability calculation means 14 calculates the probability that a given character pattern candidate appears, referring to the parameters held in the character state transition probability storage means 15 and the character template storage means 16. Specifically, the character string reading means 13 stores the character pattern candidate X_t = (x_{S_{t-1}+1}, …, x_{S_t}) in the character pattern storage means 30, the character category l assumed for that candidate in the main character code storage means 31, the character category k assumed for the immediately preceding character X_{t-1} in the sub character code storage means 32, the character state j assumed for the candidate in the main character state storage means 33, and the character state i assumed for the preceding character in the sub character state storage means 34. The character appearance probability calculation means 14 then reads them out and returns the probability a_{ijkl} f(X_t | μ_{jl}, Σ_{jl}) that the character pattern candidate appears. If the given character pattern candidate is the first character of the string, it returns π_{jl} f(X_t | μ_{jl}, Σ_{jl}) instead.
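
The value returned by the character appearance probability calculation means 14 can be sketched in Python as below. The dictionary layout of the parameters, the function names, and above all the diagonal covariance (one variance per dimension instead of a full covariance matrix) are simplifying assumptions of this sketch, not details given in the text.

```python
import math


def gaussian_density(x, mean, var):
    """Gaussian density with diagonal covariance (an assumption of this sketch)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def appearance_probability(x, state, category, templates, initial, transition, prev=None):
    """Return pi_{jl} f(x) for the first character, else a_{ijkl} f(x).

    templates[(j, l)]        -> (mean, var)   character template
    initial[(j, l)]          -> pi_{jl}
    transition[(i, j, k, l)] -> a_{ijkl}
    prev                     -> (i, k) of the preceding character, or None
    """
    mean, var = templates[(state, category)]
    density = gaussian_density(x, mean, var)
    if prev is None:  # first character of the string
        return initial[(state, category)] * density
    prev_state, prev_category = prev
    return transition[(prev_state, state, prev_category, category)] * density
```

For high-dimensional features the product of densities underflows quickly, so a practical version would likely work with logarithms throughout.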

【００４０】このように、文字出現確率計算手段14は、
文字状態を用いて文字パタン候補の出現確率を計算する
際に、各文字状態が、対応する文字パタン候補の変形の
傾向に応じてマルコフ確率過程に従って遷移し、この状
態間の遷移確率の大小で隣接する文字間の接続の妥当性
を評価することによって文字出現確率の値を加減する。
また、与えられた文字パタン候補と辞書パタンとの距離
を計算する際、文字パタン候補に対応する前記文字状態
に応じて文字テンプレートを選択する。In this way, when the character appearance probability calculation means 14 calculates the appearance probability of a character pattern candidate using character states, each character state transitions according to a Markov stochastic process in accordance with the deformation tendency of the corresponding character pattern candidate, and the value of the character appearance probability is raised or lowered by evaluating, through the magnitude of the transition probability between these states, the validity of the connection between adjacent characters. In addition, when the distance between a given character pattern candidate and a dictionary pattern is calculated, the character template is selected according to the character state corresponding to that candidate.

【００４１】文字列読み取り手段13は、上記ｌ、ｋとし
て考えられるすべての文字カテゴリを１つずつ順番に、
また上記ｊ、ｉとして１〜Ｎまでのすべての文字状態を
１つずつ順番に代入して、その度に文字出現確率計算手
段14で文字パタンの出現する確率を計算する。こうし
て、文字列読み取り手段13は、考えられるすべての文字
カテゴリ、文字状態を網羅的に調べて、一番認識得点の
高い文字カテゴリの組合せを見つける。The character string reading means 13 substitutes, one at a time, every possible character category for l and k above and every character state from 1 to N for j and i, and each time has the character appearance probability calculation means 14 calculate the probability that the character pattern appears. In this way, the character string reading means 13 exhaustively examines all possible character categories and character states and finds the combination of character categories with the highest recognition score.

【００４２】なお、文字出現確率計算手段14がパタンＸ
_t ＝（ｘ_{s t - 1}＋１，…，ｘ_{S t}）の出現確率を計算す
る際、文字パタン候補に対して簡単な特徴変換を施す。
すなわち、文字切り出し・特徴抽出手段12では鉛直方向
の画素数を圧縮したが、それと同様に水平方向の画素数
を数画素に圧縮する。例えば文字切り出し・特徴抽出手
段12で鉛直方向を４画素に圧縮していたとすると、ここ
で水平方向も４画素に圧縮する。これにより４×４＝16
画素が残るので、４つの各方向画像ごとにこれらの画素
値を並べて合計64次元のベクトルを作る。これをもとに
文字パタン候補の出現確率を計算する。When the character appearance probability calculation means 14 calculates the appearance probability of the pattern X_t = (x_{S_{t-1}+1}, …, x_{S_t}), it applies a simple feature transformation to the character pattern candidate. That is, just as the character cutout / feature extraction means 12 compressed the number of pixels in the vertical direction, the number of pixels in the horizontal direction is compressed to a few pixels. For example, if the character cutout / feature extraction means 12 compressed the vertical direction to 4 pixels, the horizontal direction is also compressed to 4 pixels here. This leaves 4 × 4 = 16 pixels per directional image, and arranging these pixel values for each of the four directional images gives a 64-dimensional vector in total. The appearance probability of the character pattern candidate is calculated from this vector.
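
The construction of the 64-dimensional vector can be sketched in Python as follows. The function names, the use of the maximum within each horizontal band, and the requirement that each image is at least `bands` pixels wide are assumptions of this sketch.

```python
def compress_horizontal(image, bands):
    """Reduce each row to `bands` values by taking the maximum inside each band."""
    width = len(image[0])
    return [
        [max(row[b * width // bands:(b + 1) * width // bands]) for b in range(bands)]
        for row in image
    ]


def feature_vector(direction_images, bands=4):
    """Concatenate the compressed directional images of one character pattern."""
    vector = []
    for image in direction_images:
        for row in compress_horizontal(image, bands):
            vector.extend(row)
    return vector


# Four directional images, each already compressed to 4 rows and 8 pixels wide.
images = [[list(range(8)) for _ in range(4)] for _ in range(4)]
print(len(feature_vector(images)))  # 4 directions x 4 rows x 4 columns
```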

【００４３】図５の文字列出現確率計算102 における
〔数１〕の認識得点の最大値を求める処理と、図５の正
解文字コード計算・正解切り出し位置計算103 における
（Ｔ−１）個の切り出し位置とＴ個のカテゴリの組合せ
を求める処理では、処理時間を短縮するため、漸化式を
用いて効率的に計算する。その計算手順について説明す
る。The processing that finds the maximum of the recognition score of [Equation 1] in character string appearance probability calculation 102 of FIG. 5, and the processing that finds the combination of (T-1) cutout positions and T categories in correct character code calculation / correct cutout position calculation 103 of FIG. 5, are computed efficiently using recurrence formulas in order to shorten the processing time. The calculation procedure is described below.

【００４４】今、部分画像ｘ₁，…，ｘ_S のうち１番目
からｓ番目までを使って１文字目からｔ文字目までを認
識させたとして、ｔ番目の文字がカテゴリｗに属し、か
つそのときの状態がｚにあるという条件の下での認識得
点の最大値をＡ _t （ｓ，ｚ，ｗ）とおく（小文字ｓと
大文字Ｓの違いに注意)。つまり、Ａ _t （ｓ，ｚ，
ｗ）は、ｔ番目の文字の属する文字カテゴリがｗで、ｔ
番目の文字に対応する文字状態がｚにあり、かつ、ｔ番
目の文字とｔ＋１番目の文字との境界位置ｓ（＝ｔ文字
目の終端に位置する部分画像のインデクス）であると仮
定したときの、１文字目からｔ文字目までの認識得点の
最大値である。このとき、〔数２〕のようなｔに関する
漸化式が成り立つ。Suppose that the 1st through s-th of the partial images x_1, …, x_S have been used to recognize the 1st through t-th characters, and let A_t(s, z, w) be the maximum recognition score under the condition that the t-th character belongs to category w and its state is z (note the difference between lowercase s and uppercase S). In other words, A_t(s, z, w) is the maximum recognition score from the 1st to the t-th character when it is assumed that the t-th character belongs to character category w, that the character state corresponding to the t-th character is z, and that the boundary between the t-th and (t+1)-th characters is at position s (= the index of the partial image at the end of the t-th character). A recurrence formula in t such as [Equation 2] then holds.

【数２】 (Equation 2)

【００４５】この漸化式に従って、文字列読み取り手段
13は、ｔを順次増加させながらｓ，ｚ，ｗに関する計算
を進めることによって、最終的な認識得点ＡはＡ＝ｍａ
ｘ_w Σ_z Ａ_T（Ｓ，ｚ，ｗ）と求めることができる(こ
こでの Σ_zはｚに関する総和を意味する)。このとき文
字列読み取り手段13は、着目する文字の属する文字カテ
ゴリ（文字コード) 、対応する文字状態および次の文字
との境界位置を記憶しながら漸化式に従って認識得点を
計算する。ｔを１つ増やして次の段階の認識得点Ａ
_T＋１（ｓ，ｚ，ｗ）を計算する際に、すべてのｓ，
ｚ，ｗの値にわたって、Ａ_t（ｓ，ｚ，ｗ）の値が必要
なので、それら（すべてのｓ，ｚ，ｗに関するＡ
_t（ｓ，ｚ，ｗ）の値）を記憶しておく必要があるから
である。According to this recurrence formula, the character string reading means 13 carries out the calculation over s, z, and w while increasing t step by step, and the final recognition score A is obtained as A = max_w Σ_z A_T(S, z, w) (where Σ_z denotes the sum over z). During this calculation the character string reading means 13 computes the recognition score according to the recurrence formula while storing the character category (character code) of the character of interest, the corresponding character state, and the boundary position with the next character. This is because the values of A_t(s, z, w) for all s, z, and w are needed when t is increased by one and the next-stage recognition score A_{t+1}(s, z, w) is calculated, so those values must be kept.

【００４６】また、各文字の終端位置および属するカテ
ゴリは、〔数３〕に示す漸化式でｔを順次減じてゆくこ
とによって求めることができる。The end position of each character and the category to which it belongs can be obtained by sequentially reducing t by the recurrence formula shown in [Equation 3].

【数３】 (Equation 3)

【００４７】ただしここに、ａｒｇｍａｘは、最大値を
求める計算をして最大値が得られたときの引数値を返す
関数である。Here, argmax is a function that performs the maximization and returns the value of the argument at which the maximum is attained.
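
[Equation 2] and [Equation 3] are not reproduced in this text, so the following Python is only a sketch of the kind of forward recurrence and backward trace described above, not the formulas themselves. It works with log scores and, as a stated simplification, takes the maximum over character states as well (a Viterbi-style approximation), whereas the text sums over z in the final step. The function name and the two scoring callbacks are hypothetical.

```python
import math


def best_segmentation(S, T, states, categories, log_first, log_next):
    """Return (score, end indices S_1..S_T, categories W_1..W_T).

    log_first(s, z, w): log score of character 1 covering partial images 1..s.
    log_next(sp, s, zp, z, wp, w): log score of a character covering sp+1..s,
        given the preceding character's end sp, state zp and category wp.
    """
    # A[t][(s, z, w)]: best log score of characters 1..t when character t
    # ends at partial image s with state z and category w.
    A = [dict() for _ in range(T + 1)]
    back = [dict() for _ in range(T + 1)]
    for t in range(1, T + 1):
        # Character t must leave room for the T - t characters after it.
        for s in range(t, S - T + t + 1):
            for z in states:
                for w in categories:
                    if t == 1:
                        A[1][(s, z, w)] = log_first(s, z, w)
                        continue
                    best, arg = -math.inf, None
                    for (sp, zp, wp), prev in A[t - 1].items():
                        if sp >= s:
                            continue
                        cand = prev + log_next(sp, s, zp, z, wp, w)
                        if arg is None or cand > best:
                            best, arg = cand, (sp, zp, wp)
                    A[t][(s, z, w)] = best
                    back[t][(s, z, w)] = arg
    # The last character must end at the last partial image S.
    finals = [(score, key) for key, score in A[T].items() if key[0] == S]
    score, key = max(finals, key=lambda item: item[0])
    ends, cats = [], []
    for t in range(T, 0, -1):  # trace back while decreasing t
        ends.append(key[0])
        cats.append(key[2])
        key = back[t].get(key)
    return score, ends[::-1], cats[::-1]


# Toy data: each partial image is a letter; a category scores +1 per matching part.
parts = ["a", "a", "b", "c", "c", "c"]


def segment_score(lo, hi, category):
    return sum(1 if p == category.lower() else -1 for p in parts[lo:hi])


print(best_segmentation(
    S=len(parts), T=3, states=[0], categories=["A", "B", "C"],
    log_first=lambda s, z, w: segment_score(0, s, w),
    log_next=lambda sp, s, zp, z, wp, w: segment_score(sp, s, w),
))
```

Storing every A_t(s, z, w) together with its best predecessor is what allows the cutout positions and categories to be recovered in the backward pass.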

【００４８】次に、図１の文字列学習装置２の動作につ
いて、図６の流れ図を参照しながら説明する。文字列学
習装置２は、学習用の文字列データ、すなわち文字列画
像とそれらに付与された正解文字コード列を用いて、文
字状態遷移確率格納手段15および文字テンプレート格納
手段16に格納されているパラメータを最適化する。学習
には複数の文字列データを用いる。これらの画像はスキ
ャナ等の適当な手段によって学習文字列データ格納手段
21に格納される。学習文字列データ格納手段21には、画
像の他に、画像に書かれた文字列の正解のデータも（キ
ーボード入力等の適当な手段によって) 格納される。図
６の画像読み込み200 がこれに相当する。Next, the operation of the character string learning device 2 of FIG. 1 is described with reference to the flowchart of FIG. 6. The character string learning device 2 optimizes the parameters stored in the character state transition probability storage means 15 and the character template storage means 16, using learning character string data, that is, character string images and the correct character code strings attached to them. Multiple character string data items are used for learning. The images are stored in the learning character string data storage means 21 by suitable means such as a scanner. In addition to the images, the learning character string data storage means 21 also stores the correct data for the character strings written in the images (entered by suitable means such as keyboard input). This corresponds to image reading 200 in FIG. 6.

【００４９】学習文字列データ格納手段21に格納された
文字列データの個数をＫとする。文字切り出し・特徴抽
出手段22では、これらＫ個の文字列画像の各々につい
て、２値化処理、正規化処理等適当な前処理を施した上
で、文字列画像からいくつかの切り出し位置候補を検出
し、その座標および文字列画像または文字列画像を特徴
抽出処理により変換した特徴パタンを文字境界決定手段
23へ送る。前処理、切り出し位置検出、特徴抽出の詳細
は、文字列認識装置１の文字切り出し・特徴抽出手段12
と同一である。図６の文字切り出し・特徴抽出201 がこ
れに相当する。Let K be the number of character string data items stored in the learning character string data storage means 21. For each of these K character string images, the character cutout / feature extraction means 22 applies suitable preprocessing such as binarization and normalization, detects several cutout position candidates in the character string image, and sends their coordinates, together with either the character string image or a feature pattern obtained by converting it through feature extraction, to the character boundary determination means 23. The details of the preprocessing, cutout position detection, and feature extraction are the same as for the character cutout / feature extraction means 12 of the character string recognition device 1. This corresponds to character cutout / feature extraction 201 in FIG. 6.

【００５０】文字境界決定手段23は、文字切り出し・特
徴抽出手段22より、Ｋ個の画像から得られた切り出し位
置候補、特徴パタンおよび正解文字コード列の組を受け
取る。ｋ番目の文字列画像あるいは文字列画像を変換し
た特徴パタンをＸ^{( k )} 、これに対応する正解文字コー
ド列をＷ ₁ ^{( k )} ，…，Ｗ _T ^{( k )} とする。ここでは
正解の文字数Ｔは文字列画像ごとに一定とするが、各々
異なっていても構わない。次に、文字境界決定手段23
は、ｋ個のデータの各々について、正解文字コードを既
知として、切り出し位置候補から（Ｔ−１）個の切り出
し位置の選び方すべての組合せを調べ、最も認識得点の
高い切り出し位置を選んだ場合の得点を計算する。計算
手順は、正解文字コードが固定である以外は、前述の文
字列読取り手段13と同様である。つまり、文字列読み取
り手段13が文字パタン候補Ｘ_t ^{( k )}およびその文字パタ
ン候補が属すべきカテゴリm = Ｗ_t ^{( k )}と文字状態ｊ、
その文字パタン候補の直前に位置する文字パタン候補が
属すべきカテゴリ l = Ｗ _{t -} ₁ ^{( k )}と文字状態ｉを
それぞれ文字パタン記憶手段35、主文字コード記憶手段
36、副文字コード記憶手段37、主文字状態記憶手段38、
副文字状態記憶手段39に格納すると、文字出現確率計算
手段14はそれらを読み出して、その文字パタン候補が出
現する確率ａ_{i j l m} f ( X _t ^{( k )} ｜μ_{j m},Σ_{j m} )
を返す。図６の流れ図の文字列出現確率計算202 がこれ
に相当する。The character boundary determination means 23 receives from the character cutout / feature extraction means 22 the sets of cutout position candidates, feature patterns, and correct character code strings obtained from the K images. Let X^{(k)} be the k-th character string image, or the feature pattern obtained by converting it, and W_1^{(k)}, …, W_T^{(k)} the corresponding correct character code string. Here the number of correct characters T is taken to be the same for every character string image, but it may differ from image to image. Next, for each of the K data items, the character boundary determination means 23 treats the correct character codes as known, examines every way of choosing (T-1) cutout positions from the cutout position candidates, and calculates the score obtained when the cutout positions with the highest recognition score are chosen. The calculation procedure is the same as for the character string reading means 13 described above, except that the correct character codes are fixed. That is, when the character string reading means 13 stores the character pattern candidate X_t^{(k)}, the category m = W_t^{(k)} to which it should belong and its character state j, and the category l = W_{t-1}^{(k)} to which the immediately preceding character pattern candidate should belong and its character state i, in the character pattern storage means 35, main character code storage means 36, sub character code storage means 37, main character state storage means 38, and sub character state storage means 39, the character appearance probability calculation means 14 reads them out and returns the probability a_{ijlm} f(X_t^{(k)} | μ_{jm}, Σ_{jm}) that the character pattern candidate appears. This corresponds to character string appearance probability calculation 202 in the flowchart of FIG. 6.

【００５１】この場合の文字列の認識得点は前述の〔数
１〕と同様の表式で定義されるが、正解文字コードが既
知であるので、これに関する最大化ｍａｘ_{W 1} ^{( k )}，
…，_{W T} ^{( k )}は不要となる。認識得点の計算において
は、前述の〔数２〕と同様、漸化式による効率的な計算
が適用可能であるが、正解文字コードが既知であること
から、文字コードに関する最大化計算が不要となり、よ
り簡略化された計算で認識得点が求められる。In this case the recognition score of the character string is defined by the same expression as [Equation 1] above, but since the correct character codes are known, the maximization over them, max_{W_1^{(k)}, …, W_T^{(k)}}, is unnecessary. As with [Equation 2] above, an efficient calculation by recurrence formula can be applied to the recognition score; because the correct character codes are known, the maximization over character codes is unnecessary and the recognition score is obtained by a simpler calculation.

【００５２】 Further, the character boundary determining means 23 obtains the (T-1) cutout positions that are selected when the recognition score is highest. This procedure is the same as the one used to obtain $S_1, \ldots, S_{T-1}$ in [Equation 3] above, except that the correct character codes are fixed. The correct cutout position calculation 203 in the flowchart of FIG. 6 corresponds to this.
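The search over cutout positions with fixed correct character codes can be sketched as a dynamic program. The following is a minimal sketch only: [Equation 1] to [Equation 3] are not reproduced in this part of the text, so the recurrence below is a generic Viterbi-style search, and taking the maximum over character states as well as cutout positions is an assumption. The function name and the callables `log_pi`, `log_a`, and `log_emit` are hypothetical stand-ins for the initial state probability, the state transition probability $a_{ijlm}$, and the template density $f$.

```python
import math


def align_fixed_codes(num_cands, codes, num_states, log_pi, log_a, log_emit):
    """Best log score and cutout indices for a fixed code sequence.

    num_cands : number of cutout candidates, indexed 0..num_cands-1, where
                0 is the left edge and num_cands-1 the right edge of the image
    codes     : correct character codes w_1..w_T
    log_pi(j, m)         : log initial probability of state j for code m
    log_a(i, j, l, m)    : log transition probability, state i -> j, code l -> m
    log_emit(q, p, j, m) : log density of the pattern between candidates q and p
                           under state j and code m
    """
    T, last = len(codes), num_cands - 1
    NEG = -math.inf
    # best[t][p][j]: best log score with characters 0..t read, character t
    # ending at candidate p in state j; back[t][p][j] = (q, i) predecessor.
    best = [[[NEG] * num_states for _ in range(num_cands)] for _ in range(T)]
    back = [[[None] * num_states for _ in range(num_cands)] for _ in range(T)]
    for p in range(1, num_cands):
        for j in range(num_states):
            best[0][p][j] = log_pi(j, codes[0]) + log_emit(0, p, j, codes[0])
    for t in range(1, T):
        l, m = codes[t - 1], codes[t]
        for p in range(t + 1, num_cands):
            for j in range(num_states):
                for q in range(t, p):
                    e = log_emit(q, p, j, m)
                    for i in range(num_states):
                        s = best[t - 1][q][i] + log_a(i, j, l, m) + e
                        if s > best[t][p][j]:
                            best[t][p][j] = s
                            back[t][p][j] = (q, i)
    j = max(range(num_states), key=lambda s: best[T - 1][last][s])
    score = best[T - 1][last][j]
    if score == NEG:  # no admissible segmentation (too few candidates)
        return score, []
    p, cuts = last, []
    for t in range(T - 1, 0, -1):  # trace back the (T-1) cutout positions
        p, j = back[t][p][j]
        cuts.append(p)
    return score, cuts[::-1]
```

Working in log probabilities keeps the products of many small values numerically stable; the returned list holds the (T-1) interior candidate indices.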

【００５３】 The procedure up to this point yields, from the character string data, the individual character patterns (or the feature patterns obtained by converting them) $X_t^{(k)}$ and the corresponding correct character categories $w_t^{(k)}$ ($t = 1, \ldots, T$; $k = 1, \ldots, K$), and these are sent to the character learning means 24. The character data generation 204 in the flowchart of FIG. 6 corresponds to this.
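The character data generation step can be illustrated as follows. This is a sketch under a stated assumption: the feature pattern of a character string is represented as a sequence of columns and the cutout positions are column indices. That representation and the function name are hypothetical and not taken from the text.

```python
def make_character_data(features, cuts, codes):
    """Pair each cut-out character pattern with its correct character code.

    features : sequence of feature columns for one character string
    cuts     : the (T-1) chosen cutout positions, as increasing column indices
    codes    : the T correct character codes
    """
    if len(cuts) != len(codes) - 1:
        raise ValueError("need exactly T-1 cutout positions for T codes")
    bounds = [0] + list(cuts) + [len(features)]
    return [(features[bounds[t]:bounds[t + 1]], codes[t])
            for t in range(len(codes))]
```

Each returned pair is one training item $(X_t^{(k)}, w_t^{(k)})$; the order of the pairs preserves the order of the characters in the string.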

【００５４】 The character learning means 24 receives from the character boundary determining means 23 the feature patterns $X_t^{(k)}$ of the individual characters and the corresponding correct character codes $w_t^{(k)}$ ($t = 1, \ldots, T$; $k = 1, \ldots, K$), and optimizes the parameters stored in the character state transition probability storage means 15 and the character template storage means 16. That is, it updates the parameters so that the total sum or total product of the recognition scores over the character string images is maximized. The parameter update 205 in the flowchart of FIG. 6 corresponds to this. The details of the parameter update calculation are given below.

【００５５】 First, $\alpha_t^{(k)}(i)$ and $\beta_t^{(k)}(j)$ ($i, j = 1, \ldots, N$; $k = 1, \ldots, K$; $t = 1, \ldots, T$) are calculated and stored according to the recurrence formulas of [Equation 4] below. $\alpha_t^{(k)}(i)$ is the recognition score of the character pattern candidates $X_1^{(k)}, \ldots, X_t^{(k)}$ under the condition that the character state corresponding to the t-th character is i, and $\beta_t^{(k)}(j)$ is the recognition score of the character pattern candidates $X_{t+1}^{(k)}, \ldots, X_T^{(k)}$ under the condition that the character state corresponding to the t-th character is j.

【数４】 (Equation 4)
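The image for [Equation 4] is not reproduced in this text. As a reading aid only, recurrences of the usual forward-backward form that agree with the definitions of $\alpha$ and $\beta$ and with the appearance probability $a_{ijlm}\, f(X \mid \mu_{jm}, \Sigma_{jm})$ would look as follows. The initial-state term $\pi$ in the first line and the boundary condition $\beta_T^{(k)}(i) = 1$ are assumptions, chosen so that $\sum_{i} \alpha_T^{(k)}(i)$ gives the probability of the whole string; they are not copied from the original equation.

```latex
\begin{aligned}
\alpha_1^{(k)}(j) &= \pi_{j\,w_1^{(k)}}\,
  f\bigl(X_1^{(k)} \mid \mu_{j\,w_1^{(k)}}, \Sigma_{j\,w_1^{(k)}}\bigr), \\
\alpha_t^{(k)}(j) &= \sum_{i=1}^{N} \alpha_{t-1}^{(k)}(i)\,
  a_{i\,j\,w_{t-1}^{(k)}\,w_t^{(k)}}\,
  f\bigl(X_t^{(k)} \mid \mu_{j\,w_t^{(k)}}, \Sigma_{j\,w_t^{(k)}}\bigr),
  \qquad t = 2, \ldots, T, \\
\beta_T^{(k)}(i) &= 1, \\
\beta_t^{(k)}(i) &= \sum_{j=1}^{N}
  a_{i\,j\,w_t^{(k)}\,w_{t+1}^{(k)}}\,
  f\bigl(X_{t+1}^{(k)} \mid \mu_{j\,w_{t+1}^{(k)}}, \Sigma_{j\,w_{t+1}^{(k)}}\bigr)\,
  \beta_{t+1}^{(k)}(j),
  \qquad t = T-1, \ldots, 1.
\end{aligned}
```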

【００５６】 Using the above $\alpha_t^{(k)}(i)$ and $\beta_t^{(k)}(j)$ together with the current parameters, the parameters are updated as in [Equation 5] below.

【数式５】 (Equation 5)

【００５７】 Here, $P(X_1^{(k)}, \ldots, X_T^{(k)} \mid w_1^{(k)}, \ldots, w_T^{(k)})$ in [Equation 5] can be calculated as the sum of $\alpha_T^{(k)}(i)$ over i, that is, $\sum_{i=1}^{N} \alpha_T^{(k)}(i)$, where $\sum_{i=1}^{N}$ denotes the sum over $i = 1, 2, \ldots, N$. $\delta_{ij}$ denotes the Kronecker delta (1 if i = j, 0 otherwise). Under the parameter update procedure of [Equation 5], the total product of the recognition scores of the K character string image data increases monotonically. After the update procedure has been repeated several times, the growth of the total product tapers off until it hardly increases at all. At that point convergence is judged to have been reached (process 206 in FIG. 6), and the desired parameter values are obtained.
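The relation between $\alpha$, $\beta$, and the string probability can be checked numerically. The sketch below is a plain forward-backward pass for one character string, not the patent's [Equation 4] or [Equation 5], which are not reproduced here. The inputs `emit`, `trans`, and `init` are hypothetical arrays assumed to be already evaluated for the correct character codes.

```python
def forward_backward(emit, trans, init):
    """Forward and backward scores for one character string.

    emit[t][j]     : f(X_t | mu_jm, Sigma_jm), m being the correct code of
                     character t
    trans[t][i][j] : a_ijlm for the step from character t to character t+1
                     (l, m are the correct codes of those two characters)
    init[j]        : initial probability of state j for the first character
    """
    T, N = len(emit), len(init)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]  # beta at the last character is 1
    for j in range(N):
        alpha[0][j] = init[j] * emit[0][j]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = emit[t][j] * sum(
                alpha[t - 1][i] * trans[t - 1][i][j] for i in range(N))
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                trans[t][i][j] * emit[t + 1][j] * beta[t + 1][j]
                for j in range(N))
    likelihood = sum(alpha[T - 1])  # P(X_1..X_T | w_1..w_T)
    return alpha, beta, likelihood
```

For long strings the raw products underflow, so a practical version would rescale each step or work in logarithms. The total product over the K strings can then be tracked as a sum of log likelihoods, and the update loop stopped once that sum stops increasing appreciably.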

【００５８】 The parameter update procedure above obtains better parameter values from the parameter values currently held, so initial parameter values must be set first. The procedure for setting the initial values is described below.

【００５９】 First, a small amount of character image data that has been cut out individually and classified by character category is prepared. If necessary, the character images are preprocessed and converted into feature patterns in the same way as in the character string recognition device 1 and the character string learning device 2 described above. Next, for each category, a clustering algorithm such as k-means is used to classify the data in the category into as many clusters as the desired number of states N, and the center (mean) and variance of each cluster are obtained. The center of the k-th cluster of the i-th category is then substituted for $\mu_{ik}$. The cluster variances are substituted for the diagonal components of $\Sigma_{ik}$, and 0 is substituted for the off-diagonal components. After this, the mixture of normal distributions is estimated by the maximum likelihood method so that $\mu_{ik}$ and $\Sigma_{ik}$ are estimated more accurately. The k-means algorithm and the maximum-likelihood estimation of a mixture of normal distributions are well-known techniques described in many references, for example Huang et al., Hidden Markov Models for Speech Recognition, Edinburgh University Press, 1990.
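The clustering step for one character category can be sketched in plain Python. This is a minimal k-means under stated assumptions: the seeding rule, the fixed iteration count, and the fallback variance of 1.0 for an empty cluster are illustrative choices that do not come from the text, and the subsequent maximum-likelihood refinement of the mixture is not shown.

```python
def kmeans_init(samples, num_states, iters=20):
    """Initial means and diagonal variances for one character category.

    samples    : list of feature vectors (lists of floats) of the category
    num_states : desired number of character states N (number of clusters)
    iters      : number of k-means iterations, at least 1
    Returns (means, variances) with one entry per cluster. variances holds
    only the diagonal of Sigma; off-diagonal components are taken as 0.
    """
    dim = len(samples[0])
    # Deterministic seeding (an assumption): spread the seeds over the data.
    step = max(1, len(samples) // num_states)
    means = [list(samples[(c * step) % len(samples)])
             for c in range(num_states)]
    for _ in range(iters):
        groups = [[] for _ in range(num_states)]
        for x in samples:
            dist = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in means]
            groups[dist.index(min(dist))].append(x)
        for c, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                means[c] = [sum(x[d] for x in g) / len(g) for d in range(dim)]
    variances = []
    for c, g in enumerate(groups):
        variances.append([
            sum((x[d] - means[c][d]) ** 2 for x in g) / len(g) if g else 1.0
            for d in range(dim)])
    return means, variances
```

A cluster holding a single sample gets zero variance here, so a practical version would apply a variance floor before the values are used in a normal density.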

【００６０】 Since $\pi_{ik}$ and $a_{ijkl}$ are probabilities, they are set so that the appropriate sums equal 1. $\pi_{ik}$ must satisfy $\pi_{1k} + \pi_{2k} + \cdots + \pi_{Nk} = 1$, and $a_{ijkl}$ must satisfy $a_{i1kl} + a_{i2kl} + \cdots + a_{iNkl} = 1$, so, for example, 1/N may be substituted for $\pi_{ik}$ and $a_{ijkl}$.
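The uniform initialization and the two normalization constraints can be written out directly. In this sketch the nested-list layout and the function name are illustrative assumptions; indices follow the text, with the first index of `pi` and the first two indices of `a` ranging over the N states and the remaining indices over character categories.

```python
def uniform_init(num_states, num_categories):
    """Uniform initial values pi[i][k] = 1/N and a[i][j][k][l] = 1/N."""
    n, c = num_states, num_categories
    pi = [[1.0 / n for _ in range(c)] for _ in range(n)]
    a = [[[[1.0 / n for _ in range(c)] for _ in range(c)]
          for _ in range(n)] for _ in range(n)]
    return pi, a
```

With these values the sum of `pi[i][k]` over i and the sum of `a[i][j][k][l]` over j both equal 1 for every choice of the remaining indices.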

【００６１】 Next, a second embodiment of the present invention will be described with reference to the drawings.

【００６２】 Referring to FIG. 7, the second embodiment of the present invention comprises a data processing device 18, a recording medium 17 on which a character recognition program is recorded, and an image storage means 11, a character state transition probability storage means 15, and a character template storage means 16 like those of FIG. 1. The recording medium 17 may be a CD-ROM, a magnetic disk, a semiconductor memory, or another recording medium, and this includes the case where the program is distributed over a network. The data processing device 18 includes a CPU and memory.

【００６３】 The character recognition program is read from the recording medium 17 into the data processing device 18 and, by controlling the operation of the data processing device 18, realizes on it the character cutout / feature extraction means 12, the character string reading means 13, the character appearance probability calculation means 14, and the storage means 30 to 34 shown in FIG. 1. Under the control of the character recognition program, the data processing device 18 uses the character cutout / feature extraction means to detect a number of cutout position candidates from the character string image input to the image storage means 11, preprocess the image, and extract features. It then uses the character string reading means, the character appearance probability calculation means, and the storage means to generate a number of character pattern candidates from those cutout position candidates, performs recognition on each character pattern candidate using the character state transition probabilities and character templates stored in the character state transition probability storage means 15 and the character template storage means 16, and finds and outputs the reading result that gives the highest recognition score for the character string as a whole. That is, in this embodiment the data processing device 18, under the control of the character recognition program, executes the same processing as that performed in the first embodiment by the character cutout / feature extraction means 12, the character string reading means 13, the character appearance probability calculation means 14, the character pattern storage means 30, the main character code storage means 31, the sub character code storage means 32, the main character state storage means 33, and the sub character state storage means 34, and outputs the reading result of the character string.

【００６４】[0064]

[Effects of the Invention] As described above, according to the present invention, in reading a character string, a plurality of character states representing several types of character deformation are prepared for each character category and connected into a network, and the validity of the connection between two adjacent character patterns is evaluated in light of the character states and character categories that represent their respective shapes. This makes it possible to recognize characters with high accuracy while taking into account that character shapes are deformed by continuous writing from, or contact with, the immediately preceding character and by changes of writer, and as a result accurate reading of character strings is achieved. In addition, because this state network can be optimized together with the character pattern templates using training character string data whose cutout positions are unknown, work such as preparing individual character data for training becomes unnecessary, and a high-accuracy reading system can be built easily and with less effort. Furthermore, because the dependency between character patterns is replaced by a dependency between discrete symbols, namely the character states, the amount of processing is also kept low.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an embodiment of the present invention.

FIG. 2 is a diagram of character images showing an example in which the manner of error of a character depends on its character type.

FIG. 3 is a diagram of character images showing an example in which the manner of error of a character depends on the writer.

FIG. 4 is a diagram showing an example of the result of converting an input character string image into a feature pattern.

FIG. 5 is a flowchart showing the flow of processing in an embodiment of the present invention.

FIG. 6 is a flowchart showing the flow of processing in an embodiment of the present invention.

FIG. 7 is a block diagram showing the functional configuration of an embodiment of the present invention.

FIG. 8 is a block diagram showing the functional configuration of an example of the related art.

[Explanation of symbols]

1 Character string recognition device
2 Character string learning device
11 Image storage means
12 Character cutout / feature extraction means
13 Character string reading means
14 Character appearance probability calculation means
15 Character state transition probability storage means
16 Character template storage means
17 Recording medium
18 Data processing device
21 Learning character string data storage means
22 Character cutout / feature extraction means
23 Character boundary determining means
24 Character learning means
30 Character pattern storage means
31 Main character code storage means
32 Sub character code storage means
33 Main character state storage means
34 Sub character state storage means
35 Character pattern storage means
36 Main character code storage means
37 Sub character code storage means
38 Main character state storage means
39 Sub character state storage means
41 Image storage means
42 Character cutout / feature extraction means
43 Character string reading means
44 Character appearance probability calculation means
45 Character template storage means
46 Character pattern storage means
47 Character code storage means
48 Learning character data storage means
49 Character learning means
100 Image reading
101 Character cutout / feature extraction
102 Character string appearance probability calculation
103 Correct character code calculation / correct cutout position calculation
200 Image reading
201 Character cutout / feature extraction
202 Character string appearance probability calculation
203 Correct cutout position calculation
204 Character data generation
205 Parameter update
206 Convergence judgment

Claims

[Claims]

1. A character recognition device comprising: image storage means for storing an input character string image; character cutout / feature extraction means for detecting candidates for boundaries between adjacent characters as cutout position candidates from the character string image received from the image storage means, and for performing feature extraction that converts the character string image into a smaller quantity (feature) useful for identification; character string reading means for performing character recognition on the individual character pattern candidates obtained when the character string image is divided by selecting some of the cutout position candidates, and for outputting, as the reading result of the character string, the cutout and the character code string that are optimal for the character string as a whole; character appearance probability calculation means for receiving from the character string reading means a character pattern candidate, a character code, a character state that is an index indicating the type of deformation of the character, and the character code and character state of the character pattern candidate located immediately before the given character pattern candidate, and for calculating the probability that the given character pattern candidate appears under the given character codes and character states; character state transition probability storage means for storing the numerical values (state transition probabilities) needed to calculate the part of the probability that depends on the character state when the character appearance probability calculation means calculates the character appearance probability; and character template storage means for storing the numerical values (character templates) needed to calculate the part of the probability that depends on the character pattern when the character appearance probability calculation means calculates the character appearance probability.

2. The character recognition device according to claim 1, further comprising: character pattern storage means for storing the character pattern candidate that is passed when the character string reading means requests the character appearance probability calculation means to calculate a character appearance probability; two character code storage means for storing, for passing to the character appearance probability calculation means, the character code corresponding to the character pattern candidate and the character code corresponding to the character pattern candidate located immediately before it; and two character state storage means for storing, for passing to the character appearance probability calculation means, the character state corresponding to the character pattern candidate and the character state corresponding to the character pattern candidate located immediately before it.

3. The character recognition device according to claim 1, wherein, when the character appearance probability calculation means calculates the appearance probability of a character pattern candidate using the character states, each character state transitions according to a Markov stochastic process in accordance with the tendency of deformation of the corresponding character pattern candidate, and the value of the character appearance probability is adjusted by evaluating the validity of the connection between adjacent characters according to the magnitude of the transition probability between these states.

4. The character recognition device according to claim 3, wherein, when calculating the distance between a given character pattern candidate and a dictionary pattern, the character appearance probability calculation means selects a character template according to the character state corresponding to the character pattern candidate.

5. The character recognition device according to claim 4, wherein the character appearance probability calculation means holds, in the character template storage means and in the form of normal distributions, a plurality of character templates to be used selectively according to the character state corresponding to the immediately preceding character pattern candidate.

6. The character recognition device according to claim 1 or 2, wherein, when the character string reading means calculates the recognition score, which is a measure of the likelihood of the recognition result of the character string, it starts from the score attributed to the first character and successively adds the scores for the second character, the third character, and so on, calculating the recognition score according to a recurrence formula while storing the character category (character code) to which the character of interest belongs, the corresponding character state, and the boundary position with the next character.

7. A character learning device comprising: learning character string data storage means for storing character string data used to estimate optimum character state transition probabilities and optimum character templates from given character string images and their correct character code strings; character cutout / feature extraction means for detecting candidates for boundaries between adjacent characters as cutout position candidates from the character string image received from the learning character string data storage means, and for performing feature extraction that converts the character string image into a smaller quantity (feature) useful for identification; character appearance probability calculation means for receiving a character pattern candidate, a character code, a character state that is an index indicating the type of deformation of the character, and the character code and character state of the character pattern candidate located immediately before the given character pattern candidate, and for calculating the probability that the given character pattern candidate appears under the given character codes and character states; character boundary determining means for estimating the boundaries of the characters in the character string image using the correct character code string given for the character string image and the character appearance probability calculation means; character pattern storage means for storing the character pattern candidate that is passed when the character boundary determining means requests the character appearance probability calculation means to calculate a character appearance probability; two character code storage means for storing the character code in the correct character code string that corresponds to the character pattern candidate and the character code immediately before it; two character state storage means for storing the character state corresponding to the character pattern candidate and the character state corresponding to the character pattern candidate immediately before it; character state transition probability storage means for storing the numerical values (state transition probabilities) needed to calculate the part of the probability that depends on the character state when the character appearance probability calculation means calculates the character appearance probability; character template storage means for storing the numerical values (character templates) needed to calculate the part of the probability that depends on the character pattern when the character appearance probability calculation means calculates the character appearance probability; and character learning means for updating, using the individual character patterns cut out by the character boundary determining means and their arrangement order, the character state transition probabilities stored in the character state transition probability storage means and the character templates stored in the character template storage means.

8. The character learning device according to claim 7, comprising: character pattern storage means for storing the character pattern candidate that is passed when the character boundary determining means requests the character appearance probability calculation means to calculate a character appearance probability; two character code storage means for storing the character code in the correct character code string that corresponds to the character pattern candidate and the character code immediately before it; and two character state storage means for storing the character state corresponding to the character pattern candidate and the character state corresponding to the character pattern candidate immediately before it.

9. The character learning device according to claim 7, wherein the character learning means simultaneously optimizes the character state transition probabilities and the character templates so that the total sum or total product of the recognition scores of the character string data is maximized.

10. The character learning device according to claim 7, wherein the character learning means uses character string data whose cutout positions are unknown and, using the individual character images automatically cut out by the character boundary determining means, simultaneously optimizes the character state transition probabilities and the character templates so that the recognition score of the character string is maximized.

11. A computer-readable recording medium on which is recorded a character recognition program that runs on a computer, the program causing the computer to execute the steps of: inputting and storing a character string image; detecting cutout position candidates from the character string image and performing feature extraction that converts the character string image into a smaller quantity (feature) useful for identification; generating a plurality of character pattern candidates based on the cutout position candidates; calculating the probability that each character pattern candidate appears, using the character pattern candidate, the character code and character state for that character pattern candidate, and the character code and character state for the character pattern candidate located immediately before it, while evaluating the validity of the connection between two adjacent character patterns based on the transition probability between the states corresponding to the two adjacent characters; and searching for and outputting the character cutout and character code string that give the highest score for the character string as a whole.