JPH0496882A

JPH0496882A - Full size/half size discriminating method

Info

Publication number: JPH0496882A
Application number: JP2214717A
Authority: JP
Inventors: Takakuni Minewaki; 隆邦嶺脇
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-08-14
Filing date: 1990-08-14
Publication date: 1992-03-30

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a method for discriminating between full-width and half-width characters in Japanese text, for use in a character recognition device that processes Japanese text.

[Prior Art] In recent years, character recognition devices have come into use as a means of entering document data into databases and the like. Such applications require faithful recognition of the original document, and in many cases it is not enough simply to recognize each character correctly as a character code.

For example, Japanese text often mixes half-width characters in with full-width characters. A half-width character must not only be recognized correctly as a character code; its size, that is, the fact that it is half-width, must be recognized as well.

A known character recognition device that takes the full-width/half-width distinction into account determines whether each character is full-width or half-width from the character width and the inter-character space width detected from the vertical projection of a character line. It then recognizes full-width characters using both a kanji dictionary and a non-kanji dictionary, and half-width characters using only the non-kanji dictionary (Japanese Patent Laid-Open Nos. Sho 63-83887 and Sho 63-83888).

[Problem to Be Solved by the Invention] Character width and inter-character space width vary from character to character and with the way characters are arranged. For example, character width differs greatly between a narrow letter and a wide one such as "M", and the inter-character space width differs depending on which letters follow one another.

Consequently, a method such as the prior art above, which determines full-width or half-width by comparing the character width or inter-character space width with a fixed threshold, has poor accuracy for some characters and some character sequences. It is also prone to misjudging left-right separated characters, that is, characters made up of detached left and right parts.

Furthermore, in the prior art above, when a left-right separated kanji is judged to be half-width, only the non-kanji dictionary is used to recognize it, which results in a fatal (unrecoverable) recognition error.

An object of the present invention is to provide a full-width/half-width determination method for a character recognition device targeting Japanese text that solves both the accuracy problem of full-width/half-width determination described above and the problem of unrecoverable recognition errors caused by misjudgment.

[Means for Solving the Problems] According to the present invention, a character recognition device for Japanese text performs character recognition on each character image cut out from the input image without distinguishing between full-width and half-width. For each recognized character that belongs to a specific character type (for example, alphanumeric characters), full-width or half-width is then determined using a value related to the character size obtained when the character image was cut out and a full-width/half-width discrimination value prepared in advance for each character of that type.

The full-width/half-width discrimination value is, for example, a threshold on the ratio of the character width to the standard full-width character width, or a threshold on the ratio of character height to character width. A character of the specific character type is determined to be full-width or half-width by comparing its ratio of character width to the standard full-width character width, or its ratio of character height to character width, with the corresponding discrimination value.

Also according to the present invention, after the character-by-character full-width/half-width determination described above, a final determination is made for each character string of the specific character type obtained as a recognition result, based on the proportion of characters in the string judged half-width to those judged full-width. All characters in the string are then unified to full-width or half-width according to that result.

[Operation] In general, the half-width characters that appear in Japanese text are limited in character type: in most cases kanji and hiragana are full-width, and half-width characters are limited to alphanumerics (letters and digits).

The present invention performs character recognition without distinguishing between full-width and half-width, and applies full-width/half-width determination only when the result is a character of a type that may be half-width, such as an alphanumeric. This avoids the mistake of judging a separated kanji or the like to be half-width. In addition, unlike the prior-art approach of determining full-width/half-width before recognition, it avoids the fatal recognition errors caused by misjudging a full-width character as half-width.

Moreover, because the character types that may be used as half-width characters, such as alphanumerics, contain only a limited number of characters, it is relatively easy to prepare in advance a per-character value for deciding between full-width and half-width, for example a threshold on the ratio of character width to the standard full-width character width, or a threshold on the aspect ratio (the ratio of character height to character width). Because the values are set per character, they can be tuned finely. Such ratios are also not directly affected by the neighboring characters.

Therefore, the method of the present invention, which uses such per-character discrimination thresholds, can accurately identify half-width characters such as alphanumerics appearing in Japanese text. Unlike methods that use the inter-character space width, it also avoids misjudgments caused by the character arrangement.

However, for narrow alphanumeric characters such as "1", the difference between the full-width and half-width character widths is small, so the character-by-character determination may misjudge them.

According to the present invention, after the character-by-character determination, a full-width/half-width determination is made for each character string of the specific character type, based on the proportion of characters in the string judged half-width to those judged full-width, and all characters in the string are unified to full-width or half-width according to the result. In Japanese text, a single alphanumeric string (word) rarely mixes full-width and half-width characters, so this string-level determination can correct misjudgments of narrow characters such as "1".

[Embodiment] FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention.

In this character recognition device, an image input unit 10 reads the image of a document with a scanner or the like, and stores the resulting binary image data in an image memory 11. A line/character cutting unit 12 cuts out character lines and character images from the input image in the image memory 11, stores the cut-out character image data in a character image memory 13, and stores cutting information such as the character cutting position, character width, character height, and the standard full-width character width for each line in a cutting information memory 14.

A character recognition unit 15 reads character image data from the character image memory 13, normalizes it, and extracts features. It compares the extracted features with the dictionary features read from a character dictionary memory 16, obtains the top N recognition candidates with the smallest feature distances, sorts them in ascending order of distance, and stores them in a recognition result memory 17. At this stage no distinction is made between half-width and full-width: both the kanji dictionary and the non-kanji dictionary are used for recognition, and full-width character codes are output as the recognition result.
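The candidate ranking described above can be sketched as follows. This is only an illustration: the source does not say which features or which distance measure are used, so the Euclidean distance, the function name `top_candidates`, and the toy feature vectors are all assumptions.

```python
import heapq
import math


def top_candidates(features, dictionary, n=3):
    """Return the n dictionary characters nearest to `features`, nearest first.

    `dictionary` maps a (full-width) character to its reference feature vector.
    Euclidean distance is an assumption; the source does not name the measure.
    """
    scored = ((math.dist(features, ref), char) for char, ref in dictionary.items())
    return [(char, dist) for dist, char in heapq.nsmallest(n, scored)]


# Hypothetical two-dimensional features, for illustration only.
toy_dictionary = {"Ｒ": (1.0, 0.0), "Ｐ": (0.9, 0.2), "新": (5.0, 5.0)}
ranked = top_candidates((1.0, 0.05), toy_dictionary, n=2)
```

Every candidate carries a full-width character code at this stage, matching the text: the width decision is deferred to the later determination step.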

A full-width/half-width determination unit 18 refers to the first candidates of the recognition results in the recognition result memory 17 and to the contents of a full-width/half-width discrimination table memory 19. It performs full-width/half-width determination only on first-candidate characters of the specific character type (alphanumerics here), and rewrites the character code of each candidate judged half-width to the half-width character code.
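The code rewriting step can be illustrated with a short sketch. The source predates Unicode and does not name a character code set, so using Unicode here is an assumption made purely for illustration; `to_halfwidth` is a hypothetical helper. In Unicode, the full-width forms U+FF01 to U+FF5E correspond to ASCII U+0021 to U+007E at a fixed offset of 0xFEE0.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E map onto U+0021..U+007E


def to_halfwidth(ch):
    """Rewrite a full-width character to its half-width (ASCII) counterpart.

    Characters outside the full-width forms range are returned unchanged.
    """
    code = ord(ch)
    if 0xFF01 <= code <= 0xFF5E:
        return chr(code - FULLWIDTH_OFFSET)
    return ch
```

For example, `to_halfwidth("Ｒ")` gives `"R"`, while a kanji such as `"新"` passes through untouched.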

The full-width/half-width discrimination table memory 19 stores a discrimination table in which a full-width/half-width discrimination value is registered for each alphanumeric character, alphanumerics being the specific character type. In this embodiment, the discrimination value is a threshold on the ratio of character width to the standard full-width character width.

A result output unit 20 outputs the recognition result data held in the recognition result memory 17 after the full-width/half-width determination to an output device such as a display or printer.

FIG. 2 shows the overall processing flow of this character recognition device.

Process ① is image input by the image input unit 10, process ② is cutting by the line/character cutting unit 12, and process ③ is character recognition by the character recognition unit 15, which does not distinguish between full-width and half-width. Processes ④ and ⑤ are performed by the full-width/half-width determination unit 18: ④ is the character-by-character determination and ⑤ is the string-by-string determination.

Process ⑥ is output of the recognition results by the result output unit 20.

The character-by-character full-width/half-width determination ④ is shown in FIG. 3, and the string-by-string full-width/half-width determination ⑤ is shown in FIG. 4.

Next, the full-width/half-width determination is described in detail using the horizontally written string "新型 Rjfax 発売" ("New-model Rjfax on sale") as an example, where the kanji are printed full-width and the letters half-width. To simplify the explanation, the line containing this string is assumed to contain no other characters.

Assume that line cutting and character cutting of this string yield the cutting information shown in FIG. 5 in the cutting information memory 14.

Here, the character width is the horizontal size of the character's bounding rectangle, and the character height is its vertical size. The standard full-width character width is a parameter given once per line, calculated from quantities such as the maximum character width or maximum character height of the characters in the line, or the line height.
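The source names the inputs to the standard full-width character width but not the formula. The sketch below is therefore only one possible reading, under the assumption that a full-width character is roughly square, so the largest bounding-rectangle dimension in the line approximates the full-width width. The function name and the input format are hypothetical.

```python
def standard_fullwidth_width(boxes):
    """Estimate the standard full-width character width for one line.

    `boxes` is a list of (width, height) bounding-rectangle sizes, one per
    character in the line. Taking the largest dimension is an assumption;
    the source only says the value is derived from such maxima.
    """
    if not boxes:
        raise ValueError("line contains no characters")
    max_width = max(width for width, _ in boxes)
    max_height = max(height for _, height in boxes)
    return max(max_width, max_height)
```

A real implementation would probably need to guard against outliers such as touching characters that were cut out as one wide box.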

Assume that character recognition of each character in the string yields the first candidates shown in FIG. 6, which are stored in the recognition result memory 17. Because character recognition does not distinguish between full-width and half-width, all candidate character codes in this embodiment are full-width character codes, as noted above.

Assuming that the full-width/half-width discrimination table memory 19 holds the contents shown in FIG. 7, the character-by-character determination ④ (FIG. 2) performed by the full-width/half-width determination unit 18 is described below with reference to FIG. 3.

First, the first candidates in the recognition result memory 17 are examined to find alphanumeric characters (uppercase letters, lowercase letters, and digits) (steps 100, 101, 102).

When an alphanumeric character is found, its character width and the standard full-width character width are read from the cutting information memory 14, and the ratio A of the character width to the standard full-width character width is calculated (step 103). Next, the full-width/half-width discrimination table memory 19 is looked up using the full-width character code of the target character, and the discrimination threshold B for that character is read.

A and B are then compared: if A < B the target character is judged half-width, and if A ≥ B it is judged full-width (step 105). A half-width character is horizontally compressed compared with its full-width counterpart, so its ratio A is smaller than that of the full-width character, and the threshold B is used to detect this difference. Because the appropriate value of B naturally differs from character to character, a threshold B is prepared for each character. In other words, making the threshold per-character allows B to be tuned finely to its optimal value.
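The comparison of the ratio A with the per-character threshold B can be sketched as below. The function name `judge_char`, the return labels, and the threshold values in the table are placeholders chosen for illustration; they are not the contents of FIG. 7.

```python
# Hypothetical per-character thresholds B, keyed by full-width character code.
THRESHOLDS = {"Ｒ": 0.75, "ｉ": 0.30, "１": 0.40}


def judge_char(ch, char_width, standard_width, thresholds=THRESHOLDS):
    """Judge one recognized character as 'half' or 'full'.

    Returns None for characters that have no table entry (for example kanji),
    which are left out of the full-width/half-width determination.
    """
    if ch not in thresholds:
        return None
    ratio_a = char_width / standard_width   # ratio A
    threshold_b = thresholds[ch]            # threshold B for this character
    return "half" if ratio_a < threshold_b else "full"
```

With these placeholder values, `judge_char("Ｒ", 20, 40)` yields `"half"` because A = 0.5 is below B, while a width of 30 gives A = 0.75, which is not below B, so the character is judged `"full"`.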

If the character is judged half-width, the full-width character code of its first candidate is rewritten to the half-width character code (step 106).

The same processing is repeated until the last character is detected in step 107.

For the string above, the third character, "R", has A = 0.5 < B = 0.76 and is therefore judged half-width. The fourth through seventh characters, also alphanumerics, likewise all satisfy A < B (see FIGS. 5 and 7), so all are judged half-width.

This character-by-character determination corrects the first candidates of most half-width characters to half-width character codes, so the recognition result at this point can also be output as the final result.

That is, the string-by-string full-width/half-width determination ⑤ (FIG. 2) can be omitted.

In this embodiment, however, string-by-string determination is additionally performed to make the full-width/half-width determination more reliable. This processing is described with reference to FIG. 4.

After initialization (steps 201, 202), the first candidates for the target line in the recognition result memory 17 are examined in order from the first character to find alphanumeric strings, and the numbers of half-width and full-width characters in each string found are counted (steps 203 to 210). The alphanumeric flag is used to detect the start and end of an alphanumeric string. If the alphanumeric string continues to the last character, processing goes from step 209 to step 213, where the numbers of full-width and half-width characters in the string are compared. If the number of full-width characters is less than the number of half-width characters, the first-candidate character codes of all characters in the string are rewritten to half-width character codes (step 214). That is, in this embodiment the final full-width/half-width determination for the whole alphanumeric string is made by majority vote between its full-width and half-width characters.

If the last character of an alphanumeric string is followed by a non-alphanumeric character, processing goes through steps 211 and 212 to step 213, and the same processing is performed.

When the last character is detected in step 215, processing ends.
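The string-by-string majority vote can be sketched as follows. The sketch assumes Unicode characters, where a character judged full-width is held in its full-width form and one judged half-width in ASCII; the helper names are hypothetical. The embodiment only spells out the rewrite to half-width when full-width characters are in the minority. Unifying to full-width otherwise is an assumption here, based on the summary's statement that all characters are unified one way or the other.

```python
OFFSET = 0xFEE0  # distance between ASCII and the Unicode full-width forms


def is_half(ch):
    return ch.isascii() and ch.isalnum()


def is_full(ch):
    return 0xFF01 <= ord(ch) <= 0xFF5E and chr(ord(ch) - OFFSET).isalnum()


def unify_alnum_runs(chars):
    """Unify each run of alphanumerics to a single width by majority vote."""
    out = list(chars)
    i = 0
    while i < len(out):
        if not (is_half(out[i]) or is_full(out[i])):
            i += 1
            continue
        j = i  # find the end of the alphanumeric run starting at i
        while j < len(out) and (is_half(out[j]) or is_full(out[j])):
            j += 1
        run = out[i:j]
        n_half = sum(is_half(c) for c in run)
        n_full = len(run) - n_half
        if n_full < n_half:
            # Stated in the source: rewrite the whole run to half-width.
            out[i:j] = [chr(ord(c) - OFFSET) if is_full(c) else c for c in run]
        else:
            # Assumption: otherwise unify the run to full-width.
            out[i:j] = [chr(ord(c) + OFFSET) if is_half(c) else c for c in run]
        i = j
    return out
```

For instance, if one letter of the example word were left full-width after the per-character step, as in `list("新型Rｊfax発売")`, the run contains 1 full-width and 4 half-width characters, and `unify_alnum_runs` rewrites it to all half-width.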

Suppose, for example, that the character-by-character determination misjudges just one narrow character in "Rjfax" as full-width. For narrow characters such as "1", the difference in character width between full-width and half-width is small, so the character-by-character determination is prone to such errors.

However, because the string then contains 1 full-width character and 4 half-width characters, the string-by-string determination corrects the misjudged character to half-width.

Because a single alphanumeric string (word) rarely mixes full-width and half-width characters, this string-by-string determination yields the correct result in almost all cases.

According to another embodiment of the present invention, the ratio of character height to character width (the aspect ratio) is used for the full-width/half-width determination. This aspect ratio is larger for a half-width character than for a full-width one, so an aspect-ratio threshold for detecting the difference is prepared for each character and stored in the full-width/half-width discrimination table memory 19.

In the character-by-character determination, the aspect ratio is calculated from the character height and character width obtained as cutting information, and full-width or half-width is determined by comparing that value with the corresponding threshold read from the full-width/half-width discrimination table memory 19.
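The aspect-ratio variant can be sketched in the same way as the width-ratio test. The source states that the height-to-width ratio is larger for half-width characters but does not give the exact inequality, so judging a character half-width when the ratio exceeds the threshold is an assumption, as are the function name and the placeholder threshold values.

```python
# Hypothetical per-character aspect-ratio thresholds (height / width).
ASPECT_THRESHOLDS = {"Ｒ": 1.5, "ａ": 1.4}


def judge_by_aspect_ratio(ch, char_width, char_height, thresholds=ASPECT_THRESHOLDS):
    """Judge a recognized character as 'half' or 'full' from its aspect ratio.

    Returns None when the character has no table entry or the width is invalid.
    """
    if ch not in thresholds or char_width <= 0:
        return None
    aspect_ratio = char_height / char_width
    # Assumption: a ratio above the threshold means the glyph is narrow.
    return "half" if aspect_ratio > thresholds[ch] else "full"
```

One practical difference from the width-ratio test is that this version needs only the character's own bounding rectangle, not a per-line standard width.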

[Effects of the Invention] As described above, the invention of claim (1), or of claim (3) or (4) dependent on it, can identify half-width characters such as alphanumerics appearing in Japanese text with high accuracy, and avoids misjudgment of separated kanji and the like, fatal recognition errors caused by misjudging full-width characters as half-width, and misjudgments caused by the character arrangement. In addition, the invention of claim (2), or of claim (3) or (4) dependent on it, can correct full-width/half-width misjudgments of narrow alphanumeric characters.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention; FIG. 2 is a flowchart of the overall processing; FIG. 3 is a flowchart of the character-by-character full-width/half-width determination; FIG. 4 is a flowchart of the string-by-string full-width/half-width determination; FIG. 5 illustrates the contents of the cutting information memory; FIG. 6 illustrates the contents of the recognition result memory; and FIG. 7 illustrates the contents of the full-width/half-width discrimination table memory.
10: image input unit, 11: image memory, 12: line/character cutting unit, 13: character image memory, 14: cutting information memory, 15: character recognition unit, 16: character dictionary memory, 17: recognition result memory, 18: full-width/half-width determination unit, 19: full-width/half-width discrimination table memory, 20: result output unit.

Claims

[Claims]

(1) A full-width/half-width determination method for a character recognition device for Japanese text, characterized in that character recognition is performed on each character image cut out from an input image without distinguishing between full-width and half-width, and, for each character of a specific character type obtained as a recognition result, full-width or half-width is determined using a value related to the character size obtained when the character image was cut out and a full-width/half-width discrimination value prepared in advance for each character of the specific character type.

(2) A full-width/half-width determination method for a character recognition device for Japanese text, characterized in that character recognition is performed on each character image cut out from an input image without distinguishing between full-width and half-width; for each character of a specific character type obtained as a recognition result, full-width or half-width is determined using a value related to the character size obtained when the character image was cut out and a full-width/half-width discrimination value prepared in advance for each character of the specific character type; a final determination of full-width or half-width is then made based on the proportion of characters judged half-width to characters judged full-width in a character string of the specific character type obtained as a recognition result; and all characters in the string are unified to full-width or half-width according to that result.

(3) The full-width/half-width determination method according to claim (1) or (2), characterized in that the full-width/half-width discrimination value is a threshold on the ratio of character width to the standard full-width character width, and full-width or half-width is determined for a character of the specific character type by comparing its ratio of character width to the standard full-width character width with the corresponding full-width/half-width discrimination value.

(4) The full-width/half-width determination method according to claim (1) or (2), characterized in that the full-width/half-width discrimination value is a threshold on the ratio of character height to character width, and full-width or half-width is determined for a character of the specific character type by comparing its ratio of character height to character width with the corresponding full-width/half-width discrimination value.