JP2004334461A

JP2004334461A - Character recognition device and character recognition program

Info

Publication number: JP2004334461A
Application number: JP2003128637A
Authority: JP
Inventors: Hiroyasu Miyahara (宮原 景泰); Yasuhiro Okada (岡田 康裕)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-05-07
Filing date: 2003-05-07
Publication date: 2004-11-25
Anticipated expiration: 2023-05-07
Also published as: JP4244692B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for recognizing characters in an input image that contains characters of different sizes.

SOLUTION: An area section part 1 divides the input image into subareas of a first division mode, and an area section part 2 divides the same image into subareas of a second division mode that differ in size from those of the first division mode. A projection section extraction part 3 then extracts black sections from the subareas of both division modes. A character string area extraction part 4 merges the black sections to form character string areas suited to the respective division modes, a character pickup part 5 segments the individual characters, and a character recognition part 6 recognizes them.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は、画像中の文字を認識する文字認識装置及び文字認識プログラムに係るものであり、特に画像中に異なる大きさの文字からなる文字列が存在する場合に、この画像から文字列領域を効率的に抽出する技術に関する。
【０００２】
【従来の技術】
従来の画像中の文字を認識する文字認識装置は、文字パターンの大きさを推測して文字列の存在する領域（文字列領域）を切り出して、この文字列領域内に存在する画素パターンと文字パターンとを照合するものであった。このような文字認識装置では、切り出す文字列領域の大きさの基礎となる文字パターンの大きさの推測方法が重要となる。
【０００３】
このような文字パターンの大きさを推測する方法としては、画像中の特定位置にある部分領域において、文字を構成する画素の分布状況を取得し、この分布状況から文字サイズを推測する技術がある（例えば特許文献１）。
【０００４】
【特許文献１】
特開昭６３−２９２３８１「文字行検出装置」（第１図、第３頁−第５頁）
【０００５】
【発明が解決しようとする課題】
上記のとおり、従来の文字認識装置は入力画像の一部の領域から文字サイズと行間隔を推定している。したがって、このような領域から、基準となる文字の情報を得ても、この文字とは異なる大きさの文字が別の領域に存在している場合には、正しく認識することができないという課題があった。
【０００６】
この発明は、このような課題を解決するためになされたものであり、大きさの異なる複数の文字列が存在する場合であっても、適切に文字列を検出し、認識を行う文字認識装置を提供することを目的とする。
【０００７】
【課題を解決するための手段】
この発明に係る文字認識装置は、入力画像を所定の大きさの文字に適合する第１分割形態の領域に分割するとともに、前記大きさとは異なる大きさの文字に適合する第２分割形態の領域に分割し、さらに前記第１分割形態の各領域と前記第２分割形態の各領域から黒画素数が所定数以上存在する領域を黒区画として抽出する黒区画抽出手段と、
前記黒区画を併合してそれらの分割形態に適合する文字列領域を形成するとともに、前記文字列領域から文字領域を切り出す文字領域抽出手段と、
前記文字領域の文字パターンを認識する文字認識手段と、を備えたものである。
【０００８】
【発明の実施の形態】
以下、この発明による実施の形態について説明する。
実施の形態１．
図１はこの発明の実施の形態１による文字認識装置の構成を示したブロック図である。図において、領域区画部１と領域区画部２はそれぞれ、図示せぬカメラなどの画像入力手段によって撮像された入力画像を複数の領域に分割するものである。領域区画部１が分割する領域の大きさと領域区画部２が分割する領域の大きさは異なっている。投影区画抽出部３は、入力画像中の領域の画素の値に基づいて、黒画素の存在する領域（黒区画）を抽出するようになっている。文字列領域抽出部４は、投影区画抽出部３によって抽出された黒区画を併合して、文字列が存在する可能性のある領域である文字列領域候補を形成し、さらに文字列領域候補の大きさから文字列領域か否かを決定する部位である。文字切り出し部５は、文字列領域抽出部４によって形成された文字列領域から各文字の領域を切り出すようになっている。そして、文字認識部６は、文字切り出し部５によって切り出された各文字の領域を文字認識する部位である。ここで、領域区画部１と領域区画部２、投影区画抽出部３は黒区画抽出手段を構成するものであり、文字列領域抽出部４と文字切り出し部５は文字領域抽出手段を構成するものである。また文字認識部６は、文字認識手段に対応する。
【０００９】
次にこの文字認識装置の動作を説明する。図２はこの文字認識装置の動作を示すフローチャートである。本処理の前提として、図３に示すような入力画像７が撮像されているものとする。入力画像７は、白と黒の画素からなる２値画像であって、図に示すように、入力画像７には文字列ではない模様８と文字列９〜文字列１１からなる３個の横書きの文字列が存在している。
【００１０】
まず、領域区画部１と領域区画部２は入力画像７全体を部分領域に分割する（ステップＳ１）。領域区画部１は、入力画像７を互いに等しい面積を有する部分領域に分割する。入力画像７を領域区画部１によって分割した状態を、第１の分割形態と呼ぶこととする。また領域区画部２は、同じ入力画像７を第１の分割形態による部分領域とは異なる面積を有する部分領域であって、互いに等しい面積を有する部分領域に分割する。領域区画部２によるこのような分割状態を、第２の分割形態と呼ぶこととする。
【００１１】
図４は領域区画部１による部分領域設定（第１の分割形態）の例を示す図であって、図５は領域区画部２による部分領域設定（第２の分割形態）の例を示す図である。第２の分割形態に比べて、第１の分割形態では幅狭の部分領域に分割されている。
【００１２】
一般に、画像中の文字を認識するためには、画像を複数の部分領域に分割して、各部分領域毎に画素の分布を求めることが基本となる。精度のよい文字認識を行うには、この部分領域を適切に設定することが要求される。ところで、多くの場合画像中には、文字列以外の物体やその影が画像中に撮像されたり、文字列が回転する（画像の水平あるいは垂直座標軸に対して文字列が斜めに撮像される）ことによって、ノイズ（文字を構成しない画素）が混在する。そこでなるべく大きな部分領域を設定すれば、このようなノイズによる画素分布の影響を相対的に小さくすることができる。しかし画像中に小さな文字が存在する場合には、この小さな文字までもノイズとして排除されてしまうおそれがある。
【００１３】
そこで、実施の形態１による文字認識装置では、小さな文字を認識することを目的とする領域分割も行うこととした。小さな文字用の領域分割によって、大きな文字用の領域分割ではノイズとして排除されてしまうような画素の分布に対しても文字認識が可能となるからである。
【００１４】
図４に示した第１の分割形態は、比較的小さな文字を認識することを目的とする部分領域に分割した状態を指している。また図５に示した第２の分割形態は、第１の分割形態に対応する文字よりも大きな文字を認識することを目的とする部分領域に分割した状態を指している。
【００１５】
さらに画像を分割する方向については、横書き文字列を認識する場合には、縦長の短冊状に入力画像を分割する方がよい。部分領域の幅単位でノイズを棄却するためである。一方、縦書き文字列を認識するには、横長の短冊状に入力画像を分割すればよいし、横書きか縦書きか想定できない場合には正方形に近い部分領域に分割する。図４及び図５は縦長の短冊状の部分領域に入力画像を分割したものである。
【００１６】
また、第１及び第２の分割形態の部分領域は、説明を簡単にするために、想定される入力画像中の文字の大きさに基づいて定められるものとする。文字の大きさと大きさが極端に異なる部分領域に分割すると、正しくノイズの棄却が行えなかったり、文字の一部が欠けた状態で検出してしまったりするためである。この例とは異なり、入力画像中の文字の大きさを予測できない場合には、何段階かの文字の大きさに対応した分割形態とその分割形態に対応した領域区画部を準備しておけばよい。したがって当然に、３以上の分割形態に分割するようにしてもよい。
【００１７】
次に、投影区画抽出部３は、第１の分割形態と第２の分割形態の双方に対して、入力画像を分割した方向の画素の列ごとに、投影をとって投影値を算出する（ステップＳ２）。投影値とは、ある領域の一定方向（水平方向、又は垂直方向）の画素の列について、その列上の画素値の総和をいう。この例では、入力画像を水平方向（部分領域が縦長の短冊状をなすように）分割したので、各部分領域を水平方向の画素の列ごとに、画素値の総和を算出する。
【００１８】
続いて、ステップＳ２で算出された投影値に基づいて、黒区画を抽出する（ステップＳ３）。具体的には、各投影値と所定の閾値とを比較し、所定の閾値以上となる場合には１、所定の閾値未満となる場合には０に２値化する。ついで２値化された投影値として１が連続する領域を黒区画、０が連続する領域を白区画とする。その結果として、第１の分割形態から抽出された黒区画の例が図６である。幅狭な第１の分割形態の黒区画１４では、大きな文字列９の領域は分断されている。また第２の分割形態から抽出された黒区画の例が図７である。幅広な第２の分割形態の黒区画１５では、小さくかつ近接位置にある文字列１０と１１の領域が一つの黒区画になっている。また、図３の入力画像７の模様８の領域については、第１の分割形態の黒区画１４では分かれているが、第２の分割形態の黒区画１５では全体が一つの区画となり、その大きさ（高さ）は文字列９や１０とほぼ同じになっている。
【００１９】
次に文字列領域抽出部４は、投影区画抽出部３によって抽出された第１の分割形態と第２の分割形態の各黒区画から、文字列領域を構成する黒区画を抽出する（ステップＳ４）。すなわち、次のような処理を行う。まず第１の分割形態について、図示せぬ記憶装置に記憶されている文字の大きさ（第１の文字の大きさと呼ぶ）を取得する。ここでは、第１の文字の大きさとして、文字の標準高さを取得する。次に第１の分割形態の各黒区画の高さと第１の文字の大きさとして取得した文字の標準高さとを比較する。ここでは例えば許容最小倍率は９０％、許容最大倍率は１１０％を許容範囲として設定しておき、黒区画の高さが文字の標準高さの９０％以上でかつ１１０％以内の値となる場合に、その黒区画を文字列領域を構成する黒区画として採用する。また第２の分割形態についても同様に第２の文字の大きさを取得して比較する。第２の分割形態は第１の分割形態よりも大きいので、第２の文字の大きさも第１の文字の大きさよりも大きく設定される。
【００２０】
この結果、図６及び図７の左下に存在した模様８に対応する黒区画については、選択されない。その理由は、第１の分割形態において、これらの黒区画は第１の文字の大きさの許容範囲を超えて小さいものであり、さらに、第２の分割形態において、これらの黒区画は第１の文字の大きさに近い高さを有しているが、第２の文字の大きさの許容範囲を超えていることを理由とする。このようにして、異なる大きさの文字が混在する入力画像であっても、文字を構成しない画素を原因とするノイズを除去し、誤検出を防止する。
【００２１】
なお、上記の例では横書き文字列を検出するために、縦方向に分割した領域に存在する黒区画の高さと文字の標準高さとを比較した。これに対して縦書き文字列を認識する場合には横方向に領域分割するが、この場合には各領域に存在する黒区画の幅と文字の標準幅とを比較すればよい。縦書き文字列と横書き文字列が混在した入力画像を文字認識の対象とするために、正方形状に部分領域に分割した場合には、高さと幅の双方を比較すればよい。
【００２２】
その結果、文字列領域抽出部４は、当該許容範囲に入っている黒区画を文字列領域候補とする（ステップＳ５）。その後、文字列領域抽出部４は、文字列領域候補を併合して文字列領域を形成する（ステップＳ６）。すなわち、隣接する部分領域に存在する文字列領域候補であって、相互の垂直座標の差が所定の閾値以下の文字列領域候補を一つの文字列領域とする。一方、隣接する部分領域に上端・下端が近接した文字列領域候補が存在しない場合、この文字列領域候補は文字列領域として形成されない。図１０は、第１の分割形態の黒区画から形成された文字列領域の例であり、図３の文字列１０〜１１に対応した文字列領域２１〜２２が形成されている。また図１１は、第２の分割形態の黒区画から形成された文字列領域の例であり、図３の文字列９に対応した文字列領域２４が得られている。
【００２３】
文字切り出し部５は、文字列領域抽出手段５の抽出した文字列領域それぞれに対して、文字切り出し対象領域を定め、従来と同様の手順で文字切り出しを行う（ステップＳ７）。文字切り出し対象領域は、当該領域からはみ出る文字パターンが発生しないよう、例えば文字列領域の上下左右を所定値だけ広げた範囲とする。図１１の文字列領域２４に対して設定した例が図１２の文字切り出し対象領域２５である。その後、文字認識部６が、従来と同様の手順で文字認識を行う（ステップＳ８）。
【００２４】
なお、画像分布の状態から、大きな文字を処理対象とする第２の分割形態の文字列領域と、小さな文字を処理対象とする第１の分割形態の文字列領域が重なることも考えられる。例えば漢字「知」は偏「矢」と旁「口」から構成されているが、偏と旁それぞれのみで単独の漢字と扱うことも可能である。このような場合に第１の分割形態による処理結果からは「矢」と「口」が検出され、第２の分割形態による処理結果からは「知」が検出されることになるので、両者の処理結果は矛盾することになる。
【００２５】
そこで、このような場合には、大きな文字を処理対象とする第２の分割形態の算出結果を優先することとする。これによって、複数の分割形態による処理結果は統合される。なお、このような統合処理は文字切り出し部５あるいは文字認識部６のいずれかで行うようにする。
【００２６】
以上から明らかなように、実施の形態１の文字認識装置によれば、部分領域の大きさと対応付けて抽出すべき文字列の大きさを定め、この大きさと抽出した黒区画の大きさとを比較して文字列領域を抽出するようにし、さらに異なる大きさの部分領域に分割して、それぞれの大きさの部分領域ごとにこのような処理を行うこととしたので、異なる大きさの文字を含む入力画像に対しても、誤認識を防止して適切に文字認識を行うことができる。
【００２７】
さらに、入力画面全体を部分領域に分割したので、文字列の表示位置やその大きさによらず、文字列を適切に検出して認識することができる。
【００２８】
なお、以上の処理では、各分割形態の黒区画の大きさと文字の標準大きさとを比較し、適合する黒区画のみを選択した後に、選択された黒区画から文字列領域を形成することとした。しかしこの方法以外にも、まず隣接する黒区画を併合して文字列領域候補を形成した後に、この文字列領域候補が文字列領域であるかどうかを調べる方法も考えられる。この場合には、上述の説明のように文字列が横書きの場合には文字の標準高さを基準として文字列領域候補を選択する方法（文字列が縦書きの場合には文字の幅、縦書きと横書きが混在するには双方）の他に、次のような文字列領域候補選択方法を採ってもよい。
【００２９】
すなわち、文字列領域候補の幅（文字列が横書きの場合）をこの分割形態に対応する文字の標準幅で除算し、この除算結果が整数値（離散値）に近い値になる場合に、この文字列領域候補を文字列領域であると判定するというものである。文字列が縦書きの場合には、文字の標準高さで除算するようにする。また混在する場合には、いずれか文字の標準高さか標準幅のいずれか一方を選択して除算する。さらに標準幅と標準高さとを乗算して得た標準面積を基準としてもよい。
【００３０】
また、本実施の形態では黒い文字を検出・認識するため、入力画像の投影値から黒区画を求めたが、最初に入力画像を白黒反転させることで、白い文字の検出・認識も可能である。
【００３１】
また、実施の形態１による文字認識装置が果たす文字認識機能をコンピュータに実行させるコンピュータプログラムとして実現することも当然に可能である。この場合には、領域区画部１、領域区画部２、投影区画抽出部３、文字列領域抽出部４、文字切り出し部５、文字認識部６のそれぞれの部位の機能に相当する機能を実行するコンピュータプログラムを順次実行するコンピュータプログラムとすればよい。
【００３２】
実施の形態２．
なお、実施の形態１では、入力画像全体を複数通りの分割形態によって部分領域に分割した。これに対して、図２のフローチャートのステップＳ１において、一つの入力画像を複数の分割形態を組み合わせて分割するようにしてもよい。
【００３３】
例えば、図１３に示すように入力画像を撮像するカメラ２６ａが支柱２６ｂの上端に設置されており、自動車などのナンバープレート２７や２８上に印字されたナンバーを読みとる場合、撮像された入力画像中の文字列は図１４のようになる。図１４において、入力画像２９の上部にはナンバープレート２８上の文字列３０が相対的に小さく表示されている。また入力画像２９の下部にはナンバープレート２７上の文字列３１が相対的に大きく表示されている。このように、ナンバープレート２７と２８上の文字の大きさはもともとほぼ同じ大きさであるが、カメラ２６ａから遠い位置にあるナンバープレート２７上の文字列３０は入力画像２９の上部に小さく、かつ、カメラ２６ａから近い位置にあるナンバープレート２８上の文字列３１は入力画像２９の下部に大きく表示されることになる。
【００３４】
このような場合に、例えば入力画像２９の上半分を第１の分割形態によって領域分割し、下半分を第２の分割形態によって領域分割するようにすれば、実施の形態１と同様に文字列を構成する文字の大きさに適した文字列領域の分割が行える。
【００３５】
以上から明らかなように、実施の形態２による文字認識装置によれば、カメラの撮像位置と文字列が表示されている物体の位置との関係から、入力画像中の文字列の大きさが予め予測できる場合に、入力画像の分割形態を最適に組み合わせて検出・認識することができる。
【００３６】
また実施の形態１のように入力画像全体を単一の分割形態による部分領域に分割する処理を複数回行う方法に比べて、同一の入力画像を複数の分割形態を組み合わせて部分的に分割することによって、大量の画素を処理する手間が省けるので、性能も向上し、さらに計算機資源の節約を図ることも可能となる。
【００３７】
なお、カメラの位置と文字列を表示する物体の位置関係に応じて、入力画像の分割の仕方を変更してもよいことはいうまでもない。例えば、左側に設置されたカメラから右方向に設置されているナンバープレートを撮像するような場合、部分領域の大きな分割形態の対象範囲を入力画像の左側に、部分領域の小さな分割形態の対象範囲を入力画像の右側に設定すればよい。
【００３８】
実施の形態３．
次に、この発明の実施の形態３による文字認識装置について説明する。実施の形態１による文字認識装置は入力画像を白と黒の画素からなる２値画像としたが、実施の形態３による文字認識装置は、多値画像あるいは多階調画像を入力画像とする点で異なる。
【００３９】
図１５は、実施の形態３による文字認識装置の構成を示すブロック図である。図において微分画像抽出部１０１は、多階層画像から微分画像を作成する部位である。その他、実施の形態１による文字認識装置と同一の符号を付した構成要素については、実施の形態１と同様であるので説明を省略する。
【００４０】
次に実施の形態３による文字認識装置の動作について説明する。この文字認識装置の処理を示すフローチャートは実施の形態１と同じく図２を用いる。まず実施の形態１と同じように図示せぬ画像入力手段によって、多階層画像が撮像されて取り込まれる。例えば、この画像は１画素８ビットの濃淡画像であるものとする。図１６はこのような入力画像の例を示す図であって、入力画像３６の中には、白色の文字列３７と黒色の文字列３８が混在しており、さらに文字列３８の両脇には柱３９と柱４０が表示されている。
【００４１】
まず、実施の形態１と同様にステップＳ１において、領域区画部１および２が第１の分割形態及び第２の分割形態による領域に分割する（ステップＳ１）。続いて、入力画像３６の投影値を算出する（ステップＳ２）。実施の形態１による文字認識装置とは異なり、この文字認識装置の入力画像は多階層画像である。そこで、このステップにおいては、まず微分画像抽出部１０１が微分画像を作成し、次にこの微分画像を２値化して投影値を算出する。画像の微分は、例えば総研出版発行「コンピュータ画像処理入門」ｐｐ．１１９〜１２２に記載の各種方法が使用できるが、ここでは、その中のＳｏｂｅｌオペレータによる方法を用いることとする。また、微分値の２値化には、例えば固定の閾値を適用する方法を用いることができる。その結果、図１７に示すように表示されている物体と文字の輪郭部分だけが残った画像が得られる。この微分２値画像から投影値を算出することで、黒文字だけでなく白色の文字列からも黒区画が抽出される。
【００４２】
なお図１７では、図を見やすくするために領域区画部１および２によって分割された分割の境界線を割愛している。またステップＳ１とステップＳ２の処理順序を逆にして、先に微分２値画像を求めてから領域分割を行うようにしてもよい。
【００４３】
次に実施の形態１と同様の手順で、ステップＳ２で算出された投影値に基づいて、第１の分割形態と第２の分割形態のそれぞれについて黒区画を抽出する（ステップＳ３）。微分２値画像の場合、濃度の変化の少ない領域は黒画素の分布が小さいので、黒区画は濃度の変化の大きい文字や物体の輪郭部分に多く検出される。図１８は、第１の分割形態から抽出された黒区画の例を示す図であり、幅狭な第１の分割形態の黒区画４４では、大きな文字列３７の領域は分断されている。また図１９は、第１の分割形態から抽出された黒区画の例を示す図であり、幅広な第２の分割形態の黒区画４５では、文字列３８の領域が隣接した柱３９と柱４０の影響で極度に大きな黒区画となっている。
【００４４】
次に、抽出された黒区画から文字列領域を構成しうる文字列領域候補を選択し（ステップＳ４とステップＳ５）、次に文字列領域候補を併合して文字列領域を形成する（ステップＳ６）。これらの処理は実施の形態１と同様であるので、説明を省略する。
【００４５】
続いて文字列切り出し部５は、実施の形態１と同様に文字パターンの切り出しを行う（ステップＳ７）。但し、実施の形態３における入力画像は多階層画像なので、最初に各文字列領域に含まれる文字が黒文字か白文字かを判定する。そのために、文字列領域の位置を基準に判定対象領域を設定し、入力画像における判定対象領域を２値化して文字列方向への投影を行い、この投影値に基づいて判定を行う。
【００４６】
なお「文字列方向」という語は、文字列を構成する文字の並んでいる方向（縦または垂直・横または水平など）を意味する語であるものとし、「文字列方向への投影を行う」とは、例えば、横書き文字列であれば、水平方向の各画素列について投影データを算出するものである。したがって、例えば画素数が２０（垂直）×１２８（水平）の文字列領域に横書き文字が表示されている場合、同一の垂直座標を有する１２８個の画素からなる水平方向の画素列が２０個存在することになる。このような場合、文字列方向への投影を行う、とは、２０個の水平方向画素列のそれぞれについて投影データを算出することを意味する。
【００４７】
また、判定領域とは文字列領域を含む領域であって、例えば、文字の端が確実に判定領域内に含まれるように、文字列領域を文字列方向と垂直な方向に所定量広げた範囲の領域である。図２０は、このような判定領域を概念的に示すための図であって、白文字列３７についての判定領域４８と黒文字列３８についての判定領域４９を示している。
【００４８】
以下に、判定領域４８を例にとって、この領域に表示されている文字が黒文字であるか白文字であるかを判定する処理（黒文字・白文字判定）について説明する。図２１は判定領域４８について算出した投影値の分布を示すものである。図において、投影値分布５０はこの領域全体の水平方向の投影値を示している。５１は文字列領域３７の中央位置であって、５２と５３はそれぞれ予め設定された投影値の下限値と上限値である。この下限値５２・上限値５３の値は、例えば、判定対象領域の文字列方向の長さに所定の比率を掛けた値とする。
【００４９】
この場合において、まず文字列領域の中央位置５１の投影値から開始して、次に上方向（縦書き文字列の場合は左方向）、および下方向（縦書き文字列の場合は右方向）に一画素分ずつ順次投影値を取得していき、そして各投影値が下限値を下回らないか、さらに上限値を上回らないかを調べる。この結果、初めて下限値を下回るか上限値を上回る画素の位置を文字列の端とみなす。さらに下限値を先に下回った場合には、この文字列を黒色の文字列とみなし、上限値を先に上回った場合には、この文字列を白色の文字列とみなす。
【００５０】
図２１の例でいえば、上下どちらの方向についても下限値５２を下回る前に上限値５３を上回ることになるので、この文字列が白色の文字列であると判断される。一方、図２２に示した例では、投影値５４を文字列領域の中央から参照して行くと、上限値５６を上回る前に下限値５５を下回り、黒色の文字列と判定される。
【００５１】
その後、文字切り出し部５は、実施の形態１と同様の手順で文字切り出し対象領域を定めた後、当該領域の入力画像を２値化して文字切り出し用の２値画像を作成する。さらに黒文字・白文字判定の結果、判定結果が白文字であれば、当該２値画像を白黒反転させた画像を文字切り出しに用いる。以後の処理については、実施の形態１と同様であるので説明を省略する。
【００５２】
以上から明らかなように、実施の形態３の文字認識装置によれば、多階層画像に対しても微分２値画像化したのちに、黒区画を抽出して文字列の大きさと比較し、文字列領域を選択することとしたので、白色・黒色の文字列が混在した画像からでも、処理量を大きく増やすことなく、個々の文字を正しく抽出して認識できる。
【００５３】
なお上述の説明では、領域分割を行った後に、それぞれの分割形態ごとに微分２値化を行うこととしたが、微分２値化は黒区画を抽出するステップＳ３以前に行っておけばよく、例えば入力画像を微分２値化し、その微分２値化後の画像に対して領域分割を行うようにしてもよい。
【００５４】
実施の形態４．
次に実施の形態４による文字認識装置について説明する。実施の形態４の文字認識装置は、第２の分割形態の部分領域を形成する方法に特徴を有するものである。また、実施の形態３の文字認識装置と比して、多階調画像を取扱う方法が異なり、さらに入力画像中には回転を生じた文字列を含むものとする。
【００５５】
実施の形態４による文字認識装置の構成を示すブロック図として図１５を用いる。但し実施の形態４の文字認識装置では、領域区画部２及び投影区画抽出部３、文字列領域抽出部４が実施の形態３と異なっている。領域区画部２は領域区画部１が分割した第１の分割形態による部分領域を併合することによって第２の分割形態による部分領域を形成するようになっている。投影区画抽出部３は、第１の分割形態の部分領域から抽出した黒区画を併合して第２の分割形態の部分領域の黒区画を形成するようになっている。文字列領域抽出部４は、文字列の回転によって生じた黒区画間のずれの影響を排除して文字列領域を形成するようになっている。他の構成要素については、実施の形態３と同様であるので説明を省略する。
【００５６】
図２３は実施の形態４の文字認識装置が文字認識を行う入力画像の例である。図の入力画像５８において、５９は黒地に白色で表示された文字列である。また６２は文字又は文字列でない楕円状の図形であり、さらに文字列６０および６１は回転が生じている文字列である。
【００５７】
次に実施の形態４による文字認識装置の動作を説明する。実施の形態４による文字認識装置における処理は実施の形態１乃至３と同様にフローチャート図２によって示される。まず領域区画部１は、入力画像５８を第１の分割形態による部分領域に分割した後、領域区画部２は、これらの部分領域に基づいて第２の分割形態による部分領域を形成する（ステップＳ１）。すなわち、最初に領域区画部１は、入力画像５８を小さい部分領域に分割する。領域区画部１が分割した部分領域は第１の分割形態としてメモリに記憶させておく。次に領域区画部２は、この小さな部分領域のうち、隣接する２個の部分領域同士を併合して大きな部分領域を形成する。実施の形態１及び３では、領域区画部１と２は独立して入力画像を部分領域に分割したが、実施の形態４では第１の分割形態を利用して第２の分割形態による部分領域を形成する点で異なるものである。
【００５８】
なお、この説明では簡単のために、第１の分割形態による部分領域のうち、隣接する部分領域を２個ずつ併合して、第２の分割形態による部分領域を形成することとするが、第２の分割形態による部分領域を形成する方法はこの限りではない。例えば隣接する部分領域を３個ずつ併合する方法を採用してもよいし、また隣接する３個の部分領域を併合した後に、２等分するような方法で部分領域を形成してもよい。
【００５９】
次に第１の分割形態による部分領域と第２の分割形態による部分領域から投影値を算出して、黒区画を抽出する（ステップＳ２及びステップＳ３）。実施の形態４における入力画像は多階調画像であるので、実施の形態３と同様に微分２値化を行ってから黒区画を抽出する。ただし実施の形態４は、次のような点で実施の形態３とは異なる。すなわち、第１の分割形態による部分領域に対して微分２値化を行い、さらに黒区画の抽出を行った後に、この黒区画を併合して第２の分割形態による黒区画を形成する点である。
【００６０】
具体的には、次のような処理を行う。まず第１の分割形態に対して実施の形態３と同様に微分２値画像や投影値の算出、黒区画の抽出を行う。図２４は、ここで得られた微分２値画像の例である。但し部分領域間の境界線の表示を省略している。さらに図２５は、第１の分割形態に基づいて得られた黒区画の例である。
【００６１】
さらに、すでに第２の分割形態の部分領域を形成するために併合された第１の分割形態の部分領域間で、黒区画が隣接している場合に、これらの黒区画の併合処理を行う。この併合処理は例えば次のいずれかの方法によって行われる。
【００６２】
（１）隣接する第１の分割形態による黒区画を囲む最小の矩形を算出し、この矩形によって囲まれた黒区画の面積の和とこの矩形の面積との比をとって、この比が所定値以上となる場合に、この最小矩形全体を第２の分割形態による黒区画とする。
（２）隣接する第１の分割形態による黒区画の境界線の長さが所定値以上である場合に、これらの黒区画を囲む最小の矩形全体を第２の分割形態による黒区画とする。
【００６３】
図２６は、このような黒区画の併合処理を示す説明図である。図は、第１の分割形態による黒区画１１１と１１２が第２の分割形態による黒区画１１３に併合される様子を示すものである。また第１の分割形態による黒区画１１６は同じ部分領域に属する黒区画１１４と１１５の双方に隣接している。このような場合には、黒区画１１４、１１５、１１６のすべてを囲む最小の矩形が一つの黒区画１１７となる。
【００６４】
一方、第１の分割形態による黒区画１１８と１１９も隣接しているが、このような場合には、上記（１）と（２）のいずれの方法によっても第２の分割形態による黒区画には形成されない。このように、文字列の回転を吸収するために、黒区画のずれを許容しようとすると文字列の回転によって生じた黒区画のずれではない黒区画のずれまで含んでしまうことがある。しかし、上記（１）と（２）の基準に基づいて隣接する黒区画を併合するようにすれば、そのようなケースを排除することが可能となる。
【００６５】
このように第１の分割形態による黒区画を併合して第２の分割形態の黒区画を形成することによって、第２の分割形態に対して微分２値画像や投影値の算出、黒区画の抽出を行う処理を行わなくて済むので、処理を高速に行うことができるようになる。
【００６６】
次に文字列領域抽出部４は、黒区画を併合して文字列領域を形成する（ステップＳ４）。実施の形態４では、文字の標準大きさによって黒区画を選択する代わりに、各黒区画の領域における入力画像の画素濃度に基づいて黒区画を選択することとする。例えば、第１の分割形態の黒区画については黒文字・白文字双方を許容し、第２の分割形態の部分領域に対しては白文字のみを許容するようにする。
【００６７】
黒区画を選択する処理は次のように行う。すなわち、まず黒区画の領域における入力画像の最大画素値と最小画素値との平均値を２値化閾値として算出する。次に、２値化閾値より値の小さい画素の数と２値化閾値以上の値を持つ画素の数とを比較し、前者が大きければ（２値化閾値より黒い画素が多ければ）黒文字と判定し、逆に後者が大きければ白文字と判定する。黒文字・白文字の判定結果が、その黒区画の分割形態で定められた文字の色（黒色・白色）に一致すれば、この黒区画は選択される。一致しない場合は、その黒区画は棄却される。文字列領域の形成（ステップＳ５）は、選択された黒区画だけを併合することによって行われる。
【００６８】
一般的な文字列の画像領域では、文字の画素よりも背景の画素の方が多いため、上記の方法によれば、２値化閾値を適正に設定することで黒文字・白文字が判定できる。実施の形態３で説明した方法では文字列方向の投影を用いるため、文字列の回転角度が非常に大きいと正しく判定できない場合もあるが、この方法では、濃淡分布を使用しているため、回転角度に制約を受けずに判定できる。
【００６９】
ステップＳ５以降の処理については実施の形態３と同様であるので、説明を省略する。
【００７０】
以上から明らかなように、実施の形態４の文字認識装置によれば、第１の分割形態に基づいて第２の分割形態を求めるので、演算量を大幅に削減できる。また上記（１）と（２）の基準により黒区画を併合するので、文字列の回転に強い文字認識が可能となる。
【００７１】
【発明の効果】
この発明による文字認識装置は、入力画像を第１の分割形態の領域に分割するとともに、第１の分割形態の領域とは異なる大きさを有する第２の分割形態の領域にも分割し、さらに双方の分割形態の領域から抽出された黒区画から、それぞれの分割形態に適合する文字列領域を形成するようにしたので、大きさの異なる複数の文字列が存在する場合であっても、適切に文字列を検出し、認識を行うことができるという極めて優れた効果を有するものである。
【図面の簡単な説明】
【図１】この発明の実施の形態１の文字認識装置の構成を示すブロック図である。
【図２】この発明の実施の形態１の文字認識装置のフローチャートである。
【図３】この発明の実施の形態１の入力画像の例を示す図である。
【図４】この発明の実施の形態１の第１の分割形態を示す図である。
【図５】この発明の実施の形態１の第２の分割形態を示す図である。
【図６】この発明の実施の形態１の第１の分割形態から抽出された黒区画の例を示す図である。
【図７】この発明の実施の形態１の第２の分割形態から抽出された黒区画の例を示す図である。
【図８】この発明の実施の形態１の第１の分割形態から抽出された文字列領域を構成する黒区画の例を示す図である。
【図９】この発明の実施の形態１の第２の分割形態から抽出された文字列領域を構成する黒区画の例を示す図である。
【図１０】この発明の実施の形態１の第１の分割形態から抽出された文字列領域候補の例を示す図である。
【図１１】この発明の実施の形態１の第２の分割形態から抽出された文字列領域候補の例を示す図である。
【図１２】この発明の実施の形態１の文字列切り出し領域の例を示す図である。
【図１３】この発明の実施の形態２の入力画像を撮像するカメラとナンバープレートの位置関係を示す説明図である。
【図１４】この発明の実施の形態２の入力画像の例を示す図である。
【図１５】この発明の実施の形態３の文字認識装置の構成を示すブロック図である。
【図１６】この発明の実施の形態３の入力画像の例を示す図である。
【図１７】この発明の実施の形態３の微分２値画像の例を示す図である。
【図１８】この発明の実施の形態３の第１の分割形態から抽出された黒区画の例を示す図である。
【図１９】この発明の実施の形態３の第２の分割形態から抽出された黒区画の例を示す図である。
【図２０】この発明の実施の形態３の判定領域の説明図である。
【図２１】この発明の実施の形態３の白文字についての判定領域の投影値の分布を示す図である。
【図２２】この発明の実施の形態３の黒文字についての判定領域の投影値の分布を示す図である。
【図２３】この発明の実施の形態４の入力画像の例を示す図である。
【図２４】この発明の実施の形態４の微分２値画像の例を示す図である。
【図２５】この発明の実施の形態４の第１の分割形態から抽出された黒区画の例を示す図である。
【図２６】この発明の実施の形態４の黒区画の併合処理を示す説明図である。
【符号の説明】
１、２領域区画部
３投影区画抽出部
４文字列領域抽出部
５文字切り出し部
６文字認識部
１０１微分画像抽出部

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a character recognition device and a character recognition program for recognizing characters in an image, and in particular to a technique for efficiently extracting character string areas from an image that contains character strings composed of characters of different sizes.
[0002]
[Prior art]
A conventional character recognition device that recognizes characters in an image estimates the size of the character patterns, cuts out the area in which a character string exists (the character string area), and matches the pixel patterns in that area against character patterns. In such a device, the method used to estimate the character pattern size, on which the size of the cut-out character string area is based, is important.
[0003]
One method of estimating the size of such a character pattern is to acquire the distribution of the pixels constituting characters in a partial region at a specific position in the image and to estimate the character size from that distribution (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP-A-63-292381, "Character line detection device" (FIG. 1, pages 3 to 5)
[0005]
[Problems to be solved by the invention]
As described above, the conventional character recognition device estimates the character size and line spacing from a partial area of the input image. Consequently, even if information on a reference character is obtained from such an area, characters cannot be recognized correctly when characters of a different size exist in another area.
[0006]
The present invention has been made to solve this problem, and its object is to provide a character recognition device that appropriately detects and recognizes character strings even when a plurality of character strings of different sizes are present.
[0007]
[Means for Solving the Problems]
A character recognition device according to the present invention comprises: black section extracting means that divides an input image into regions of a first division mode suited to characters of a predetermined size and also into regions of a second division mode suited to characters of a different size, and that extracts, as black sections, those areas within each region of the first division mode and each region of the second division mode that contain at least a predetermined number of black pixels;
character area extracting means that merges the black sections to form character string areas suited to the respective division modes and cuts out character areas from the character string areas; and
character recognition means that recognizes the character pattern in each character area.
[0008]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments according to the present invention will be described.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a character recognition device according to Embodiment 1 of the present invention. In the figure, a region partitioning unit 1 and a region partitioning unit 2 each divide an input image, captured by an image input means such as a camera (not shown), into a plurality of regions. The regions produced by the region partitioning unit 1 differ in size from those produced by the region partitioning unit 2. A projection section extraction unit 3 extracts areas in which black pixels exist (black sections) based on the pixel values of the regions in the input image. A character string area extraction unit 4 merges the black sections extracted by the projection section extraction unit 3 to form character string area candidates, which are areas where a character string may exist, and then determines from the size of each candidate whether it is a character string area. A character cutout unit 5 cuts out the area of each character from the character string areas formed by the character string area extraction unit 4. A character recognition unit 6 performs character recognition on each character area cut out by the character cutout unit 5. Here, the region partitioning unit 1, the region partitioning unit 2, and the projection section extraction unit 3 constitute the black section extracting means, and the character string area extraction unit 4 and the character cutout unit 5 constitute the character area extracting means. The character recognition unit 6 corresponds to the character recognition means.
[0009]
Next, the operation of the character recognition device will be described. FIG. 2 is a flowchart showing the operation of the character recognition device. As a premise of this processing, it is assumed that an input image 7 as shown in FIG. 3 has been captured. The input image 7 is a binary image composed of white and black pixels. As shown in the figure, the input image 7 contains a pattern 8, which is not a character string, and three horizontally written character strings, namely character strings 9 to 11.
[0010]
First, the region partitioning unit 1 and the region partitioning unit 2 each divide the entire input image 7 into partial regions (step S1). The region partitioning unit 1 divides the input image 7 into partial regions of equal area. The state in which the input image 7 has been divided by the region partitioning unit 1 is referred to as the first division mode. The region partitioning unit 2 divides the same input image 7 into partial regions that are equal in area to one another but differ in area from the partial regions of the first division mode. The state of division produced by the region partitioning unit 2 is referred to as the second division mode.
[0011]
FIG. 4 shows an example of the partial regions set by the region partitioning unit 1 (first division mode), and FIG. 5 shows an example of the partial regions set by the region partitioning unit 2 (second division mode). The partial regions of the first division mode are narrower than those of the second division mode.
[0012]
In general, recognizing characters in an image starts from dividing the image into a plurality of partial regions and obtaining the pixel distribution of each partial region. Accurate character recognition requires that these partial regions be set appropriately. In many cases, noise (pixels that do not constitute characters) is mixed into the image, because objects other than character strings or their shadows are captured, or because a character string is rotated (captured at an angle to the horizontal or vertical coordinate axis of the image). Making the partial regions as large as possible relatively reduces the influence of such noise on the pixel distribution. However, when small characters are present in the image, even these small characters may be rejected as noise.
[0013]
Therefore, the character recognition device according to the first embodiment also performs area division for the purpose of recognizing small characters. This is because the area division for small characters enables character recognition even for a pixel distribution that is excluded as noise in the area division for large characters.
[0014]
The first division mode shown in FIG. 4 is a state in which the image is divided into partial regions intended for recognizing relatively small characters. The second division mode shown in FIG. 5 is a state in which the image is divided into partial regions intended for recognizing characters larger than those targeted by the first division mode.
[0015]
As for the direction of division, when horizontally written character strings are to be recognized, it is better to divide the input image into vertically long strips, because noise is then rejected in units of the strip width. Conversely, to recognize vertically written character strings, the input image may be divided into horizontally long strips, and when it cannot be assumed whether the text is written horizontally or vertically, the image is divided into partial regions that are close to square. FIGS. 4 and 5 show the input image divided into vertically long strip-shaped partial regions.
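The division into vertically long strips for the two division modes can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name `split_into_strips`, the image width, and the strip widths are all assumptions introduced here, and letting the last strip be narrower when the width is not an exact multiple is also an assumption.

```python
def split_into_strips(width, strip_width):
    """Return (x_start, x_end) column ranges of vertically long strips.

    x_end is exclusive. The last strip may be narrower when width is not a
    multiple of strip_width (an assumption; the text describes equal areas).
    """
    if strip_width <= 0:
        raise ValueError("strip_width must be positive")
    return [(x, min(x + strip_width, width))
            for x in range(0, width, strip_width)]


# Hypothetical widths: narrow strips for small characters (first division
# mode) and wide strips for large characters (second division mode).
first_mode = split_into_strips(width=64, strip_width=8)
second_mode = split_into_strips(width=64, strip_width=16)
```

Both lists cover the whole image width, so every character string falls inside some strip of each division mode regardless of where it appears.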
[0016]
For simplicity of explanation, the partial regions of the first and second division modes are assumed to be determined based on the expected size of the characters in the input image. This is because, if the image is divided into partial regions whose size differs greatly from the character size, noise cannot be rejected correctly or characters may be detected with parts missing. Unlike in this example, when the size of the characters in the input image cannot be predicted, division modes corresponding to several levels of character size, together with a region partitioning unit for each division mode, may be prepared. Naturally, therefore, the image may be divided according to three or more division modes.
[0017]
Next, for both the first division mode and the second division mode, the projection section extraction unit 3 takes a projection along each line of pixels in the direction in which the input image was divided and calculates projection values (step S2). A projection value is the sum of the pixel values along a line of pixels running in a fixed direction (horizontal or vertical) within a region. In this example, the input image was divided along the horizontal direction (so that the partial regions form vertically long strips), so the sum of the pixel values is calculated for each horizontal row of pixels in each partial region.
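As a rough illustration of step S2, the sketch below sums the pixel values of each horizontal row inside one vertical strip. It assumes a binary image stored as a list of rows with 1 for black and 0 for white; the function name `row_projections` and the sample image are hypothetical.

```python
def row_projections(image, x_start, x_end):
    """Sum the pixel values along each horizontal row of one vertical strip.

    image is a list of rows; columns x_start (inclusive) to x_end
    (exclusive) belong to the strip.
    """
    return [sum(row[x_start:x_end]) for row in image]


# A tiny 4x4 sample image (1 = black pixel, 0 = white pixel).
image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
left = row_projections(image, 0, 2)   # strip covering columns 0-1
right = row_projections(image, 2, 4)  # strip covering columns 2-3
```

Each strip yields one projection value per image row, which is the input to the black section extraction of step S3.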
[0018]
Subsequently, black sections are extracted based on the projection values calculated in step S2 (step S3). Specifically, each projection value is compared with a predetermined threshold and binarized to 1 if it is equal to or greater than the threshold and to 0 if it is less than the threshold. A run of consecutive 1s in the binarized projection values is then taken as a black section, and a run of consecutive 0s as a white section. FIG. 6 shows an example of the black sections extracted from the first division mode. In the black sections 14 of the narrow first division mode, the area of the large character string 9 is split up. FIG. 7 shows an example of the black sections extracted from the second division mode. In the black sections 15 of the wide second division mode, the areas of the character strings 10 and 11, which are small and close to each other, form a single black section. The area of the pattern 8 in the input image 7 of FIG. 3 is split up in the black sections 14 of the first division mode, but in the black sections 15 of the second division mode it forms a single section whose size (height) is almost the same as that of the character strings 9 and 10.
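The binarization and run extraction of step S3 can be sketched as below. The function name `extract_black_sections`, the representation of a black section as a (top, bottom) row range with an exclusive bottom, and the sample values are assumptions made for illustration only.

```python
def extract_black_sections(projections, threshold):
    """Return (top, bottom) row ranges where the projection value stays at
    or above threshold on consecutive rows; bottom is exclusive."""
    sections = []
    start = None
    for y, value in enumerate(projections):
        is_black = value >= threshold  # binarize: 1 if >= threshold, else 0
        if is_black and start is None:
            start = y                  # a run of 1s begins
        elif not is_black and start is not None:
            sections.append((start, y))  # the run ended on the previous row
            start = None
    if start is not None:              # a run reaching the last row
        sections.append((start, len(projections)))
    return sections


sections = extract_black_sections([0, 3, 4, 0, 0, 2, 1, 5], threshold=2)
```

The gaps between the returned ranges correspond to the white sections described in the text.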
[0019]
Next, the character string area extraction unit 4 extracts, from the black sections of the first and second division modes extracted by the projection section extraction unit 3, the black sections that constitute character string areas (step S4). The processing is as follows. First, for the first division mode, a character size stored in a storage device (not shown), referred to as the first character size, is obtained. Here, the standard character height is obtained as the first character size. Next, the height of each black section of the first division mode is compared with this standard character height. For example, with an allowable minimum ratio of 90% and an allowable maximum ratio of 110% set as the allowable range, a black section is adopted as one constituting a character string area when its height is at least 90% and at most 110% of the standard character height. For the second division mode, a second character size is likewise obtained and compared. Since the partial regions of the second division mode are larger than those of the first division mode, the second character size is also set larger than the first character size.
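The height check of step S4 can be sketched as follows, using the 90% and 110% limits given as an example in the text. The function name `select_by_height`, the standard height of 20 pixels, and the sample sections are hypothetical; integer arithmetic is used here only to avoid floating-point rounding at the boundaries.

```python
def select_by_height(sections, standard_height, min_percent=90, max_percent=110):
    """Keep the (top, bottom) black sections whose height lies within the
    allowable range around the standard character height (bottom exclusive)."""
    selected = []
    for top, bottom in sections:
        height = bottom - top
        # Compare 100 * height with the percent limits to stay in integers.
        if min_percent * standard_height <= 100 * height <= max_percent * standard_height:
            selected.append((top, bottom))
    return selected


# Heights of the four sample sections: 20, 18, 25 and 5 rows.
sections = [(0, 20), (30, 48), (60, 85), (90, 95)]
selected = select_by_height(sections, standard_height=20)
```

The same function would be called a second time with the larger second character size for the black sections of the second division mode.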
[0020]
As a result, the black sections corresponding to the pattern 8 at the lower left of FIGS. 6 and 7 are not selected. In the first division mode, these black sections are smaller than the allowable range for the first character size. In the second division mode, they have a height close to the first character size but fall outside the allowable range for the second character size. In this way, even for an input image in which characters of different sizes are mixed, noise caused by pixels that do not constitute characters is removed and erroneous detection is prevented.
[0021]
In the above example, in order to detect horizontally written character strings, the height of each black section in the vertically divided regions was compared with the standard character height. When vertically written character strings are to be recognized, the image is instead divided in the horizontal direction, and the width of each black section in each region is compared with the standard character width. When the image is divided into square partial regions so that an input image containing both vertically and horizontally written character strings can be recognized, both the height and the width are compared.
[0022]
As a result, the character string area extraction unit 4 treats the black sections that fall within the allowable range as character string area candidates (step S5). The character string area extraction unit 4 then merges the character string area candidates to form character string areas (step S6). That is, character string area candidates that lie in adjacent partial regions and whose vertical coordinates differ by no more than a predetermined threshold are merged into one character string area. Conversely, a character string area candidate is not formed into a character string area when no adjacent partial region contains a candidate whose upper and lower ends are close to its own. FIG. 10 shows an example of the character string areas formed from the black sections of the first division mode; character string areas 21 and 22, corresponding to the character strings 10 and 11 in FIG. 3, are formed. FIG. 11 shows an example of a character string area formed from the black sections of the second division mode; a character string area 24 corresponding to the character string 9 in FIG. 3 is obtained.
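One possible reading of the merge in step S6 is sketched below. Candidates are given as (strip_index, top, bottom) tuples; comparing both the top and the bottom coordinates against a single tolerance is an assumption, since the text only speaks of a difference in vertical coordinates, and the function name `merge_candidates` and the sample data are hypothetical.

```python
def merge_candidates(candidates, tolerance):
    """Merge candidates (strip_index, top, bottom) into character string areas.

    Returns (first_strip, last_strip, top, bottom) tuples. Candidates with
    no matching neighbour in an adjacent strip are dropped.
    """
    chains = []
    for strip, top, bottom in sorted(candidates):
        for chain in chains:
            last_strip, last_top, last_bottom = chain[-1]
            if (last_strip == strip - 1
                    and abs(last_top - top) <= tolerance
                    and abs(last_bottom - bottom) <= tolerance):
                chain.append((strip, top, bottom))  # extend the string area
                break
        else:
            chains.append([(strip, top, bottom)])   # start a new chain
    areas = []
    for chain in chains:
        if len(chain) < 2:
            continue  # isolated candidate: not formed into a string area
        areas.append((chain[0][0], chain[-1][0],
                      min(c[1] for c in chain), max(c[2] for c in chain)))
    return areas


candidates = [(0, 10, 30), (1, 11, 31), (2, 10, 29), (5, 50, 70), (1, 60, 80)]
areas = merge_candidates(candidates, tolerance=2)
```

In this sample, the candidates in strips 0 to 2 line up vertically and form one area, while the candidates at strip 5 and at rows 60 to 80 have no aligned neighbour and are discarded.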
[0023]
The character cutout unit 5 determines a character cutout target area for each of the character string areas extracted by the character string area extraction unit 4, and performs character cutout using a conventional procedure (step S7). So that no character pattern protrudes from it, the character cutout target area is set, for example, to the range obtained by extending the character string area by a predetermined amount on its top, bottom, left, and right. The character cutout target area 25 in FIG. 12 is an example set for the character string area 24 in FIG. 11. The character recognition unit 6 then performs character recognition using a conventional procedure (step S8).
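Setting the character cutout target area amounts to growing the character string area by a fixed margin, as in the sketch below. The function name `expand_area`, the (left, top, right, bottom) rectangle layout, and the clamping to the image bounds are assumptions; the text only states that the area is widened by a predetermined amount on all four sides.

```python
def expand_area(area, margin, image_width, image_height):
    """Extend a (left, top, right, bottom) rectangle by margin on every
    side, clamped to the image bounds (right and bottom are exclusive)."""
    left, top, right, bottom = area
    return (max(0, left - margin),
            max(0, top - margin),
            min(image_width, right + margin),
            min(image_height, bottom + margin))


# Hypothetical character string area in a 104 x 200 pixel image.
target = expand_area((5, 10, 100, 30), margin=8,
                     image_width=104, image_height=200)
```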
[0024]
In addition, depending on the distribution of pixels in the image, a character string area of the second division mode, which handles large characters, may overlap a character string area of the first division mode, which handles small characters. For example, the kanji 知 ("knowledge") consists of the left-hand component 矢 ("arrow") and the right-hand component 口 ("mouth"), and each component can also be treated as a kanji in its own right. In such a case, 矢 and 口 are detected from the processing result of the first division mode while 知 is detected from the processing result of the second division mode, so the processing results are inconsistent.
[0025]
Therefore, in such a case, the calculation result of the second division mode for processing a large character is given priority. As a result, the processing results of the plurality of division modes are integrated. Note that such integration processing is performed by either the character cutout unit 5 or the character recognition unit 6.
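The priority rule above might be sketched as follows. The representation of a result as a (bounding box, text) pair, the overlap test, and the function names are assumptions made for illustration; the specification states only that the result of the second division mode takes priority.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive; hypothetical
Result = Tuple[Box, str]          # (bounding box, recognized text); hypothetical

def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def integrate(small: List[Result], large: List[Result]) -> List[Result]:
    """Keep every large-mode result; drop small-mode results that overlap one."""
    kept = [(box, text) for box, text in small
            if not any(overlaps(box, lbox) for lbox, _ in large)]
    return large + kept
```

In the 知 example, the two small-mode results 矢 and 口 overlap the large-mode result 知 and are therefore discarded, while unrelated small-mode results elsewhere in the image are retained.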
[0026]
As is clear from the above, according to the character recognition device of the first embodiment, the size of the characters to be extracted is determined in association with the size of the partial areas, and character string areas are extracted by comparing this size with the size of the extracted black sections. Furthermore, the input image is divided into partial areas of different sizes and this processing is performed for the partial areas of each size, so that erroneous recognition is prevented and proper character recognition is performed even for an input image containing characters of different sizes.
[0027]
Further, since the entire input image is divided into partial areas, the character string can be appropriately detected and recognized regardless of the display position and size of the character string.
[0028]
In the above processing, the size of each black section of each division mode is compared with the standard size of a character, only the suitable black sections are selected, and a character string area is then formed from the selected black sections. Besides this method, however, it is also conceivable to first merge adjacent black sections into a character string area candidate and then determine whether or not the candidate is a character string area. In this case, in addition to selecting character string area candidates as described above, that is, on the basis of the standard height of a character when the character string is written horizontally (the standard width when it is written vertically, and both when vertical and horizontal writing are mixed), the following character string area candidate selection method may be adopted.
[0029]
That is, when the character string is written horizontally, the width of the character string area candidate is divided by the standard width of a character for the division mode concerned, and if the quotient is close to an integer value, the character string area candidate is determined to be a character string area. If the character string is written vertically, the height of the candidate is divided by the standard height of a character. If vertical and horizontal writing are mixed, either the standard height or the standard width of a character is selected as the divisor. Alternatively, a standard area obtained by multiplying the standard width by the standard height may be used as the reference.
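The "close to an integer" test can be illustrated with a short sketch. The tolerance value, the requirement of at least one character, and the function name are assumptions introduced here; the specification does not say how close to an integer the quotient must be.

```python
def is_string_area(length: float, standard: float, tolerance: float = 0.15) -> bool:
    """Accept a candidate whose length is roughly a whole number of characters.

    length:    candidate width (horizontal writing) or height (vertical writing)
    standard:  standard character width or height for the division mode
    tolerance: assumed maximum distance from the nearest integer
    """
    ratio = length / standard
    nearest = round(ratio)
    # Assumption: a quotient near zero does not count as a character string.
    return nearest >= 1 and abs(ratio - nearest) <= tolerance
```

For a standard character width of 32 pixels, a candidate 96 pixels wide (quotient 3.0) would be accepted, whereas one 50 pixels wide (quotient about 1.56) would be rejected.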
[0030]
In addition, in the present embodiment, a black section is obtained from the projection values of the input image in order to detect and recognize black characters. However, white characters can also be detected and recognized by first inverting black and white in the input image.
[0031]
Further, the character recognition function performed by the character recognition device according to the first embodiment can naturally be realized as a computer program executed by a computer. In this case, it suffices to provide a computer program that causes the computer to execute, in sequence, the functions corresponding to the area division unit 1, the area division unit 2, the projection section extraction unit 3, the character string area extraction unit 4, the character cutout unit 5, and the character recognition unit 6.
[0032]
Embodiment 2.
In the first embodiment, the entire input image is divided into partial regions by a plurality of division modes. On the other hand, in step S1 of the flowchart in FIG. 2, one input image may be divided by combining a plurality of division forms.
[0033]
For example, as shown in FIG. 13, suppose that a camera 26a that captures the input image is installed at the upper end of a column 26b and reads the numbers printed on license plates 27 and 28 of automobiles or the like. The input image is then as shown in FIG. 14. In FIG. 14, a character string 30 on the license plate 28 is displayed relatively small in the upper part of the input image 29, and a character string 31 on the license plate 27 is displayed relatively large in the lower part of the input image 29. That is, although the characters on the license plates 27 and 28 are in reality substantially the same size, the character string 30 on the license plate far from the camera 26a appears small in the upper part of the input image 29, and the character string 31 on the license plate close to the camera 26a appears large in the lower part of the input image 29.
[0034]
In such a case, if, for example, the upper half of the input image 29 is divided into areas by the first division mode and the lower half is divided into areas by the second division mode, the image can be divided, as in the first embodiment, into character string areas suited to the size of the characters constituting each character string.
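A minimal sketch of such a combined division is given below. It assumes that a division mode is a set of vertical strips of a given width (as used for horizontally written strings), that rectangles are half-open (left, top, right, bottom) tuples, and that the split is exactly at half the image height; the function names and the strip widths are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), half-open; hypothetical

def strips(top: int, bottom: int, image_w: int, strip_w: int) -> List[Rect]:
    """Divide the band of rows [top, bottom) into vertical strips of width strip_w."""
    return [(x, top, min(x + strip_w, image_w), bottom)
            for x in range(0, image_w, strip_w)]

def combined_division(image_w: int, image_h: int,
                      small_w: int, large_w: int) -> List[Rect]:
    """Upper half: narrow strips (first mode). Lower half: wide strips (second mode)."""
    half = image_h // 2
    return strips(0, half, image_w, small_w) + strips(half, image_h, image_w, large_w)
```

Each pixel belongs to exactly one partial area here, whereas dividing the whole image once per division mode visits every pixel once for each mode.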
[0035]
As is clear from the above, according to the character recognition device of the second embodiment, when the size of a character string in the input image can be predicted in advance from the relationship between the imaging position of the camera and the position of the object on which the character string is displayed, detection and recognition can be performed with an optimal combination of division modes for the input image.
[0036]
Also, compared with the method of the first embodiment, in which the process of dividing the entire input image into partial areas in a single division mode is performed multiple times, dividing different parts of the same input image with a combination of division modes means that fewer pixels need to be processed, so that performance is improved and computer resources are saved.
[0037]
Needless to say, the method of dividing the input image may be changed according to the positional relationship between the camera and the object displaying the character string. For example, when a camera installed on the left side captures license plates located to its right, the target area of the division mode with large partial areas may be set on the left side of the input image, and the target area of the division mode with small partial areas may be set on the right side of the input image.
[0038]
Embodiment 3.
Next, a character recognition device according to a third embodiment of the present invention will be described. Whereas the character recognition device according to the first embodiment takes a binary image composed of white and black pixels as its input image, the character recognition device according to the third embodiment differs in that it takes a multi-level (multi-tone) image as its input image.
[0039]
FIG. 15 is a block diagram illustrating a configuration of a character recognition device according to the third embodiment. In the figure, a differential image extraction unit 101 is a part that creates a differential image from a multi-level image. The other components denoted by the same reference numerals as in the character recognition device according to the first embodiment are the same as in the first embodiment, and their description is omitted.
[0040]
Next, the operation of the character recognition device according to the third embodiment will be described. As in the first embodiment, FIG. 2 serves as the flowchart showing the processing of this character recognition device. First, as in the first embodiment, a multi-level image is captured by an image input unit (not shown). For example, assume that this image is a grayscale image of 8 bits per pixel. FIG. 16 is a diagram showing an example of such an input image. In the input image 36, a white character string 37 and a black character string 38 are mixed. The image also shows a column 39 and a column 40.
[0041]
First, as in the first embodiment, the area division units 1 and 2 divide the input image into areas according to the first division mode and the second division mode (step S1). Subsequently, projection values of the input image 36 are calculated (step S2). Unlike in the first embodiment, the input image of this character recognition device is a multi-level image. Therefore, in this step, the differential image extraction unit 101 first creates a differential image, and the differential image is then binarized to calculate the projection values. Various methods can be used to differentiate an image, for example those described in “Introduction to Computer Image Processing” (Soken Publishing), pp. 119 to 122; here, the method based on the Sobel operator is used. For the binarization of the differential values, for example, a method of applying a fixed threshold can be used. As a result, as shown in FIG. 17, an image is obtained in which only the outlines of the displayed objects and characters remain. By calculating the projection values from this differential binary image, black sections are extracted not only from black character strings but also from white character strings.
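The differential binarization and projection described above might look roughly like the following pure-Python sketch. Several details are assumptions rather than the specification's method: the gradient magnitude is approximated by |gx| + |gy|, border pixels are left at 0, the image is a list of rows of integer gray levels, and the projection is taken per pixel row within a column range standing in for one partial area.

```python
from typing import List

Image = List[List[int]]

def sobel_binary(img: Image, threshold: int) -> Image:
    """Return 1 where the Sobel gradient magnitude reaches threshold, else 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            # Assumption: |gx| + |gy| approximates the gradient magnitude.
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = 1
    return out

def row_projection(binary: Image, x0: int, x1: int) -> List[int]:
    """Count edge pixels per row inside the column range [x0, x1)."""
    return [sum(row[x0:x1]) for row in binary]
```

Because only the absolute gradient is used, a dark-to-light edge and a light-to-dark edge produce the same result, which is why white and black character strings both yield black sections.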
[0042]
Note that, in FIG. 17, the boundaries of the partial areas divided by the area division units 1 and 2 are omitted for easy viewing. Alternatively, the processing order of step S1 and step S2 may be reversed, so that the differential binary image is obtained first and the area division is performed afterwards.
[0043]
Next, in the same procedure as in the first embodiment, black sections are extracted for each of the first and second division modes based on the projection values calculated in step S2 (step S3). In a differential binary image, few black pixels are distributed in areas where the density changes little, so most black sections are detected at the contours of characters or objects, where the density changes greatly. FIG. 18 is a diagram illustrating an example of black sections extracted from the first division mode. In the narrow black sections 44 of the first division mode, the area of the large character string 37 is split into pieces. FIG. 19 is a diagram showing an example of black sections extracted from the second division mode. In the wide black sections 45 of the second division mode, the area of the character string 38 joins with the adjacent columns 39 and 40 to form an extremely large black section.
[0044]
Next, character string area candidates that can constitute a character string area are selected from the extracted black sections (steps S4 and S5), and then the character string area candidates are merged to form a character string area (step S6). These processes are the same as those in the first embodiment, and a description thereof will not be repeated.
[0045]
Subsequently, the character cutout unit 5 cuts out character patterns as in the first embodiment (step S7). However, since the input image in the third embodiment is a multi-level image, it is first determined whether the characters included in each character string area are black characters or white characters. For this purpose, a determination target area is set based on the position of the character string area, the determination target area in the input image is binarized and projected in the character string direction, and the determination is made based on the projection values.
[0046]
Note that the “character string direction” means the direction in which the characters constituting a character string are arranged (vertical for vertical writing, horizontal for horizontal writing). To “project in the character string direction” means, for example in the case of a horizontally written character string, to calculate projection data for each horizontal pixel row. For example, when horizontally written characters are displayed in a character string area of 20 (vertical) × 128 (horizontal) pixels, there are 20 horizontal pixel rows, each consisting of 128 pixels with the same vertical coordinate. In such a case, performing projection in the character string direction means calculating projection data for each of the 20 horizontal pixel rows.
[0047]
The determination area is an area that includes the character string area. For example, it is the character string area extended by a predetermined amount in the direction perpendicular to the character string direction, so that the ends of the characters are included in the determination area. FIG. 20 is a diagram conceptually showing such a determination area, and shows a determination area 48 for the white character string 37 and a determination area 49 for the black character string 38.
[0048]
The process of determining whether a character displayed in this area is a black character or a white character (black character / white character determination) will be described below, taking the determination area 48 as an example. FIG. 21 shows the distribution of the projection values calculated for the determination area 48. In the figure, a projection value distribution 50 indicates the projection values in the horizontal direction over the entire area. Reference numeral 51 denotes the center position of the character string area of the character string 37, and 52 and 53 denote the preset lower and upper limits of the projection value, respectively. The lower limit 52 and the upper limit 53 are, for example, values obtained by multiplying the length of the determination target area in the character string direction by predetermined ratios.
[0049]
In this case, starting from the projection value at the center position 51 of the character string area, the projection values are acquired one pixel at a time upward (to the left in the case of a vertically written character string) and downward (to the right in the case of a vertically written character string), and each projection value is checked to see whether it falls below the lower limit or exceeds the upper limit. The first position at which the projection value falls below the lower limit or exceeds the upper limit is regarded as the end of the character string. When the value first falls below the lower limit, the character string is regarded as a black character string, and when the value first exceeds the upper limit, it is regarded as a white character string.
[0050]
In the example of FIG. 21, the projection value exceeds the upper limit 53 before falling below the lower limit 52 in both the upward and downward directions, so this character string is determined to be a white character string. On the other hand, in the example shown in FIG. 22, when the projection values 54 are referred to from the center of the character string area, they fall below the lower limit 55 before exceeding the upper limit 56, so the character string is determined to be a black character string.
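The scan from the center of the character string area can be sketched as follows for a horizontally written string. The projection is assumed to be a list with one value per pixel row of the determination area; the function names, and the choice to return no decision when the two scan directions disagree or no limit is crossed, are assumptions added for this example.

```python
from typing import List, Optional, Tuple

def scan(proj: List[float], center: int, step: int,
         lower: float, upper: float) -> Tuple[Optional[str], Optional[int]]:
    """Walk from center in one direction until a limit is crossed."""
    i = center
    while 0 <= i < len(proj):
        if proj[i] < lower:
            return "black", i   # fell below the lower limit first
        if proj[i] > upper:
            return "white", i   # exceeded the upper limit first
        i += step
    return None, None           # no limit crossed inside the determination area

def black_or_white(proj: List[float], center: int,
                   lower: float, upper: float) -> Optional[str]:
    """Combine the upward and downward scans."""
    up_colour, _ = scan(proj, center, -1, lower, upper)
    down_colour, _ = scan(proj, center, +1, lower, upper)
    # Assumption: accept the result only when both directions agree.
    return up_colour if up_colour == down_colour else None
```

The index returned by `scan` corresponds to the position regarded as the end of the character string in that direction.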
[0051]
After that, the character cutout unit 5 determines a character cutout target area in the same procedure as in the first embodiment, and then binarizes the input image of the area to create a binary image for character cutout. Further, as a result of the black character / white character determination, if the determination result is a white character, an image obtained by inverting the binary image in black and white is used for character extraction. Subsequent processing is the same as in the first embodiment, and a description thereof will not be repeated.
[0052]
As is apparent from the above description, according to the character recognition device of the third embodiment, a multi-level image is converted into a differential binary image, black sections are then extracted, and character string areas are selected by comparison with the character size, so that individual characters can be correctly cut out and recognized even from an image in which white and black character strings are mixed, without greatly increasing the processing amount.
[0053]
In the above description, the differential binarization is performed for each of the division modes after the area division. However, the differential binarization only needs to be performed before step S3, in which the black sections are extracted. For example, the input image may be differentially binarized first, and the area division may then be performed on the binarized image.
[0054]
Embodiment 4.
Next, a character recognition device according to a fourth embodiment will be described. The character recognition device according to the fourth embodiment is characterized by its method of forming the partial areas of the second division mode. It also differs from the character recognition device according to the third embodiment in how it handles multi-tone images, and it addresses input images that contain rotated character strings.
[0055]
FIG. 15 is used as a block diagram showing the configuration of the character recognition device according to the fourth embodiment. However, in the character recognition device according to the fourth embodiment, the area section 2, the projection section extraction section 3, and the character string area extraction section 4 are different from the third embodiment. The area dividing section 2 is configured to form a partial area according to the second division mode by merging the partial areas according to the first division mode divided by the area dividing section 1. The projection section extraction unit 3 combines the black sections extracted from the partial areas of the first division mode to form black sections of the partial areas of the second division mode. The character string area extraction unit 4 forms a character string area by eliminating the influence of the shift between black sections caused by the rotation of the character string. The other components are the same as those in the third embodiment, and a description thereof will not be repeated.
[0056]
FIG. 23 is an example of an input image on which the character recognition device of the fourth embodiment performs character recognition. In the input image 58 shown in the figure, a character string 59 is displayed in white on a black background. Reference numeral 62 denotes a character or an elliptical figure which is not a character string, and character strings 60 and 61 are character strings in which rotation occurs.
[0057]
Next, the operation of the character recognition device according to the fourth embodiment will be described. The processing in the character recognition device according to the fourth embodiment is shown by the flowchart in FIG. 2, as in the first to third embodiments. First, the area division unit 1 divides the input image 58 into partial areas according to the first division mode, and then the area division unit 2 forms partial areas according to the second division mode based on these partial areas (step S1). That is, first, the area division unit 1 divides the input image 58 into small partial areas. The partial areas divided by the area division unit 1 are stored in a memory as the first division mode. Next, the area division unit 2 forms large partial areas by merging adjacent pairs of the small partial areas. In the first and third embodiments, the area division units 1 and 2 divide the input image into partial areas independently of each other; the fourth embodiment differs in that the partial areas of the second division mode are formed from the partial areas of the first division mode.
[0058]
In this description, for the sake of simplicity, adjacent partial areas of the first division mode are merged two at a time to form the partial areas of the second division mode. However, the method of forming the partial areas of the second division mode is not limited to this. For example, three adjacent partial areas may be merged, or three adjacent partial areas may be merged and the result then divided into two equal parts to form the partial areas.
[0059]
Next, projection values are calculated from the partial areas of the first division mode and the partial areas of the second division mode, and black sections are extracted (steps S2 and S3). Since the input image in the fourth embodiment is a multi-tone image, differential binarization is performed as in the third embodiment before the black sections are extracted. However, the fourth embodiment differs from the third embodiment in the following point: differential binarization is performed on the partial areas of the first division mode, black sections are extracted, and these black sections are then merged to form the black sections of the second division mode.
[0060]
Specifically, the following processing is performed. First, as in the third embodiment, calculation of a differential binary image and a projection value and extraction of a black section are performed for the first division mode. FIG. 24 is an example of the differential binary image obtained here. However, the display of the boundary line between the partial areas is omitted. FIG. 25 is an example of a black section obtained based on the first division mode.
[0061]
Further, if black sections are adjacent to each other in the partial areas of the first division mode that were merged to form a partial area of the second division mode, those black sections are merged. This merging process is performed, for example, by either of the following methods.
[0062]
(1) Calculate the minimum rectangle surrounding the adjacent black sections of the first division mode, and take the ratio of the sum of the areas of the black sections enclosed by this rectangle to the area of the rectangle. If this ratio is equal to or greater than a predetermined value, the entire minimum rectangle is set as a black section of the second division mode.
(2) When the length of the boundary line shared by adjacent black sections of the first division mode is equal to or greater than a predetermined value, the entire minimum rectangle surrounding these black sections is set as a black section of the second division mode.
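The two merging criteria can be sketched for a pair of black sections as follows. The sketch assumes non-overlapping, axis-aligned rectangles given as half-open (left, top, right, bottom) tuples, with the first section in the strip immediately to the left of the second; the function names and threshold parameters are hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), half-open; hypothetical

def area(r: Rect) -> int:
    return (r[2] - r[0]) * (r[3] - r[1])

def bounding(a: Rect, b: Rect) -> Rect:
    """Minimum rectangle surrounding both sections."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_by_fill_ratio(a: Rect, b: Rect, min_ratio: float) -> Optional[Rect]:
    """Criterion (1): the sections must fill enough of their bounding rectangle."""
    box = bounding(a, b)
    return box if (area(a) + area(b)) / area(box) >= min_ratio else None

def merge_by_shared_border(a: Rect, b: Rect, min_len: int) -> Optional[Rect]:
    """Criterion (2): the shared vertical boundary must be long enough."""
    if a[2] != b[0]:
        return None  # the sections do not touch along a vertical boundary
    shared = min(a[3], b[3]) - max(a[1], b[1])
    return bounding(a, b) if shared >= min_len else None
```

Two sections offset by only a few rows pass both tests, while two sections that merely touch at a corner region fail both, which corresponds to the situation described for the black sections 118 and 119.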
[0063]
FIG. 26 is an explanatory diagram showing such a black section merging process. The figure shows how the black sections 111 and 112 according to the first division mode are merged into the black section 113 according to the second division mode. Further, the black section 116 according to the first division mode is adjacent to both the black sections 114 and 115 belonging to the same partial area. In such a case, the smallest rectangle surrounding all of the black sections 114, 115, and 116 becomes one black section 117.
[0064]
On the other hand, the black sections 118 and 119 of the first division mode are also adjacent to each other, but in this case no black section of the second division mode is formed by either method (1) or method (2). If displacement between black sections is tolerated in order to absorb rotation of a character string, displacement arising from causes other than rotation of a character string may also be admitted. However, if adjacent black sections are merged based on the criteria (1) and (2), such cases can be excluded.
[0065]
Thus, by merging the black sections of the first division mode to form the black sections of the second division mode, there is no need to calculate the differential binary image and the projection values and extract the black sections for the second division mode, so the processing can be performed at high speed.
[0066]
Next, the character string area extracting unit 4 forms a character string area by merging the black sections (step S4). In the fourth embodiment, instead of selecting a black section based on the standard size of a character, a black section is selected based on the pixel density of an input image in each black section area. For example, both black characters and white characters are allowed for the black section in the first division mode, and only white characters are allowed for the partial area in the second division mode.
[0067]
The black sections are selected as follows. First, the average of the maximum and minimum pixel values of the input image within the black section is calculated as the binarization threshold. Next, the number of pixels whose value is smaller than the binarization threshold is compared with the number of pixels whose value is equal to or greater than the binarization threshold. If the latter is larger, the characters are determined to be white characters. If the result of this black character / white character determination matches the character color (black / white) defined for the division mode of the black section, the black section is selected; if it does not match, the black section is rejected. The character string area is then formed (step S5) by merging only the selected black sections.
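The selection rule can be sketched as below, following the wording of the paragraph literally. Several points are assumptions: the passage does not state whether larger pixel values mean darker or brighter pixels, so the sketch simply maps "more pixels at or above the threshold" to "white" as written; a tie is treated as "black"; and the table of allowed colors per division mode merely restates the example given in the specification. All names are hypothetical.

```python
from typing import Dict, List, Set

def judge_colour(pixels: List[int]) -> str:
    """Black/white determination from the pixel values inside one black section."""
    threshold = (max(pixels) + min(pixels)) / 2
    below = sum(1 for p in pixels if p < threshold)
    at_or_above = len(pixels) - below
    # As worded in the specification: if the latter count is larger, white.
    # Assumption: otherwise (including a tie) the characters are black.
    return "white" if at_or_above > below else "black"

# Example policy from the specification: mode 1 allows both, mode 2 only white.
ALLOWED: Dict[int, Set[str]] = {1: {"black", "white"}, 2: {"white"}}

def select_section(pixels: List[int], division_mode: int) -> bool:
    """Keep the black section only if its judged color is allowed for its mode."""
    return judge_colour(pixels) in ALLOWED[division_mode]
```

Because the decision depends only on the distribution of pixel values and not on a projection, it is unaffected by the orientation of the character string.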
[0068]
In a typical character string image area, there are more background pixels than character pixels, so the above method can distinguish black characters from white characters provided the binarization threshold is set appropriately. The method described in the third embodiment uses projection in the character string direction, and therefore may fail to make a correct determination when the rotation angle of the character string is very large. The present method, by contrast, uses the grayscale distribution, and can therefore make the determination without any restriction on the rotation angle.
[0069]
The processing after step S5 is the same as in the third embodiment, and a description thereof will not be repeated.
[0070]
As is clear from the above, according to the character recognition device of the fourth embodiment, since the second division mode is obtained based on the first division mode, the amount of calculation can be significantly reduced. In addition, since the black sections are merged based on the above criteria (1) and (2), character recognition that is robust to rotation of a character string can be performed.
[0071]
[Effect of the Invention]
A character recognition device according to the present invention divides an input image into regions in a first division mode, also divides the input image into regions in a second division mode whose regions differ in size from those of the first division mode, and forms character string areas suited to each division mode from the black sections extracted from the regions of both division modes. It therefore has the excellent effect that character strings can be detected and recognized even when a plurality of character strings of different sizes are present.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a character recognition device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart of the character recognition device according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of an input image according to the first embodiment of the present invention.
FIG. 4 is a diagram showing a first division mode according to the first embodiment of the present invention.
FIG. 5 is a diagram showing a second division mode according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of a black section extracted from the first division mode according to the first embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of a black section extracted from the second division mode according to the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of a black section constituting a character string area extracted from the first division mode according to the first embodiment of the present invention.
FIG. 9 is a diagram showing an example of a black section constituting a character string area extracted from the second division mode according to the first embodiment of the present invention.
FIG. 10 is a diagram showing an example of a character string region candidate extracted from the first division mode according to the first embodiment of the present invention.
FIG. 11 is a diagram showing an example of a character string region candidate extracted from the second division mode according to the first embodiment of the present invention.
FIG. 12 is a diagram showing an example of a character string cutout area according to the first embodiment of the present invention.
FIG. 13 is an explanatory diagram illustrating a positional relationship between a camera that captures an input image and a license plate according to the second embodiment of the present invention.
FIG. 14 is a diagram illustrating an example of an input image according to the second embodiment of the present invention.
FIG. 15 is a block diagram showing a configuration of a character recognition device according to Embodiment 3 of the present invention.
FIG. 16 is a diagram showing an example of an input image according to the third embodiment of the present invention.
FIG. 17 is a diagram showing an example of a differential binary image according to the third embodiment of the present invention.
FIG. 18 is a diagram illustrating an example of a black section extracted from the first division mode according to the third embodiment of the present invention.
FIG. 19 is a diagram showing an example of a black section extracted from the second division mode according to the third embodiment of the present invention.
FIG. 20 is an explanatory diagram of a determination area according to the third embodiment of the present invention.
FIG. 21 is a diagram showing a distribution of projection values of a determination area for white characters according to Embodiment 3 of the present invention.
FIG. 22 is a diagram showing a distribution of projection values of a determination area for black characters according to Embodiment 3 of the present invention.
FIG. 23 is a diagram illustrating an example of an input image according to Embodiment 4 of the present invention.
FIG. 24 is a diagram showing an example of a differential binary image according to the fourth embodiment of the present invention.
FIG. 25 is a diagram illustrating an example of a black section extracted from the first division mode according to the fourth embodiment of the present invention.
FIG. 26 is an explanatory diagram showing a black section merging process according to the fourth embodiment of the present invention.
[Explanation of symbols]
1, 2 Area division unit
3 Projection section extraction unit
4 Character string area extraction unit
5 Character cutout unit
6 Character recognition unit
101 Differential image extraction unit

Claims

A character recognition device comprising:
black section extraction means for dividing an input image into regions in a first division mode suited to characters of a predetermined size, further dividing the input image into regions in a second division mode suited to characters of a size different from the predetermined size, and extracting, as a black section, each region containing at least a predetermined number of black pixels from among the regions of the first division mode and the regions of the second division mode;
character area extracting means for merging the black sections to form character string areas conforming to the respective division modes, and for cutting out a character area from each character string area; and
character recognition means for recognizing a character pattern in the character area.
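
As an illustration of the black section extraction recited in this claim, the following is a minimal Python sketch, not the claimed implementation. It assumes a binary image stored as nested lists (1 = black), square grid regions, and hypothetical region sizes and thresholds; the name `extract_black_sections` is introduced here only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]              # 1 = black pixel, 0 = white pixel
Section = Tuple[int, int, int, int]  # (top, left, height, width)


def extract_black_sections(image: Image, cell: int, min_black: int) -> List[Section]:
    """Divide `image` into cell x cell regions and keep those with enough black pixels."""
    rows, cols = len(image), len(image[0])
    sections = []
    for top in range(0, rows, cell):
        for left in range(0, cols, cell):
            bottom = min(top + cell, rows)
            right = min(left + cell, cols)
            black = sum(image[y][x] for y in range(top, bottom) for x in range(left, right))
            if black >= min_black:
                sections.append((top, left, bottom - top, right - left))
    return sections


# Hypothetical 4x8 image: a small mark on the left, a large mark on the right.
image = [
    [1, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
small = extract_black_sections(image, cell=2, min_black=1)  # first division mode
large = extract_black_sections(image, cell=4, min_black=8)  # second division mode
```

In this example the small mark survives only in the fine grid, while the large mark is captured as a single section in the coarse grid, which is the reason for running both division modes over the same image.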

The character recognition device according to claim 1, wherein the black section extraction means divides a predetermined portion of the input image into regions in the first division mode, and divides the portion of the input image other than the predetermined portion into regions in the second division mode.

The character recognition device according to claim 1, wherein the black section extraction means merges a plurality of adjacent regions of the first division mode to form each region of the second division mode.

The character recognition device according to claim 1, wherein the black section extraction means extracts the black sections from the regions of the first division mode, and then merges a plurality of mutually adjacent regions of the first division mode, together with their black sections, to form each region of the second division mode and its black sections.
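
One way to picture forming the second division mode from the first is sketched below in Python. The sketch assumes that the black-pixel count of each first-mode region is retained and that a merged region counts as a black section when the summed count reaches a threshold; the claim does not specify this rule, and `merge_counts`, `black_mask`, and all numeric values are hypothetical.

```python
from typing import List


def merge_counts(counts: List[List[int]], factor: int) -> List[List[int]]:
    """Sum black-pixel counts over factor x factor blocks of first-mode regions."""
    rows, cols = len(counts), len(counts[0])
    merged = []
    for top in range(0, rows, factor):
        row = []
        for left in range(0, cols, factor):
            row.append(sum(
                counts[y][x]
                for y in range(top, min(top + factor, rows))
                for x in range(left, min(left + factor, cols))
            ))
        merged.append(row)
    return merged


def black_mask(counts: List[List[int]], min_black: int) -> List[List[bool]]:
    """Mark each region whose black-pixel count reaches the threshold."""
    return [[c >= min_black for c in row] for row in counts]


# Hypothetical black-pixel counts for a 2x4 grid of first-mode regions.
first_counts = [
    [1, 0, 4, 4],
    [0, 0, 4, 4],
]
first_black = black_mask(first_counts, min_black=1)
second_counts = merge_counts(first_counts, factor=2)  # 2x2 first-mode regions per second-mode region
second_black = black_mask(second_counts, min_black=8)
```

Reusing the first-mode counts means the image pixels are scanned only once, which is a plausible motivation for merging rather than dividing the image a second time.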

The character recognition device according to any one of claims 1 to 4, wherein the black section extraction means divides, instead of the input image, an image obtained by differentiating and binarizing the input image into the regions of the first division mode and the regions of the second division mode.

The character recognition device according to any one of claims 1 to 4, wherein the black section extraction means applies differential binarization to the regions of the first division mode and the regions of the second division mode, and extracts the black sections from the differentially binarized regions.
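
The claims do not fix a particular differential operator. As a rough Python sketch under that caveat, the example below assumes a grayscale image as nested lists, takes the larger of the absolute differences to the right-hand and lower neighbours as the derivative, and sets a pixel to black when that value reaches a hypothetical threshold.

```python
from typing import List


def differential_binarize(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Return 1 where the local gray-level difference reaches `threshold`, else 0."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Forward differences; pixels on the last column or row have no neighbour.
            dx = abs(gray[y][x + 1] - gray[y][x]) if x + 1 < cols else 0
            dy = abs(gray[y + 1][x] - gray[y][x]) if y + 1 < rows else 0
            out[y][x] = 1 if max(dx, dy) >= threshold else 0
    return out


# Hypothetical image with a vertical edge between the second and third columns.
gray = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = differential_binarize(gray, threshold=100)
```

Because only transitions are marked, the result is the same whether the characters are darker or lighter than their background.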

The character recognition device according to claim 1, wherein the character area extracting means merges the black sections to form the character string area when the size of each black section falls within a range predetermined as the character size corresponding to the division mode of that black section.

The character recognition device according to any one of claims 1 to 6, wherein the character area extracting means merges a plurality of adjacent black sections to form the character string area when the length of the boundary line between those black sections is equal to or greater than a predetermined value.

The character recognition device according to any one of claims 1 to 6, wherein the character area extracting means merges a plurality of mutually adjacent black sections to form the character string area when the ratio of the total area of those black sections to the area of the rectangle enclosing them is equal to or greater than a predetermined value.
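
The boundary-length and area-ratio conditions of the two preceding claims can be sketched in Python as follows. The sketch assumes axis-aligned, non-overlapping rectangular black sections given as (top, left, height, width); the function names and the thresholds `MIN_BOUNDARY` and `MIN_RATIO` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (top, left, height, width)

MIN_BOUNDARY = 2  # hypothetical minimum shared boundary length, in pixels
MIN_RATIO = 0.8   # hypothetical minimum fill ratio


def shared_boundary(a: Rect, b: Rect) -> int:
    """Length of the edge shared by two touching, non-overlapping rectangles."""
    at, al, ah, aw = a
    bt, bl, bh, bw = b
    if al + aw == bl or bl + bw == al:  # side by side
        return max(0, min(at + ah, bt + bh) - max(at, bt))
    if at + ah == bt or bt + bh == at:  # stacked vertically
        return max(0, min(al + aw, bl + bw) - max(al, bl))
    return 0


def fill_ratio(a: Rect, b: Rect) -> float:
    """Total area of both rectangles divided by the area of their bounding rectangle."""
    top, left = min(a[0], b[0]), min(a[1], b[1])
    bottom = max(a[0] + a[2], b[0] + b[2])
    right = max(a[1] + a[3], b[1] + b[3])
    total = a[2] * a[3] + b[2] * b[3]
    return total / ((bottom - top) * (right - left))


a = (0, 0, 2, 2)
b = (0, 2, 2, 2)  # flush against the right edge of a
c = (1, 2, 2, 2)  # same column as b, shifted down by one pixel

merge_ab_by_boundary = shared_boundary(a, b) >= MIN_BOUNDARY
merge_ac_by_boundary = shared_boundary(a, c) >= MIN_BOUNDARY
merge_ab_by_ratio = fill_ratio(a, b) >= MIN_RATIO
merge_ac_by_ratio = fill_ratio(a, c) >= MIN_RATIO
```

Both tests accept the aligned pair (a, b) and reject the offset pair (a, c), which suggests how either condition keeps sections from neighbouring lines from being fused into one character string area.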

The character recognition device according to any one of claims 1 to 6, wherein the character area extracting means selects a character string area candidate formed by merging the black sections as the character string area when the candidate satisfies a predetermined condition.

The character recognition device according to claim 10, wherein the character area extracting means selects the character string area candidate formed by merging the black sections as the character string area when the size of the candidate is close to a natural-number multiple of the character size corresponding to the division mode of the black sections.
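
The claim leaves "close to" undefined. A minimal Python sketch of one possible reading is given below; it assumes an absolute tolerance in pixels, and the function name `near_multiple` and all numeric values are hypothetical.

```python
def near_multiple(length: float, char_size: float, tolerance: float) -> bool:
    """True if `length` is within `tolerance` of n * char_size for the nearest natural number n."""
    n = max(1, round(length / char_size))  # nearest multiple, but never fewer than one character
    return abs(length - n * char_size) <= tolerance


# Hypothetical candidates for a division mode tuned to 16-pixel characters.
three_chars = near_multiple(47, char_size=16, tolerance=2)  # about 3 characters long
two_and_half = near_multiple(40, char_size=16, tolerance=2)  # about 2.5 characters long
fragment = near_multiple(5, char_size=16, tolerance=2)       # shorter than one character
```

A candidate whose length corresponds to a whole number of characters is kept, while one that would contain a fractional character is treated as a poor match for that division mode.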

The character recognition device according to claim 10, wherein the character area extracting means selects the character string area candidate formed by merging the black sections as the character string area when the pixel density of the candidate is substantially equal to the pixel density corresponding to the division mode of the black sections.

The character recognition device according to claim 1, wherein, when the pixel density of the character string area is equal to or less than a predetermined value, the character area extracting means inverts the pixels of the character string area to generate an inverted character string area, and cuts out the character area from the inverted character string area.

The character recognition device according to claim 12, wherein the character area extracting means calculates projection data in the character string direction for a binary image of the character string area, acquires the projection data of each pixel row running in the character string direction sequentially from the center position of the character string area toward an end of the character string area, compares the acquired data with a predetermined upper limit value and a predetermined lower limit value, and generates the inverted character string area when the projection data becomes equal to or greater than the upper limit value before becoming equal to or less than the lower limit value.
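
The centre-outward projection test in this claim can be sketched in Python as follows. The sketch assumes a horizontal character string, so that each pixel row runs in the character string direction, scans only toward the bottom edge, and assumes the lower limit is smaller than the upper limit; `should_invert` and the limit values are hypothetical.

```python
from typing import List


def should_invert(binary: List[List[int]], upper: int, lower: int) -> bool:
    """Scan row projections from the centre row toward the last row.

    Returns True if a projection reaches `upper` before one falls to `lower`.
    """
    projections = [sum(row) for row in binary]  # black pixels per row
    for value in projections[len(binary) // 2:]:
        if value >= upper:
            return True   # hit a solid black band first: white characters on black
        if value <= lower:
            return False  # hit an empty margin first: black characters on white
    return False


# Hypothetical 5x6 string area containing black strokes on a white background.
black_text = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# The same area with every pixel flipped: white strokes on a black background.
white_text = [[1 - p for p in row] for row in black_text]

invert_black_text = should_invert(black_text, upper=5, lower=0)
invert_white_text = should_invert(white_text, upper=5, lower=0)
```

Starting from the centre means the scan begins inside the characters, so whichever limit is reached first indicates whether the margin around the string is white or black.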

The character recognition device according to claim 12, wherein the character area extracting means calculates the average of the maximum density and the minimum density of the pixels in the character string area, and determines whether to generate the inverted character string area based on the magnitude relationship between the number of pixels in the character string area whose density is lower than the average and the number of pixels whose density is equal to or higher than the average.
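
The claim does not say which outcome of the comparison triggers inversion. The Python sketch below assumes that a higher density value means a darker pixel and that background pixels outnumber character pixels, so inversion is chosen when the darker group is the majority; both assumptions and the name `invert_by_density` are illustrative only.

```python
from typing import List


def invert_by_density(density: List[List[int]]) -> bool:
    """Decide on inversion from pixel counts on either side of the mid-density value."""
    pixels = [p for row in density for p in row]
    mid = (max(pixels) + min(pixels)) / 2        # average of maximum and minimum density
    darker = sum(1 for p in pixels if p >= mid)  # pixels at or above the average
    lighter = len(pixels) - darker               # pixels below the average
    # Assumption: a dark majority means a dark background, i.e. light characters.
    return darker > lighter


# Hypothetical 3x3 areas: one light stroke on a dark background, and the reverse.
light_on_dark = [
    [200, 200, 200],
    [200, 30, 200],
    [200, 200, 200],
]
dark_on_light = [
    [30, 30, 30],
    [30, 200, 30],
    [30, 30, 30],
]
invert_first = invert_by_density(light_on_dark)
invert_second = invert_by_density(dark_on_light)
```

Using the midpoint of the extreme densities avoids choosing a fixed global threshold, so the same test can be applied to string areas with different contrast.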

A character recognition program causing a computer to execute:
a black section extraction procedure for dividing an input image into regions in a first division mode suited to characters of a predetermined size, further dividing the input image into regions in a second division mode suited to characters of a size different from the predetermined size, and extracting, as a black section, each region containing at least a predetermined number of black pixels from among the regions of the first division mode and the regions of the second division mode;
a character area extraction procedure for merging the black sections to form character string areas conforming to the respective division modes, and for cutting out a character area from each character string area; and
a character recognition procedure for recognizing a character pattern in the character area.