JP3755674B2

JP3755674B2 - Image processing apparatus and method

Info

Publication number: JP3755674B2
Application number: JP25038495A
Authority: JP
Inventors: 清信小島
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1995-09-28
Filing date: 1995-09-28
Publication date: 2006-03-15
Anticipated expiration: 2015-09-28
Also published as: JPH0991462A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理装置および方法に関し、特にビットマップデータで表される文字を認識し、テキストデータに変換する場合に用いて好適な画像処理装置および方法に関する。
【０００２】
【従来の技術】
図１６は、従来のディスプレイにおける表示例を示している。この例においては、ディスプレイ１にウインドウ２−１が設けられ、そこにイメージデータ（ビットマップデータ）で表される横書きの文字が表示されている。この文字は、例えば紙などにプリンタにより印刷されたものを、図示せぬＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅａｄｅｒ）装置などで読み取り、これを表示したものである。従って、そこに表示されている文字を他の文字に変更するなどの編集を行うことができない。
【０００３】
このような編集を行うことができるようにするには、イメージデータを文字認識し、キャラクタデータに変換する必要がある。
【０００４】
従来、このような変換処理を行うのに、次のようにしていた。すなわち、最初に、変換すべき範囲の左上の点Ｐ₁₁と右下の点Ｐ₁₂をマウスで指定する。このとき、この点Ｐ₁₁とＰ₁₂で指定される矩形領域の範囲のイメージデータが、変換すべき範囲とされる。
【０００５】
このような指定を行うと、次に、ウインドウ２−３に示されているように、このイメージデータを識別すべき方向（文字が連続する方向）が表示される。この表示例では、ウインドウ２−３に、「縦向き」と「横向き」の文字が表示され、使用者は、文字認識を行うべき方向（文字が連続する方向）を、このウインドウ２−３のいずれかの領域をカーソルで指定するなどして選択する。この例の場合、「いろはにほへと…」と文字が横書きされているため、「横向き」の領域を指定する。この指定を行うと、指定された領域内のイメージデータが文字認識される。
【０００６】
次に、使用者は、指定した範囲のイメージデータを認識した結果得られた文字を表示する位置を、ウインドウ２−２上の点Ｐ₁₃として指定する。この指定を行うと、認識された結果得られたテキストデータに対応する文字が、ウインドウ２−２の点Ｐ₁₃を左上の点とする領域に表示される。
【０００７】
以上の例は、文字が横書きされていた場合の例であるが、文字が縦書きされている（縦方向に連続している）場合も、同様の処理が行われる。すなわち、図１７に示すように、ウインドウ２−１に縦書きにイメージデータによる文字が表示されているとき、図１６における場合と同様に、左上の点Ｐ₁₁と右下の点Ｐ₁₂を指定することで、イメージデータを文字認識する領域を指定する。
【０００８】
上述した場合と同様に、このような指定を行うと、次にウインドウ２−３に文字の連続する方向が表示されるので、この方向を指定する。いまの場合、文字は縦書きされているため、「縦向き」が選択される。
【０００９】
そして、さらに、ウインドウ２−２上の点Ｐ₁₃をコピー先の点として指定すると、そこにイメージデータを文字認識した結果得られたキャラクタデータに対応する文字が表示される。
【００１０】
このように、ウインドウ２−２に表示された文字は、キャラクタデータに対応するものであるため、所定の文字を他の文字に変更したりする編集処理が可能となる。
【００１１】
【発明が解決しようとする課題】
しかしながら、従来の装置においては、上記したように、文字が連続する方向（文字認識する方向）を表示して、表示した方向の中から所定の方向を選択するようにしているため、文字認識処理を実行させるのに必要な操作の回数が多く、操作性が悪い課題があった。
【００１２】
本発明はこのような状況に鑑みてなされたものであり、操作回数を減らし、操作性を向上させるようにしたものである。
【００１３】
【課題を解決するための手段】
請求項１に記載の画像処理装置は、ドラッグの開始点である第１の点と、ドラッグの終了点である第２の点とを頂点とする矩形の範囲であって、第１の点および第２の点を結ぶ線が対角線をなす矩形の範囲を指定する指定手段と、第１の点の座標と第２の点の座標の関係を判定する判定手段と、判定手段の判定結果に対応して、矩形の範囲の画像から文字を認識する処理を行うとともに、文字の連続する方向を決定する処理手段とを備えることを特徴とする。
【００１４】
請求項３に記載の画像処理方法は、ドラッグの開始点である第１の点と、ドラッグの終了点である第２の点とを頂点とする矩形の範囲であって、第１の点および第２の点を結ぶ線が対角線をなす矩形の範囲を指定し、第１の点の座標と第２の点の座標の関係を判定し、判定結果に対応して、矩形の範囲の画像から文字を認識する処理を行うとともに、文字の連続する方向を決定することを特徴とする。
【００１５】
請求項１に記載の画像処理装置においては、指定手段が、ドラッグの開始点である第１の点と、ドラッグの終了点である第２の点とを頂点とする矩形の範囲であって、第１の点および第２の点を結ぶ線が対角線をなす矩形の範囲を指定し、判定手段が、第１の点の座標と第２の点の座標の関係を判定し、処理手段が、判定手段の判定結果に対応して、矩形の範囲の画像から文字を認識する処理を行うとともに、文字の連続する方向を決定する。
【００１６】
請求項３に記載の画像処理方法においては、ドラッグの開始点である第１の点と、ドラッグの終了点である第２の点とを頂点とする矩形の範囲であって、第１の点および第２の点を結ぶ線が対角線をなす矩形の範囲を指定し、第１の点の座標と第２の点の座標の関係を判定し、判定結果に対応して、矩形の範囲の画像から文字を認識する処理を行うとともに、文字の連続する方向を決定する。
【００１７】
【発明の実施の形態】
図１は、本発明の画像処理装置が接続されるネットワークの構成例を表している。同図に示すように、コンピュータのための国際的なネットワークとしてのインターネット（サービスマーク）には多くのサーバとプロバイダが接続されており、サーバはユーザに各種の情報、サービスを提供し、プロバイダは、ユーザをインターネットにアクセスさせるサービスを提供する。
【００１８】
図２は、本発明の画像処理装置の構成例を示すブロック図である。この実施例においては、ネットワークインタフェース（Ｉ／Ｆ）２３が、インターネット、その他のネットワークから供給されるデータを受信し、文書データ格納部１８に供給し、記憶させるようになされている。この文書データ格納部１８は、ハードディスク、光ディスク、光磁気ディスクなどの他、固体メモリなどにより構成することができる。また、文書データ格納部１８に格納されるデータ構造は、イメージデータ、ＭＭＲ（ｍｏｄｉｆｉｅｄｍｏｄｉｆｉｅｄＲＥＡ）やＭＨ（ｍｏｄｉｆｉｅｄＨｕｆｆｍａｎ）などにより圧縮されたイメージデータ、テキストデータ、ＤＴＰなどで用いられるＰｏｓｔｓｃｒｉｐｔなどのページ記述言語などとすることができる。
【００１９】
イメージ展開処理部１９は、ＣＰＵ１１からの指令に対応して、文書データ格納部１８に記憶されているデータを、データ構造に対応してビットマップなどのイメージデータに展開し、メインメモリ１２に出力するようになされている。データ構造が、例えばファクシミリなどで用いられているＭＭＲやＭＨなどにより圧縮されているイメージデータである場合においては、イメージ展開処理部１９は伸長処理を行う。また、Ｐｏｓｔｓｃｒｉｐｔなどのページ記述言語であれば、フォントを展開しページ割り付けを行うラスタイメージ展開処理を行う。
【００２０】
メインメモリ１２に記憶されたデータは、イメージデータ転送部２０またはイメージデータ圧縮転送部２１を介して、表示バッファ１３に供給され、記憶されるようになされている。基本的には、イメージデータ転送部２０は、メインメモリ１２に記憶されたデータをそのまま表示バッファ１３に転送し、イメージデータ圧縮転送部２１は、メインメモリ１２に記憶されている画像を圧縮して、表示バッファ１３に供給し、記憶させる。
【００２１】
イメージデータ圧縮転送部２１は、数行おきにデータを間引きながら転送する処理や、行間で論理ＯＲなどの演算をしながら行数を減らす処理によって圧縮処理を行う。あるいはまた、イメージデータのドットの数を計数し、その数に対応して、圧縮処理を行うようにする。
【００２２】
また、イメージデータ圧縮転送部２１とイメージデータ転送部２０は、メインメモリ１２から読み出したデータを表示バッファ１３に転送するとき、２値のイメージデータを多値化することにより、比較的解像度の低いディスプレイにおいても、細かい文字をつぶさないで、表示できるようにしている。ただし、多値化解像度変換処理には時間がかかるため、例えば特開平４−３３７８００号公報に開示されているように、先に粗い画像をまず表示し、そのデータを多値化されたデータに、後で順次置き換えて行くようにする。これにより、反応の速さときれいな表示の要求を両方満足することができる。
【００２３】
また、領域コピー処理部２２は、表示バッファ１３に記憶されている画像データの一部を、表示バッファ１３の他の領域にコピー（移動）する処理を実行する。
【００２４】
ビデオ信号発生部１４は、表示バッファ１３に記憶されている画像データを読み出し、ビデオ信号に変換し、ディスプレイ１５に出力し、表示させるようになされている。
【００２５】
ＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅｃｏｇｎｉｔｉｏｎ）エンジン２４は、ＣＰＵ１１の制御の下、イメージデータ（ビットマップデータ）の文字を認識し、ＪＩＳコードなどのテキストデータに変換する処理を実行する。
【００２６】
キーボード１７は、少なくともカーソルキー１７Ａを有し、ＣＰＵ１１に対して各種の指令を入力するとき、使用者により操作されるようになされている。また、マウスなどのポインティングデバイス１６は、ディスプレイ１５に表示されたカーソルを用いて所定の位置を指定するような場合に、使用者によって操作される。
【００２７】
次に、図２の実施例の動作について説明する。キーボード１７を操作し、ＣＰＵ１１に、例えばインターネットに対するアクセスの開始を指令すると、ＣＰＵ１１はディスプレイ１５に、例えば図３に示すようなメニュー画面を表示させる。このメニュー画面には、インターネットに接続されている各種のサーバにアクセスするためのアイコンが表示されている。
【００２８】
使用者が、例えば「Ｆａｘｉｎ」のアイコン３１をカーソルで指定、選択すると、ＣＰＵ１１は、ネットワークインタフェース２３を制御し、インターネットに接続されている、そのアイコンに対応するサーバにアクセスさせる。このサーバは、新聞、雑誌などの切抜きをＯＣＲ（ｏｐｔｉｃａｌｃｈａｒａｃｔｅｒｒｅａｄｅｒ）により読み取り、イメージデータ（ビットマップデータ）として記憶しており、そのデータを提供するサービス（Ｆａｘｉｎサービス）を行っている。
【００２９】
ネットワークインタフェース２３は、インターネットを介してアクセスしたそのサーバから供給されたデータを文書データ格納部１８に供給し、記憶させる。また、このデータの一部は、そのままイメージ展開処理部１９に供給され、伸長処理などが施され、ビットマップデータに変換され、メインメモリ１２に供給され、記憶される。
【００３０】
メインメモリ１２に記憶されたデータは、イメージデータ転送部２０を介して表示バッファ１３に供給され、そこに書き込まれる。表示バッファ１３に書き込まれたデータは、ビデオ信号発生部１４に供給されビデオ信号に変換され、ディスプレイ１５に供給され、表示される。このようにして、ディスプレイ１５に、例えばアクセスしたサーバの図４に示すようなホームページが最初に表示される。
【００３１】
そして、使用者は、このホームページを見ながらポインティングデバイス１６やキーボード１７を操作して、例えば使用者が新聞の切抜きのファイル８１−２の選択を指令すると、そのファイルのデータがまだ文書データ格納部１８に格納されていないとき、ＣＰＵ１１は、ネットワークインタフェース２３を介して、サーバにデータの転送を要求する。サーバがこの要求に対応してデータを転送すると、このデータは、ネットワークインタフェース２３を介して文書データ格納部１８に供給され、記憶される。
【００３２】
次に、ＣＰＵ１１は、文書データ格納部１８に記憶されたファイルのデータ（文書データ）を読み出させ、イメージ展開処理部１９によりビットマップデータに変換させた後、メインメモリ１２に供給させ、記憶させる。そして、このデータが、イメージデータ転送部２０またはイメージデータ圧縮転送部２１を介して表示バッファ１３に供給され、記憶される。表示バッファ１３に書き込まれた１枚（１ページ）の画像データは、ビデオ信号発生部１４に供給され、ビデオ信号に変換され、ディスプレイ１５に出力され、表示される。
【００３３】
次に、１枚の画像を表示する原理について、図５を参照して説明する。今、ディスプレイ１５にウインドウ４１が表示されており、このウインドウ４１に文書データ格納部１８より読み出された１枚（１ページ）のＡ４の大きさの新聞記事の切り抜きの画像を表示させるものとする。メインメモリ１２に記憶された１枚の画像のイメージデータ４２が、図５に示すように、幅Ｗと高さＨを有するものとする。
【００３４】
これに対して、ウインドウ４１は、その幅がｗ、高さがｈであり、イメージデータ４２の幅Ｗと高さＨが、ウインドウ４１の幅ｗと高さｈより大きいものとする。この場合、イメージデータ４２をウインドウ４１に、その全部をそのまま表示することはできない。そこで、この実施例においては、例えばイメージデータ４２の幅Ｗを、ウインドウ４１の幅ｗに合わせる（調整する）処理が行われる。すなわち、イメージデータ４２は、その幅が全体的に、ｗ／Ｗの圧縮率で圧縮される。
【００３５】
さらにまた、このようにして、幅方向に全体的にｗ／Ｗに圧縮されたイメージデータ５２が、次のようにして高さ方向に圧縮される。すなわち、ウインドウ４１の高さｈは、イメージデータ５２の高さＨより小さいため、ウインドウ４１の高さｈの、例えば７０％の高さａ₂の領域Ａ₂と、その上部の高さａ₁の領域Ａ₁、およびその下部の高さａ₃の領域Ａ₃とに、ウインドウ４１が区分される。この区分に対応して、イメージデータ５０にも、高さｒ₂（＝ａ₂）の領域Ｒ₂と、その上部の高さｒ₁の領域Ｒ₁、およびその下部の高さｒ₃の領域Ｒ₃とに区分される。
【００３６】
そして、イメージデータ５２の領域Ｒ₂のデータは、ウインドウ４１の領域Ａ₂に、そのまま（圧縮せずに）転送、表示される。これに対して、領域Ｒ₁のデータは、領域Ａ₁に、縦方向に圧縮されて転送、表示され、また領域Ｒ₃のデータは、領域Ａ₃に、縦方向に圧縮されて転送、表示される。領域Ａ₂の高さａ₂は、ウインドウ４１の高さｈの７０％の値とされ、イメージデータ５２の領域Ｒ₂の高さｒ₂は、ａ₂と同一の値とされているので、領域Ａ₂は、文字が正しい比率（縦方向と横方向の比率）で表示される標準部とされるのに対して、領域Ａ₁とＡ₃は、文字が縦方向に圧縮されて表示される圧縮部とされる。
【００３７】
標準部の領域Ａ₂の位置は、カーソルで移動させることができるようになされている。図６と図７は、この関係を表している。すなわち、図６に示すように、表示バッファ１３（従ってウインドウ４１）のカーソル６１の位置を中心として、上方向にＫまでの範囲と、下方向にＫまでの範囲が、標準部の領域Ａ₂とされ、その上部と下部の領域がＡ₁またはＡ₃とされる。従って、例えば、図６に示す状態から、カーソル６１を下方に移動させると、図７に示すように、標準部の領域Ａ₂は、図６における位置より下方に移動する。その結果、領域Ａ₁の範囲は、図７における場合の方が図６における場合より拡大し、また、領域Ａ₃の範囲は、図６における場合より図７における場合の方が狭くなる。
【００３８】
図８は、以上のような原理に従って、ファイル８１−２を指定して、所定のページをウインドウ４１に表示した例を表している。この表示例においては、新聞の切り抜きをＯＣＲで読み取り、イメージデータとして取り込んだ画像が、その中央部では、縦方向と横方向の比が同一とされる標準部として表示され、その上下の所定の領域が、縦方向に圧縮した圧縮部として表示されている。
【００３９】
また、図９は、標準部をウインドウ４１の上端まで移動させた状態を表している。従って、この表示例においては、圧縮部は、ウインドウの下部にのみ表示されている。図８と図９のウインドウ４１の右下には、ファイルを選択するための各種のコントロールボタン（アイコン）９１が表示されている。
【００４０】
図１０は、このコントロールボタン９１の内容を理解するために、右端のコントロールボタン（ヘルプボタン）を選択した場合に、ＣＰＵ１１がディスプレイ１５に表示させるヘルプ画面の表示例を示している。この表示例を参照して、各コントロールボタンについて、以下に説明する。
【００４１】
同図に示すように、この表示例においては、コントロールボタン９１の解説、マウスのボタンの解説、およびコピー方向の解説が表示されている。
【００４２】
コントロールボタン９１のうち、左端のコントロールボタン９１−１は、例えばこのヘルプ画面から戻るとき操作される。その右隣のコントロールボタン９１−２は、ウインドウ４１に表示されている画像を印刷するとき操作する。さらに、その右隣のコントロールボタン９１−３と９１−４は、ウインドウ４１に表示されている画像を反時計方向または時計方向にそれぞれ回転表示させる場合に操作される。
【００４３】
さらに、その右隣のコントロールボタン９１−５は、ウインドウ４１に表示されているファイルを前のファイルに変更するとき操作され、コントロールボタン９１−６は、いまウインドウ４１に表示されているファイルのページを１ページだけ前のページに戻すとき操作される。
【００４４】
同様に、その右隣のコントロールボタン９１−７と９１−８は、ウインドウ４１に表示されているファイルのページを次のページにするとき、または、ウインドウ４１に表示されているファイルを次のファイルに変更するとき操作される。
【００４５】
従って、例えば、ウインドウ４１に図４におけるファイル８１−１が表示されている状態において、コントロールボタン９１−８が操作されると、ウインドウ４１には、次のファイル８１−２が表示され、ファイル８１−２が表示されている状態において、コントロールボタン９１−５が操作されると、前のファイル８１−１が表示される。また、例えばファイル８１−２の所定のページが表示されている状態において、コントロールボタン９１−７を操作すると、ファイル８１−２のその次のページが表示され、コントロールボタン９１−６が操作されると、ファイル８１−２のその前のページが表示される。
【００４６】
さらに、右端のコントロールボタン９１−９は、図１０に示されているようなヘルプ画面を表示させるとき操作する。
【００４７】
なお、これらのコントロールボタン９１−１乃至９１−９と同様の機能が、キーボード１７のアルファベットキーｑ，ｗ，ｉ，ｒ，Ｐ，ｐ，ｎ，Ｎ，ｈに、それぞれ割り当てられてる。また、コントロールボタン９１−６と９１−７の機能は、カーソルキー１７Ａのうち、左方向のカーソルキーと右方向のカーソルキーにも割り当てられている。さらに、コントロールボタン９１−９の機能は、キーボード１７のヘルプキーにも割り当てられている。
【００４８】
マウスのボタンの解説の欄には、マウスの操作方法が解説されている。すなわち、この実施例においては、ポインティングデバイス１６を構成するマウス１００は、図１１に示すように、ボタン１０１乃至ボタン１０３の３つのボタンを有している。このうち、右端のボタン１０３は、ズームモードのボタンとされ、このボタン１０３を操作すると、所定の範囲の文字が、図１２に示すように、拡大されて表示される。このとき、拡大領域の背景には、拡大領域を表示しない場合における状態の文字が薄く表示される。そして、このボタン１０３とその左隣（中央）のボタン１０２を同時に操作すると、拡大領域における拡大率が大きくなり（大きな文字が表示され）、ボタン１０３と最も左側のボタン１０１とを同時に操作すると、拡大領域における拡大率が小さくなる（小さい文字が表示される）ようになされている。
【００４９】
すなわち、ボタン１０３と１０２によりズームイン動作が行われ、ボタン１０３とボタン１０１によりズームアウト動作が行われるようになされている。
【００５０】
さらにまた、マウス１００のボタン１０１を操作すると、ＯＣＲで読み取ったイメージデータを、テキストデータに変換して所定の位置にコピーすることができるようになされている。そして、その場合におけるコピーの方向の説明がその左側に表示されている。
【００５１】
すなわち、この実施例においては、縦書きの文字をコピーするとき、右端の点からドラッグして、左端の点を指定するようにし、横書きの文字をコピーするとき、左上の点からドラッグして、右下の点を指定するようになされている。すなわち、ドラッグする方向を変えるだけで、文字の連続する方向を実質的に指定することができるようになされている。
【００５２】
次に、図１３のフローチャートを参照して、この実施例におけるＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅｃｏｇｎｉｔｉｏｎ）機能の詳細について説明する。
【００５３】
いま、例えば図１４に示すように、ウインドウ４１−１に、イメージデータで表された文字が表示されているものとする。この文字は、例えば紙などに印刷した文字をＯＣＲで読み取って、イメージデータに変換したものであり、上述したように、具体的には新聞記事の切抜きのイメージデータなどである。この表示例においては、「いろはにほへと…けふこえて」の横書きの文字が表示されている。このような状態において、ウインドウ４１−１に表示されているイメージデータの文字をテキストデータの文字に変換してウインドウ４１−２にコピーするものとする。
【００５４】
最初に、ステップＳ１において、使用者は、変換する範囲の左上の点Ｐ₁を指定する。この指定は、マウス１００を操作して、カーソルを点Ｐ₁の位置に移動させ、その位置でボタン１０１をクリックすることで行われる。このとき、ＣＰＵ１１は、指定された点Ｐ₁の座標を（ｘ₁，ｙ₁）として記憶する。なお、この実施例においても、ディスプレイ１５（またはウインドウ４１）上で、原点は左上の点とされ、右方向にｘ座標が、下方向にｙ座標が取られている。
【００５５】
次に、使用者は、マウス１００のボタン１０１を、点Ｐ₁で押圧したままドラッグし、変換する範囲の右下の点Ｐ₂の位置まで移動させ、その位置でドラッグを解除する。
【００５６】
ＣＰＵ１１は、ステップＳ２で、ドラッグが終了するまで待機し、ドラッグの終了がマウス１００（ポインティングデバイス１６）から入力されたとき、ステップＳ３において、その点Ｐ₂の座標を（ｘ₂，ｙ₂）として記憶する。
【００５７】
次に、ステップＳ４において、点Ｐ₂のｙ座標ｙ₂と点Ｐ₁のｙ座標ｙ₁の大きさが判定される。図１４に示す状態においては、文字が横方向に連続している。この場合、文字は、左から右方向に連続し、その行の右端に達したとき、下の行に移動し、再び左端から右端に向かって文字が連続するように文字が記載される。すなわち、文字は左上から右下方向に連続する。このように、文字が横書きされている場合、使用者は、変換する範囲を指定するとき、最初に左上の点Ｐ₁を指定し、次に右下の点Ｐ₂を指定する。その結果、ｙ₂はｙ₁より大きくなっている。
【００５８】
そこで次に、ステップＳ５に進み、点Ｐ₁の座標ｘ₁と点Ｐ₂の座標ｘ₂の大きさが比較される。文字が横書きされている場合、点Ｐ₁は点Ｐ₂より左側に位置しているため、ｘ₁はｘ₂より小さくなっている。そこで、この場合、ステップＳ８に進み、ＣＰＵ１１は、ＯＣＲエンジン２４を制御し、点Ｐ₁と点Ｐ₂で指定される矩形領域の内部のイメージデータを、横書き文字として認識し、ＪＩＳコードなどのテキストデータに変換する処理を実行させる。いまの場合、「りぬるをわたれそつねむういのお」のイメージデータの文字が文字認識されることになる。
【００５９】
次に、使用者は、認識した結果得られたテキストデータに対応する文字をコピーする領域の左上の点Ｐ₃をマウス１００のボタン１０１を操作することで指定する。図１４の実施例においては、ウインドウ４１−２の座標ｘ₃，ｙ₃の点Ｐ₃が、コピー領域の左上の点として指定されている。
【００６０】
ステップＳ１０においては、このコピー先の指定が行われるまで待機する。そして、点Ｐ₃が指定されたとき、ＣＰＵ１１は、ステップＳ１１に進み、ステップＳ８でＯＣＲエンジン２４により認識されたテキストデータに対応する文字を、点Ｐ₃で指定される領域に表示させる。このウインドウ４１−２の点Ｐ₃で規定される範囲に表示される文字は、テキストデータに対応する文字であるため、使用者が、任意にこれを変更したり、消去するなどの編集操作が可能である。
【００６１】
一方、図１５に示すように、ウインドウ４１−１に縦書きの文字が表示されているとき、文字は、右端の行の上から下に連続し、その行の下端に達すると、左隣の行に移り、その行の最上端から下に向かって連続する。すなわち、文字は、右上から左下方向に連続することになる。
【００６２】
このように、文字が縦書きで表示されている場合、使用者は、変換する範囲を指定するとき、その右上の点と左下の点を指定する。すなわち、点Ｐ₁が右上の点となり、点Ｐ₂が左下の点となる。このため、ｘ₁はｘ₂より大きくなる。従って、ステップＳ５において、ＮＯの判定が行われ、ｘ₁がｘ₂より大きいか否かを判定するステップＳ６において、ＹＥＳの判定が行われる。そこで、ステップＳ９に進み、ＣＰＵ１１はＯＣＲエンジン２４に、点Ｐ₁と点Ｐ₂で規定される矩形の範囲のイメージデータを、縦書きの文字として認識させる。
【００６３】
縦書きの文字をコピーする場合、使用者は、コピー先の点として、コピー領域の右上の点を、点Ｐ₃として指定する。ステップＳ１０において、この点Ｐ₃の入力が検知されたと判定されたとき、ステップＳ１１に進み、ステップＳ９で認識したテキストデータに対応する文字が、点Ｐ₃で規定されるコピー領域に表示される。図１５の実施例の場合、「りぬるをわたれそつねむういのお」の文字が縦書きで表示される。
【００６４】
以上の実施例においては、平仮名を認識処理するようにしたが、これに限らず、漢字、アルファベット文字などを認識させるようにすることも可能である。
【００６５】
このように、範囲を指定する方法を、文字が横書きされている場合と縦書きされている場合とで異ならせることで、文字の連続する方向を新たに指定する必要がなくなるため、操作性が改善される。また、左上から右下に向けて連続する横書きの文字の領域は、左上の点と右下の点とにより指定させ、右上から左下に向けて文字が連続する縦書きの文字の領域は、右上の点と左下の点とにより指定させるようにしたので、文字の連続する方向と指定する点の方向とが対応しており、極めて自然な操作で範囲を指定することが可能となる。
【００６６】
以上の実施例においては、文字認識する場合における処理を、範囲を指定する方法により変更するようにしたが、文字認識以外の処理を実行する場合においても、本発明は適用することが可能である。
【００６７】
【発明の効果】
以上の如く、請求項１に記載の画像処理装置および請求項３に記載の画像処理方法によれば、範囲を指定する第１の点と第２の点の座標の関係に対応して、その範囲の画像を処理するようにしたので、操作性が改善される。
【図面の簡単な説明】
【図１】本発明の画像処理装置が接続されるネットワークを説明する図である。
【図２】本発明の画像処理装置の構成例を示すブロック図である。
【図３】図２のディスプレイ１５におけるメニューの表示例を示す図である
【図４】図２のディスプレイ１５におけるホームページの表示例を示す図である
【図５】図２の実施例におけるウインドウ内に１枚の画像を表示する原理を説明する図である。
【図６】カーソルと表示内容の関係を説明する図である。
【図７】カーソルと表示内容の関係を説明する図である。
【図８】ウインドウ内における表示例を示す図である。
【図９】ウインドウ内における他の表示例を示す図である。
【図１０】ヘルプ画面の表示例を示す図である。
【図１１】マウスの構成を示す図である。
【図１２】ウインドウ内における拡大表示の例を示す図である。
【図１３】図２の実施例におけるＯＣＲ機能の処理を説明するフローチャートである。
【図１４】図１３のステップＳ８における表示例を示す図である。
【図１５】図１３のステップＳ９における表示例を示す図である。
【図１６】従来のＯＣＲ機能を説明する図である。
【図１７】従来のＯＣＲ機能を説明する他の図である。
【符号の説明】
１１ＣＰＵ
１２メインメモリ
１３表示バッファ
１４ビデオ信号発生部
１５ディスプレイ
１６ポインティングデバイス
１７キーボード
１７Ａカーソルキー
１８文書データ格納部
１９イメージ展開処理部
２０イメージデータ転送部
２１イメージデータ圧縮転送部
２３ネットワークインタフェース
２４ＯＣＲエンジン
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus and method, and more particularly, to an image processing apparatus and method suitable for use when recognizing characters represented by bitmap data and converting them into text data.
[0002]
[Prior art]
FIG. 16 shows a display example on a conventional display. In this example, a window 2-1 is provided on the display 1, and horizontally written characters represented by image data (bitmap data) are displayed in it. These characters were, for example, printed on paper by a printer, read by an OCR (Optical Character Reader) device or the like (not shown), and then displayed. Consequently, editing such as changing a displayed character to another character is not possible.
[0003]
To make such editing possible, character recognition must be performed on the image data to convert it into character data.
[0004]
Conventionally, such conversion processing has been performed as follows. First, the upper-left point P₁₁ and the lower-right point P₁₂ of the range to be converted are specified with the mouse. The image data within the rectangular area defined by the points P₁₁ and P₁₂ is then taken as the range to be converted.
[0005]
When this designation is made, the window 2-3 then presents the direction in which the image data is to be recognized (the direction in which the characters run). In this display example, the words “vertical” and “horizontal” are displayed in the window 2-3, and the user selects the direction in which character recognition should proceed by, for example, pointing at one of these areas with the cursor. In this example, since the characters “i ro ha ni ho he to …” are written horizontally, the “horizontal” area is designated. When this designation is made, character recognition is performed on the image data in the designated area.
[0006]
Next, the user designates, as a point P₁₃ on the window 2-2, the position at which the characters obtained by recognizing the image data in the specified range are to be displayed. When this designation is made, the characters corresponding to the text data obtained by the recognition are displayed in an area of the window 2-2 whose upper-left corner is the point P₁₃.
[0007]
The above example concerns horizontally written characters, but the same processing is performed when the characters are written vertically (run in the vertical direction). That is, as shown in FIG. 17, when vertically written characters represented by image data are displayed in the window 2-1, the area of image data to undergo character recognition is designated by specifying the upper-left point P₁₁ and the lower-right point P₁₂, as in the case of FIG. 16.
[0008]
As in the case described above, if such designation is performed, the direction in which the characters continue is displayed next in the window 2-3, so this direction is designated. In this case, since the characters are written vertically, “vertical” is selected.
[0009]
Then, when the point P₁₃ on the window 2-2 is designated as the copy destination, the characters corresponding to the character data obtained by character recognition of the image data are displayed there.
[0010]
Thus, since the character displayed on the window 2-2 corresponds to the character data, an editing process for changing a predetermined character to another character can be performed.
[0011]
[Problems to be solved by the invention]
However, in the conventional apparatus, as described above, the directions in which characters may run (the character recognition directions) are displayed and the user selects one of them. As a result, many operations are required to execute the character recognition process, and operability is poor.
[0012]
The present invention has been made in view of such circumstances, and is intended to reduce the number of operations and improve operability.
[0013]
[Means for Solving the Problems]
  The image processing apparatus according to claim 1 comprises: designating means for designating a rectangular range whose vertices are a first point, which is the start point of a drag, and a second point, which is the end point of the drag, the line connecting the first point and the second point forming a diagonal of the rectangle; determining means for determining the relationship between the coordinates of the first point and the coordinates of the second point; and processing means for performing, in accordance with the determination result of the determining means, processing to recognize characters from the image in the rectangular range and for determining the direction in which the characters run.
[0014]
  The image processing method according to claim 3 comprises: designating a rectangular range whose vertices are a first point, which is the start point of a drag, and a second point, which is the end point of the drag, the line connecting the first point and the second point forming a diagonal of the rectangle; determining the relationship between the coordinates of the first point and the coordinates of the second point; and, in accordance with the determination result, performing processing to recognize characters from the image in the rectangular range and determining the direction in which the characters run.
[0015]
  In the image processing apparatus according to claim 1, the designating means designates a rectangular range whose vertices are a first point, which is the start point of a drag, and a second point, which is the end point of the drag, the line connecting the first point and the second point forming a diagonal of the rectangle; the determining means determines the relationship between the coordinates of the first point and the coordinates of the second point; and the processing means, in accordance with the determination result of the determining means, performs processing to recognize characters from the image in the rectangular range and determines the direction in which the characters run.
[0016]
  In the image processing method according to claim 3, a rectangular range is designated whose vertices are a first point, which is the start point of a drag, and a second point, which is the end point of the drag, the line connecting the first point and the second point forming a diagonal of the rectangle; the relationship between the coordinates of the first point and the coordinates of the second point is determined; and, in accordance with the determination result, processing to recognize characters from the image in the rectangular range is performed and the direction in which the characters run is determined.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a configuration example of a network to which the image processing apparatus of the present invention is connected. As shown in the figure, many servers and providers are connected to the Internet (service mark), an international network for computers. The servers provide various information and services to users, and the providers offer services that give users access to the Internet.
[0018]
FIG. 2 is a block diagram illustrating a configuration example of the image processing apparatus of the present invention. In this embodiment, a network interface (I/F) 23 receives data supplied from the Internet and other networks and supplies it to the document data storage unit 18 for storage. The document data storage unit 18 can be a hard disk, an optical disk, a magneto-optical disk, or the like, or a solid-state memory. The data stored in the document data storage unit 18 can take the form of image data, image data compressed by MMR (Modified Modified READ) or MH (Modified Huffman), text data, or a page description language such as PostScript used in DTP and the like.
[0019]
In response to a command from the CPU 11, the image expansion processing unit 19 expands the data stored in the document data storage unit 18 into image data such as a bitmap, in a manner appropriate to its data structure, and outputs the image data to the main memory 12. For example, when the data is image data compressed by MMR or MH as used in facsimile, the image expansion processing unit 19 performs decompression. In the case of a page description language such as PostScript, it performs raster image expansion, which renders the fonts and lays out the page.
[0020]
Data stored in the main memory 12 is supplied to and stored in the display buffer 13 via the image data transfer unit 20 or the image data compression transfer unit 21. Basically, the image data transfer unit 20 transfers the data stored in the main memory 12 to the display buffer 13 as it is, whereas the image data compression transfer unit 21 compresses the image stored in the main memory 12 and supplies it to the display buffer 13 for storage.
[0021]
The image data compression transfer unit 21 performs compression either by transferring the data while thinning out rows at intervals of several rows, or by reducing the number of rows while applying an operation such as a logical OR between rows. Alternatively, the number of dots in the image data is counted and the compression is performed according to that count.
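The two row-reduction approaches mentioned above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each row of a binary image is held as one integer bit mask; the function names `thin_rows` and `or_merge_rows` are hypothetical and do not come from the patent.

```python
def thin_rows(rows, step):
    """Decimation: keep every `step`-th row and drop the rows in between."""
    return rows[::step]


def or_merge_rows(rows, group):
    """Merge each run of `group` consecutive rows with a bitwise OR,
    so a dot that is black in any merged row stays black."""
    merged = []
    for start in range(0, len(rows), group):
        acc = 0
        for row in rows[start:start + group]:
            acc |= row
        merged.append(acc)
    return merged


# Four rows of an 8-pixel-wide binary image (1 = black dot), illustrative data.
image = [0b00010000, 0b00101000, 0b01000100, 0b11111110]
print([format(r, "08b") for r in thin_rows(image, 2)])
print([format(r, "08b") for r in or_merge_rows(image, 2)])
```

Decimation is cheaper but can drop thin horizontal strokes entirely, whereas the OR merge keeps them at the cost of thickening the characters.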
[0022]
When transferring the data read from the main memory 12 to the display buffer 13, the image data compression transfer unit 21 and the image data transfer unit 20 convert the binary image data into multi-level data, so that fine characters can be displayed without being crushed even on a display of relatively low resolution. However, since the multi-level resolution conversion takes time, a coarse image is displayed first and its data is then successively replaced with the multi-level data, as disclosed, for example, in JP-A-4-337800. This satisfies both the demand for a quick response and the demand for a clean display.
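A rough Python sketch of the "coarse first, refine later" idea follows. The actual conversion method of JP-A-4-337800 is not described in this text, so the block-counting gray-level conversion, the row-by-row replacement order, and all function names here are assumptions chosen for illustration.

```python
def coarse(bitmap, f):
    """Fast preview: sample one pixel per f-by-f block and scale it to the
    gray range 0..f*f. Assumes both dimensions are multiples of f."""
    return [[bitmap[y][x] * f * f for x in range(0, len(bitmap[0]), f)]
            for y in range(0, len(bitmap), f)]


def multilevel(bitmap, f):
    """Slower pass: the gray level of each output pixel is the number of
    black dots in the corresponding f-by-f block (0..f*f)."""
    h, w = len(bitmap), len(bitmap[0])
    return [[sum(bitmap[y + dy][x + dx] for dy in range(f) for dx in range(f))
             for x in range(0, w, f)]
            for y in range(0, h, f)]


def progressive_frames(bitmap, f):
    """Yield successive display contents: the coarse preview first, then the
    same frame with its rows replaced one at a time by multi-level rows."""
    frame = coarse(bitmap, f)
    yield [row[:] for row in frame]
    for i, row in enumerate(multilevel(bitmap, f)):
        frame[i] = row
        yield [r[:] for r in frame]


bitmap = [[1, 0, 0, 0],
          [0, 0, 1, 1],
          [1, 1, 0, 0],
          [1, 1, 0, 1]]
for frame in progressive_frames(bitmap, 2):
    print(frame)
```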
[0023]
The area copy processing unit 22 executes a process of copying (moving) a part of the image data stored in the display buffer 13 to another area of the display buffer 13.
[0024]
The video signal generator 14 reads the image data stored in the display buffer 13, converts it into a video signal, outputs it to the display 15, and displays it.
[0025]
An OCR (Optical Character Recognition) engine 24 executes processing for recognizing characters of image data (bitmap data) and converting them into text data such as JIS code under the control of the CPU 11.
[0026]
The keyboard 17 has at least a cursor key 17A, and is operated by a user when inputting various commands to the CPU 11. The pointing device 16 such as a mouse is operated by the user when a predetermined position is designated using a cursor displayed on the display 15.
[0027]
Next, the operation of the embodiment of FIG. 2 will be described. When the keyboard 17 is operated and the CPU 11 is instructed to start access to the Internet, for example, the CPU 11 causes the display 15 to display a menu screen as shown in FIG. On this menu screen, icons for accessing various servers connected to the Internet are displayed.
[0028]
When the user designates and selects, for example, the “Fax in” icon 31 with the cursor, the CPU 11 controls the network interface 23 to access the server on the Internet that corresponds to that icon. This server reads clippings from newspapers, magazines, and the like by OCR (Optical Character Reader), stores them as image data (bitmap data), and offers a service (the Fax in service) that provides this data.
[0029]
The network interface 23 supplies the data received from the server accessed via the Internet to the document data storage unit 18 for storage. A part of this data is also supplied directly to the image expansion processing unit 19, where it undergoes decompression and other processing and is converted into bitmap data, which is supplied to the main memory 12 and stored there.
[0030]
The data stored in the main memory 12 is supplied to the display buffer 13 via the image data transfer unit 20 and written there. The data written in the display buffer 13 is supplied to the video signal generator 14, converted into a video signal, and supplied to the display 15 for display. In this way, the home page of the accessed server, for example as shown in FIG. 4, is displayed first on the display 15.
[0031]
The user then operates the pointing device 16 or the keyboard 17 while viewing this home page. When the user selects, for example, the newspaper clipping file 81-2 and the data of that file is not yet stored in the document data storage unit 18, the CPU 11 requests the server to transfer the data via the network interface 23. When the server transfers the data in response to this request, the data is supplied to the document data storage unit 18 via the network interface 23 and stored there.
[0032]
Next, the CPU 11 causes the file data (document data) stored in the document data storage unit 18 to be read out, converted into bitmap data by the image expansion processing unit 19, and then supplied to the main memory 12 and stored there. This data is then supplied to the display buffer 13 via the image data transfer unit 20 or the image data compression transfer unit 21 and stored there. The image data for one sheet (one page) written in the display buffer 13 is supplied to the video signal generator 14, converted into a video signal, and output to the display 15 for display.
[0033]
Next, the principle of displaying one page image will be described with reference to FIG. 5. Suppose that a window 41 is displayed on the display 15 and that an image of one sheet (one page) of an A4-size newspaper clipping read from the document data storage unit 18 is to be displayed in this window 41. Assume that the image data 42 of this image stored in the main memory 12 has a width W and a height H, as shown in FIG. 5.
[0034]
In contrast, the window 41 has a width w and a height h, and the width W and height H of the image data 42 are larger than the width w and height h of the window 41. In this case, the image data 42 cannot be displayed in the window 41 in its entirety as it is. Therefore, in this embodiment, the width W of the image data 42 is, for example, fitted to the width w of the window 41. That is, the image data 42 is compressed uniformly in the width direction at a compression ratio of w/W.
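As a small worked example of the width fitting described above, the following Python sketch computes the ratio w/W and the height the page would have if that ratio were applied uniformly. The function name `fit_width` and the sample dimensions are hypothetical.

```python
def fit_width(W, H, w):
    """Return the width compression ratio w/W and the image height that
    results when the same ratio is applied to the height H."""
    scale = w / W
    return scale, H * scale


# Illustrative numbers only: a 2000 x 3000 page image and a 500-pixel-wide window.
scale, scaled_height = fit_width(2000, 3000, 500)
print(scale, scaled_height)
```

If the resulting height still exceeds the window height h, the page cannot be shown at a uniform scale, which is the situation the description goes on to address.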
[0035]
Furthermore, the image data 52, which has been compressed to w/W overall in the width direction in this way, is compressed in the height direction as follows. Since the height h of the window 41 is smaller than the height H of the image data 52, the window 41 is divided into a region A₂ of height a₂, which is, for example, 70% of the height h of the window 41, a region A₁ of height a₁ above it, and a region A₃ of height a₃ below it. Corresponding to this division, the image data 50 is likewise divided into a region R₂ of height r₂ (= a₂), a region R₁ of height r₁ above it, and a region R₃ of height r₃ below it.
[0036]
The data of the region R₂ of the image data 52 is transferred to and displayed in the region A₂ of the window 41 as it is (without compression). In contrast, the data of the region R₁ is compressed in the vertical direction and transferred to and displayed in the region A₁, and the data of the region R₃ is compressed in the vertical direction and transferred to and displayed in the region A₃. Since the height a₂ of the region A₂ is 70% of the height h of the window 41, and the height r₂ of the region R₂ of the image data 52 is equal to a₂, the region A₂ is a standard part in which characters are displayed at the correct ratio (vertical to horizontal), whereas the regions A₁ and A₃ are compressed parts in which characters are displayed compressed in the vertical direction.
[0037]
The position of the standard-part region A₂ can be moved with the cursor. FIGS. 6 and 7 illustrate this relationship. As shown in FIG. 6, the range extending up to K above and up to K below the position of the cursor 61 in the display buffer 13 (and hence in the window 41) is the standard-part region A₂, and the areas above and below it are the regions A₁ and A₃, respectively. Therefore, when the cursor 61 is moved downward from the state shown in FIG. 6, for example, the standard-part region A₂ moves downward from its position in FIG. 6, as shown in FIG. 7. As a result, the region A₁ is larger in FIG. 7 than in FIG. 6, and the region A₃ is narrower in FIG. 7 than in FIG. 6.
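The band layout described in the preceding paragraphs might be computed along the following lines. This is a sketch under stated assumptions, not the patent's implementation: the clamping of the standard band to the window edges and the proportional split of the remaining image height between R₁ and R₃ are both assumptions, and the name `layout_bands` is hypothetical. Only a₂ = 0.7h, r₂ = a₂, and the band extending K above and below the cursor come from the description.

```python
def layout_bands(h, scaled_h, cursor_y, ratio=0.7):
    """Return window band heights (a1, a2, a3) and image band heights
    (r1, r2, r3) for a window of height h and an image of height scaled_h.
    Assumes the image is taller than the window."""
    a2 = ratio * h              # standard part: 70% of the window height
    k = a2 / 2                  # the band extends K above and K below the cursor
    # Assumption: clamp so the standard band always stays inside the window.
    top = min(max(cursor_y - k, 0), h - a2)
    a1, a3 = top, h - a2 - top
    r2 = a2                     # the standard part is shown uncompressed
    # Assumption: split the rest of the image in proportion to a1 : a3,
    # so both compressed bands share one vertical compression ratio.
    rest = scaled_h - r2
    r1 = rest * a1 / (a1 + a3) if a1 + a3 else 0
    r3 = rest - r1
    return (a1, a2, a3), (r1, r2, r3)


# Cursor in the middle of a 400-pixel window showing a 1000-pixel-tall image.
print(layout_bands(400, 1000, 200))
# Cursor at the top edge: only the lower compressed band remains.
print(layout_bands(400, 1000, 0))
```

Moving `cursor_y` downward increases a₁ and decreases a₃, which matches the behaviour described for FIGS. 6 and 7.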
[0038]
FIG. 8 shows an example in which the file 81-2 is designated and a given page is displayed in the window 41 according to the principle described above. In this display example, an image obtained by reading a newspaper clipping by OCR and capturing it as image data is displayed so that its central portion forms a standard part, with the same scale in the vertical and horizontal directions, and predetermined areas above and below it form compressed parts, compressed in the vertical direction.
[0039]
FIG. 9 shows a state in which the standard part has been moved to the upper end of the window 41. In this display example, therefore, a compressed part is displayed only at the bottom of the window. Various control buttons (icons) 91, used for selecting a file and other operations, are displayed at the lower right of the window 41 in FIGS. 8 and 9.
[0040]
FIG. 10 shows a display example of a help screen that the CPU 11 displays on the display 15 when the rightmost control button (help button) is selected to understand the contents of the control button 91. Each control button will be described below with reference to this display example.
[0041]
As shown in the figure, in this display example, explanation of the control button 91, explanation of the mouse button, and explanation of the copy direction are displayed.
[0042]
Of the control buttons 91, the leftmost control button 91-1 is operated, for example, to return from this help screen. The control button 91-2 to its right is operated to print the image displayed in the window 41. The control buttons 91-3 and 91-4 further to the right are operated to rotate the image displayed in the window 41 counterclockwise and clockwise, respectively.
[0043]
Further, the control button 91-5 to its right is operated to change the file displayed in the window 41 to the previous file, and the control button 91-6 is operated to move back one page in the file currently displayed in the window 41.
[0044]
Similarly, the control buttons 91-7 and 91-8 to their right are operated to advance the file displayed in the window 41 to its next page and to change the file displayed in the window 41 to the next file, respectively.
[0045]
Therefore, for example, when the control button 91-8 is operated while the file 81-1 in FIG. 4 is displayed in the window 41, the next file 81-2 is displayed in the window 41, and when the control button 91-5 is operated while the file 81-2 is displayed, the previous file 81-1 is displayed. Likewise, when the control button 91-7 is operated while a given page of the file 81-2 is displayed, the next page of the file 81-2 is displayed, and when the control button 91-6 is operated, the previous page of the file 81-2 is displayed.
[0046]
Further, the rightmost control button 91-9 is operated to display a help screen as shown in FIG.
[0047]
The same functions as those of the control buttons 91-1 to 91-9 are assigned to the alphabet keys q, w, i, r, P, p, n, N, and h of the keyboard 17, respectively. The functions of the control buttons 91-6 and 91-7 are also assigned to the left cursor key and the right cursor key among the cursor keys 17A. Further, the function of the control button 91-9 is also assigned to the help key of the keyboard 17.
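The key assignments listed above amount to a lookup table, which might be sketched as below. The button labels are taken from the description; the strings "Left", "Right", and "Help" used for the non-character keys and the function name `button_for_key` are hypothetical names chosen for illustration.

```python
# Keys of keyboard 17 mapped to the control buttons whose function they share.
# The lookup is case-sensitive: "P" and "p" (and "N" and "n") differ.
KEY_TO_BUTTON = {
    "q": "91-1", "w": "91-2", "i": "91-3", "r": "91-4", "P": "91-5",
    "p": "91-6", "n": "91-7", "N": "91-8", "h": "91-9",
    # Hypothetical names for the cursor keys 17A and the help key.
    "Left": "91-6", "Right": "91-7", "Help": "91-9",
}


def button_for_key(key):
    """Return the control button that a key stands for, or None if unassigned."""
    return KEY_TO_BUTTON.get(key)
```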
[0048]
The mouse button field explains how to operate the mouse. In this embodiment, the mouse 100 constituting the pointing device 16 has three buttons 101 to 103 as shown in FIG. Of these, the rightmost button 103 is the zoom mode button; when this button 103 is operated, characters in a predetermined range are enlarged and displayed as shown in FIG. At this time, the characters as they appear without the enlarged region are displayed faintly in the background of the enlarged region. When the button 103 and the adjacent center button 102 are operated simultaneously, the enlargement ratio in the enlarged region increases (larger characters are displayed). When the button 103 and the leftmost button 101 are operated simultaneously, the enlargement ratio in the enlarged region decreases (smaller characters are displayed).
[0049]
That is, the buttons 103 and 102 perform a zoom-in operation, and the buttons 103 and 101 perform a zoom-out operation.
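As a rough illustration of these button combinations, the sketch below maps a set of pressed mouse buttons to a zoom action. The function name and the action strings are hypothetical, and the result for combinations that the description does not mention (for example all three buttons) is assumed to be "no zoom action".

```python
LEFT_BUTTON, CENTER_BUTTON, ZOOM_BUTTON = 101, 102, 103


def zoom_action(pressed):
    """Map the set of pressed mouse buttons to a zoom action name.

    Returns "zoom" (show the enlarged region), "zoom_in", "zoom_out",
    or None when the combination has no zoom meaning.
    """
    pressed = frozenset(pressed)
    if pressed == {ZOOM_BUTTON}:
        return "zoom"
    if pressed == {ZOOM_BUTTON, CENTER_BUTTON}:
        return "zoom_in"
    if pressed == {ZOOM_BUTTON, LEFT_BUTTON}:
        return "zoom_out"
    # Assumption: anything else is not a zoom operation.
    return None
```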
[0050]
Furthermore, when the button 101 of the mouse 100 is operated, the image data read by the OCR can be converted into text data and copied to a predetermined position. An explanation of the copy direction in that case is displayed on the left side.
[0051]
That is, in this embodiment, when copying vertically written characters, the user drags from the upper right point to the lower left point, and when copying horizontally written characters, the user drags from the upper left point to the lower right point. In other words, the direction in which the characters continue can effectively be specified simply by changing the direction of the drag.
[0052]
Next, the details of the OCR (Optical Character Recognition) function in this embodiment will be described with reference to the flowchart of FIG.
[0053]
For example, as shown in FIG. 14, assume that characters represented by image data are displayed in the window 41-1. These characters are, for example, characters printed on paper or the like that have been read by OCR and converted into image data. Specifically, as described above, they are the image data of a newspaper clipping. In this display example, the horizontally written characters "Iroha Nihonoe… Kefukoe" are displayed. In this state, assume that the characters of the image data displayed in the window 41-1 are to be converted into characters of text data and copied to the window 41-2.
[0054]
First, in step S1, the user specifies the upper left point P₁ of the range to be converted. This is done by operating the mouse 100 to move the cursor to the position of point P₁ and clicking the button 101 at that position. At this time, the CPU 11 determines the coordinates (x₁, y₁) of the designated point P₁. In this embodiment as well, the origin of the display 15 (or the window 41) is its upper left point, with the x coordinate increasing to the right and the y coordinate increasing downward.
[0055]
Next, the user drags from point P₁ while holding down the button 101 of the mouse 100, moves to the position of point P₂ at the lower right of the range to be converted, and releases the drag at that position.
[0056]
The CPU 11 waits in step S2 until the drag is completed, and when the end of the drag is input from the mouse 100 (pointing device 16), it determines the coordinates (x₂, y₂) of point P₂ in step S3.
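The range to be converted is the rectangle whose diagonal runs from the drag start point P₁ to the drag end point P₂. A minimal sketch of deriving that rectangle from the two points, using the coordinate system described above (origin at the upper left, y increasing downward), might look like this. The function name and the `(left, top, right, bottom)` tuple layout are assumptions made for illustration.

```python
def drag_rectangle(p1, p2):
    """Return (left, top, right, bottom) of the rectangle whose diagonal
    connects the drag start point p1 and the drag end point p2.

    Points are (x, y) tuples with the origin at the upper left and y
    increasing downward, so "top" is the smaller y value.
    """
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
```

The same rectangle results whether the drag runs from upper left to lower right or from upper right to lower left; only the order of the two points differs, which is the information the later steps use.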
[0057]
Next, in step S4, the y coordinate y₂ of point P₂ and the y coordinate y₁ of point P₁ are compared. In the state shown in FIG. 14, the characters continue in the horizontal direction. In this case, the characters are written from left to right, and when the right end of a line is reached, the text moves to the line below and again continues from the left end to the right end. That is, the characters continue from the upper left to the lower right. Thus, when the characters are written horizontally, the user first specifies the upper left point P₁ and then the lower right point P₂ when specifying the range to be converted. As a result, y₂ is larger than y₁.
[0058]
The process then proceeds to step S5, where the x coordinate x₁ of point P₁ and the x coordinate x₂ of point P₂ are compared. If the characters are written horizontally, point P₁ is located to the left of point P₂, so x₁ is smaller than x₂. In this case, therefore, the process proceeds to step S8, in which the CPU 11 controls the OCR engine 24 so that the image data inside the rectangular area specified by points P₁ and P₂ is recognized as horizontally written characters and converted into text data such as JIS code. In this case, the characters of the image data "Rinuru wa wa me tsurumouino" are recognized.
[0059]
Next, the user designates point P₃, the upper left point of the area to which the characters corresponding to the recognized text data are to be copied, by operating the button 101 of the mouse 100. In the embodiment of FIG. 14, point P₃ at coordinates (x₃, y₃) in the window 41-2 is designated as the upper left point of the copy area.
[0060]
In step S10, the process waits until the copy destination is designated. When point P₃ is designated, the CPU 11 proceeds to step S11 and displays the characters corresponding to the text data recognized by the OCR engine 24 in step S8 in the area specified by point P₃. Since the characters displayed in the range defined by point P₃ of the window 41-2 correspond to text data, the user can freely edit or delete them.
[0061]
On the other hand, as shown in FIG. 15, when vertically written characters are displayed in the window 41-1, the characters continue from the top to the bottom of the rightmost line, and when the bottom of that line is reached, the text moves to the adjacent line on the left and continues downward from the top of that line. That is, the characters continue from the upper right to the lower left.
[0062]
Thus, when the characters are displayed in vertical writing, the user specifies the upper right point and then the lower left point when specifying the range to be converted. That is, point P₁ is the upper right point and point P₂ is the lower left point. For this reason, x₁ is larger than x₂. Accordingly, a NO determination is made in step S5, and a YES determination is made in step S6, which determines whether x₁ is larger than x₂. The process therefore proceeds to step S9, where the CPU 11 causes the OCR engine 24 to recognize the image data in the rectangular range specified by points P₁ and P₂ as vertically written characters.
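The coordinate comparisons of steps S4 to S6 can be sketched as a small function that infers the writing direction from the drag start and end points. This is a sketch under stated assumptions: the function name and the returned strings are hypothetical, and the description does not say what happens when y₂ is not larger than y₁ or when x₁ equals x₂, so those cases are assumed here to yield no direction (`None`).

```python
def writing_direction(p1, p2):
    """Infer the writing direction from the drag start point p1 and the
    drag end point p2, following the comparisons of steps S4 to S6.

    Points are (x, y) tuples with the origin at the upper left, x
    increasing to the right and y increasing downward.
    """
    (x1, y1), (x2, y2) = p1, p2
    if not y2 > y1:
        # Step S4: the drag did not move downward.
        # Assumption: treat this as "no direction determined".
        return None
    if x1 < x2:
        # Step S5 YES: upper left to lower right, go to step S8.
        return "horizontal"
    if x1 > x2:
        # Step S6 YES: upper right to lower left, go to step S9.
        return "vertical"
    # Assumption: x1 == x2 gives a degenerate range with no direction.
    return None
```

For instance, a drag from (10, 20) to (110, 80) is treated as horizontal writing, while a drag from (110, 20) to (10, 80) is treated as vertical writing, so no separate direction menu is needed.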
[0063]
When copying vertically written characters, the user specifies the upper right point of the copy area as the copy destination point P₃. When it is determined in step S10 that this point P₃ has been input, the process proceeds to step S11, and the characters corresponding to the text data recognized in step S9 are displayed in the copy area defined by point P₃. In the example of FIG. 15, the characters "Rinu wa wa suru mu ei no umino" are displayed in vertical writing.
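The different roles of point P₃ (upper left of the copy area for horizontal writing, upper right for vertical writing) can be illustrated with a simple layout sketch. Everything beyond that anchoring rule is an assumption for illustration: the function name, the fixed square character cell of `cell` pixels, and the fixed number of characters per line `per_line` do not come from the description.

```python
def layout(text, direction, p3, cell=16, per_line=10):
    """Return a list of (char, x, y) giving the upper left corner of the
    cell in which each character is drawn.

    For "horizontal" text, p3 is the upper left point of the copy area;
    lines run left to right and stack downward. For "vertical" text, p3
    is the upper right point; lines run top to bottom and stack leftward.
    """
    x3, y3 = p3
    placed = []
    for i, ch in enumerate(text):
        line, pos = divmod(i, per_line)  # which line, and position within it
        if direction == "horizontal":
            x, y = x3 + pos * cell, y3 + line * cell
        elif direction == "vertical":
            x, y = x3 - (line + 1) * cell, y3 + pos * cell
        else:
            raise ValueError("direction must be 'horizontal' or 'vertical'")
        placed.append((ch, x, y))
    return placed
```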
[0064]
In the above embodiment, hiragana is recognized. However, the present invention is not limited to this, and it is also possible to recognize kanji and alphabet characters.
[0065]
In this way, because the method of specifying the range differs between horizontally written and vertically written characters, there is no need to separately specify the direction in which the characters continue, and operability is improved. In addition, a horizontally written area, in which the characters continue from the upper left to the lower right, is specified by its upper left point and lower right point, and a vertically written area, in which the characters continue from the upper right to the lower left, is specified by its upper right point and lower left point. The direction in which the characters continue therefore corresponds to the direction of the designated points, and the range can be designated by a very natural operation.
[0066]
In the above embodiment, the processing for character recognition is changed according to how the range is designated. However, the present invention can also be applied when executing processing other than character recognition.
[0067]
[Effect of the invention]
As described above, according to the image processing apparatus of claim 1 and the image processing method of claim 3, the image in the range is processed in accordance with the relationship between the coordinates of the first point and the second point that specify the range, so operability is improved.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a network to which an image processing apparatus of the present invention is connected.
FIG. 2 is a block diagram illustrating a configuration example of an image processing apparatus according to the present invention.
FIG. 3 is a diagram showing a display example of a menu on the display 15 of FIG.
FIG. 4 is a diagram showing a display example of a home page on the display 15 of FIG.
FIG. 5 is a diagram for explaining the principle of displaying one image in a window in the embodiment of FIG. 2.
FIG. 6 is a diagram illustrating a relationship between a cursor and display content.
FIG. 7 is a diagram illustrating a relationship between a cursor and display contents.
FIG. 8 is a diagram showing a display example in a window.
FIG. 9 is a diagram showing another display example in the window.
FIG. 10 is a diagram showing a display example of a help screen.
FIG. 11 is a diagram showing a structure of a mouse.
FIG. 12 is a diagram illustrating an example of enlarged display in a window.
FIG. 13 is a flowchart for explaining processing of the OCR function in the embodiment of FIG. 2.
FIG. 14 is a diagram showing a display example in step S8 of FIG.
FIG. 15 is a diagram showing a display example in step S9 of FIG.
FIG. 16 is a diagram illustrating a conventional OCR function.
FIG. 17 is another diagram illustrating a conventional OCR function.
[Explanation of symbols]
11 CPU
12 Main memory
13 Display buffer
14 Video signal generator
15 Display
16 Pointing device
17 Keyboard
17A Cursor key
18 Document data storage
19 Image processing unit
20 Image data transfer unit
21 Image data compression and transfer unit
23 Network interface
24 OCR engine

Claims

1. An image processing apparatus comprising:
specifying means for specifying a rectangular range that has, as vertices, a first point that is a drag start point and a second point that is a drag end point, such that the line connecting the first point and the second point forms a diagonal of the rectangle;
determination means for determining a relationship between the coordinates of the first point and the coordinates of the second point; and
processing means for performing, in accordance with a determination result of the determination means, a process of recognizing characters from the image in the rectangular range and determining a direction in which the characters continue.

2. The image processing apparatus according to claim 1, wherein the processing means further converts the image in the rectangular range into text data based on the character recognition result and the determined direction in which the characters continue.

3. An image processing method comprising:
specifying a rectangular range that has, as vertices, a first point that is a drag start point and a second point that is a drag end point, such that the line connecting the first point and the second point forms a diagonal of the rectangle;
determining a relationship between the coordinates of the first point and the coordinates of the second point; and
performing, in accordance with the determination result, a process of recognizing characters from the image in the rectangular range and determining a direction in which the characters continue.