JPH07306854A

JPH07306854A - Method and device for compressing document data

Info

Publication number: JPH07306854A
Application number: JP7050851A
Authority: JP
Inventors: Tetsuya Shibata; 哲也柴田
Original assignee: Mita Industrial Co Ltd
Current assignee: Kyocera Mita Industrial Co Ltd
Priority date: 1994-03-14
Filing date: 1995-03-10
Publication date: 1995-11-21

Abstract

PURPOSE: To use memory effectively by reducing the amount of document data that is generated in advance for display. CONSTITUTION: A control part 1 extracts the words contained in a document input from an input device 2 and counts how often each word appears. Each word is allocated a code different from the codes assigned to character data. For each word, a first total word count W1 (= number of words Nw making up the word × frequency of appearance Ni) and a second total word count W2 (= number of words Nc in the code × frequency of appearance Ni + number of words Nw) are calculated. Then, in the character data making up the input document, every word whose second total word count W2 is less than its first total word count W1 is replaced with the code allocated to that word, which compresses the document data. Because only the words whose replacement reduces the total word count over the whole document are replaced with their codes, the amount of document data decreases.

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a method and apparatus for compressing document data, such as comments and messages prepared in advance for display on a display device such as a liquid crystal display, and storing the compressed data in a memory.

[0002]

Description of the Related Art: In office machines such as facsimile machines and copiers, and in various industrial machines, a display technique is conventionally known in which document data such as operating instructions and comments is created in advance. The appropriate document data is then selected and shown on a display according to the operating state of the machine and the operator's key operations.

[0003] In this display technique, each display document prepared in advance is converted into data character by character, using coded characters, numerals, or other symbols, when office equipment such as the facsimile machine or copier described above is manufactured. This document data is stored in an internal memory built into the body of the equipment, together with the code table used to convert it into code data.

[0004] To display document data, the document data to be displayed is read out of the internal memory, and each character is converted into code data (7-bit or 8-bit data) using the code table. A display device such as a liquid crystal display is then driven on the basis of this code data.

[0005] FIG. 4 shows an example of document data stored in a conventional internal memory. In the figure, "display document" denotes a document shown on the liquid crystal display. The examples given are "LINE TYPE" and "SET LINE TYPE". "Document data" denotes the data structure of each display document as stored in the internal memory. The document data consists of a sequence of characters, numerals, or other symbols (hereinafter, character data) assigned to code values in, for example, the JIS code table. Within this sequence, individual character data items are separated by commas (","), words are separated by blank data (shown as "□" in the figure), and documents are separated by end data that marks the end of a document (shown as "0" in the figure).
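The storage layout of FIG. 4 can be illustrated with a short Python sketch. It rests on several assumptions that the text does not state. ASCII values stand in for the JIS code table, which is safe here because the two agree for the letters A to Z. The blank data "□" is taken to be 0x20 and the end data "0" is taken to be the byte 0x00. The commas in the figure are treated as notation rather than as stored bytes. `BLANK`, `END`, and `serialize_documents` are hypothetical names.

```python
BLANK = 0x20  # assumed code for the blank data "□" between words
END = 0x00    # assumed code for the end data "0" after each document


def serialize_documents(documents: list[str]) -> bytes:
    """Store each document character by character, as in the conventional layout of FIG. 4."""
    out = bytearray()
    for doc in documents:
        for i, word in enumerate(doc.split()):
            if i:
                out.append(BLANK)              # blank data separates words
            out.extend(word.encode("ascii"))   # one byte per character
        out.append(END)                        # end data closes the document
    return bytes(out)


data = serialize_documents(["LINE TYPE", "SET LINE TYPE"])
print(len(data))  # 24: (9 + 1) bytes for the first document, (13 + 1) for the second
```

Under these assumptions, the two example documents occupy 24 bytes: one byte per character plus one end marker for each document.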

[0006] To control the display of these documents, the document data to be displayed is read out of the internal memory, and each character data item is converted into code data using, for example, the JIS code table. The document is then displayed by driving the liquid crystal display on the basis of this code data. For example, to display "LINE TYPE", the character data 'L', 'I', ..., 'E' are read out of the internal memory in sequence, and each item is converted into code data ('L' = "00111100", 'I' = "10011100", and so on). The display segments of the liquid crystal display are then driven on the basis of this code data, and the document "LINE TYPE" is displayed.
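The conventional display path amounts to one code-table lookup per character, which the following sketch makes explicit. Only the two code values quoted above come from the text. The rest of the table `SEGMENT_CODES` and the function name are hypothetical, and a real table would need an entry for every displayable character.

```python
# Partial code table: only the two entries quoted in the description are filled in.
SEGMENT_CODES = {
    "L": "00111100",
    "I": "10011100",
}


def to_display_codes(text: str) -> list[str]:
    """Convert a document to code data one character at a time (one table lookup per character)."""
    codes = []
    for ch in text:
        # Raises KeyError for characters missing from this partial table.
        codes.append(SEGMENT_CODES[ch])
    return codes


print(to_display_codes("LI"))  # ['00111100', '10011100']
```

The number of lookups grows with the number of characters. As a result, a long word that recurs in many documents costs one lookup for every character of every occurrence.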

[0007]

Problems to Be Solved by the Invention: Conventionally, each character of a display document is converted into character data to create the document data, and this document data is stored in the internal memory. The document data is therefore large and occupies a large part of the internal memory's storage area, which makes it difficult to use the internal memory efficiently. The problem is worst when the same long word is used many times. In that case, the display documents contain many characters in total even if they contain only a few words, so the amount of document data becomes large.

[0008] Office equipment is often exported to several countries. To avoid providing, for each destination country, display documents in that country's language together with a dedicated code table, manufacturers often standardize the characters usable in display documents and the code table that converts them into code data. In this case, however, a dictionary covering more than ten languages is usually required. The code table then becomes very large, which hinders efficient use of the internal memory.

[0009] The conventional display method converts display documents into code data character by character. Searching the code table (that is, converting each character data item into code data) therefore takes time, and the display on a liquid crystal display or similar device responds slowly.

[0010] The present invention has been made in view of the above problems. Its object is to provide a method and apparatus for compressing document data that reduce the amount of document data as far as possible, which allows the memory to be used more efficiently and improves display response.

[0011]

Means for Solving the Problems: The invention according to claim 1 is a method of compressing a plurality of document data items, created in advance from character data defined by a predetermined code table, by encoding some of the words they contain on a word-by-word basis. The method comprises the following steps:

- a word extraction step of extracting, from the words contained in all the document data, the words to be encoded as whole words;
- a code allocation step of allocating to each extracted word a code of the code table that differs from the codes assigned to the character data;
- a dictionary creation step of creating a dictionary showing the correspondence between each encoded word and the code allocated to it; and
- a document data compression step of replacing each word extracted in the word extraction step, wherever it occurs in the document data, with its allocated code, so that each document data item is compressed into document data in which codes are mixed with character data.
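The four steps of claim 1 can be sketched in Python for the case where the words to encode have already been chosen (the selection rule is given in claims 3 and 4). This is a minimal sketch, not the patented implementation. Documents are modelled as lists of words, and each code is a single int drawn from a caller-supplied pool of codes that are not assigned to character data. Codes are handed out in sorted word order, which is an arbitrary choice. `compress_documents` and its parameters are hypothetical names.

```python
def compress_documents(documents, words_to_encode, free_codes):
    """Claim 1 sketch: allocate codes, build the dictionary, and replace words with codes.

    documents       -- list of documents, each a list of words (str)
    words_to_encode -- words chosen by the word extraction step
    free_codes      -- code values not assigned to any character data
    """
    # Code allocation step: one unused code per extracted word.
    chosen = sorted(words_to_encode)
    if len(chosen) > len(free_codes):
        raise ValueError("not enough unused codes for the extracted words")
    code_of = dict(zip(chosen, free_codes))

    # Dictionary creation step: code -> word, needed later to expand the codes.
    dictionary = {code: word for word, code in code_of.items()}

    # Document data compression step: an extracted word becomes its code (int);
    # any other word stays as character data (str).
    compressed = [[code_of.get(word, word) for word in doc] for doc in documents]
    return compressed, dictionary


docs = [["LINE", "TYPE"], ["SET", "LINE", "TYPE"]]
compressed, dictionary = compress_documents(docs, {"LINE", "TYPE"}, [0x00, 0x01])
print(compressed)  # [[0, 1], ['SET', 0, 1]]
print(dictionary)  # {0: 'LINE', 1: 'TYPE'}
```

Codes are ints and untouched words remain strings purely for readability. A real implementation would emit a single flat byte stream.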

[0012] The invention according to claim 2 modifies the above document data compression method by replacing the document data compression step with two steps:

- a word count calculation step of calculating, for each document data item, the number of words to be encoded; and
- a document data creation step of creating compressed document data for each document data item. In this step, each word of the document data that was extracted in the word extraction step is replaced with its allocated code, and data indicating the number of encoded words is added at the head of the document data.
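Claim 2 adds one step: counting the encoded words in each document and storing that count at the head of the document. The sketch below models a compressed document as a list whose items are int codes (encoded words) or str words (left as character data), and the header is simply the first item. How a real implementation stores the header in bytes is not specified in the text. `add_word_count_headers` is a hypothetical name.

```python
def add_word_count_headers(compressed_documents):
    """Claim 2 sketch: prefix each compressed document with the number of words it encodes."""
    result = []
    for doc in compressed_documents:
        # Count the words that were replaced by codes.
        count = sum(1 for item in doc if isinstance(item, int))
        # The count goes at the head of the document.
        result.append([count] + doc)
    return result


print(add_word_count_headers([[0, 1], ["SET", 0, 1]]))  # [[2, 0, 1], [2, 'SET', 0, 1]]
```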

[0013] The invention according to claim 3 is the above document data compression method in which the word extraction step comprises the following steps:

- a first word extraction step of extracting the distinct words (words with different character strings) contained in all the document data;
- a word occurrence counting step of counting, for each extracted word, the number of its occurrences in all the document data;
- a first total word count calculation step of calculating, for each extracted word and on the basis of its number of occurrences, the first total word count that the word requires when it is encoded character by character;
- a second total word count calculation step of calculating, for each extracted word and on the basis of its number of occurrences, the second total word count that the word requires when it is encoded as a whole word; and
- a second word extraction step of comparing the first and second total word counts and extracting, as words to be encoded, the words whose second total word count is smaller than their first total word count.
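The first word extraction step and the occurrence counting step of claim 3 can be sketched with Python's `collections.Counter`. The sketch assumes that documents are plain strings with words separated by blanks. `count_word_occurrences` is a hypothetical name.

```python
from collections import Counter


def count_word_occurrences(documents):
    """Claim 3 sketch: extract the distinct words and count the occurrences Ni of each.

    Returns a Counter mapping each distinct word (distinct character string)
    to its number of occurrences across all documents.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())  # words are separated by blanks
    return counts


counts = count_word_occurrences(["LINE TYPE", "SET LINE TYPE"])
print(counts["LINE"], counts["TYPE"], counts["SET"])  # 2 2 1
print(len(counts))                                     # 3
```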

[0014] The invention according to claim 4 is the above document data compression method in which the two total word counts are calculated as follows. In the first total word count calculation step, the first total word count of each extracted word is the word count of that word multiplied by its number of occurrences. In the second total word count calculation step, the second total word count of each extracted word is the word's number of occurrences multiplied by the word count of the allocated code, plus the word count of the word itself.
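The two formulas of claim 4 and the comparison from claim 3 can be checked with a small Python sketch. It assumes that a "word" in the storage sense is one byte, that each character takes one such word (so Nw is the character count), and that each allocated code takes Nc = 1 word. The function names are hypothetical. The Nw term in W2 presumably accounts for the single copy of the word that must still be stored in the dictionary. Following the formula as stated, no separate allowance is made for the dictionary's code entry.

```python
def first_total_words(nw: int, ni: int) -> int:
    """W1 = Nw x Ni: words needed when the word is stored character by character."""
    return nw * ni


def second_total_words(nw: int, ni: int, nc: int) -> int:
    """W2 = Nc x Ni + Nw: one code per occurrence, plus the word itself once."""
    return nc * ni + nw


def should_encode(word: str, ni: int, nc: int = 1) -> bool:
    """Second word extraction step: encode the word only if W2 < W1."""
    nw = len(word)  # assumes one storage word (byte) per character
    return second_total_words(nw, ni, nc) < first_total_words(nw, ni)


# Occurrence counts for the documents "LINE TYPE" and "SET LINE TYPE":
for word, ni in [("LINE", 2), ("TYPE", 2), ("SET", 1)]:
    nw = len(word)
    w1, w2 = first_total_words(nw, ni), second_total_words(nw, ni, 1)
    print(word, w1, w2, should_encode(word, ni))
# LINE 8 6 True
# TYPE 8 6 True
# SET 3 4 False
```

A word that occurs only once is never encoded. When Ni = 1, the difference W2 − W1 = Nc × 1 + Nw − Nw × 1 = Nc, which is always positive.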

[0015] The invention according to claim 5 is an apparatus that compresses document data and stores it in a storage means. The document data consists of a plurality of documents created in advance from character data defined by a predetermined code table, and the apparatus compresses it by encoding some of the words it contains on a word-by-word basis. The apparatus comprises:

- word extraction means for extracting, from the words contained in all the document data, the words to be encoded as whole words;
- code allocation means for allocating to each extracted word a code of the code table that differs from the codes assigned to the character data;
- dictionary creation means for creating a dictionary showing the correspondence between each encoded word and the code allocated to it;
- document data compression means for replacing each word extracted by the word extraction means, wherever it occurs in the document data, with its allocated code, so that each document data item is compressed into document data in which codes are mixed with character data; and
- data writing means for writing the compressed document data and the created dictionary into the storage means.

[0016] The invention according to claim 6 modifies the above document data compression apparatus by replacing the document data compression means with two means:

- word count calculation means for calculating, for each document data item, the number of words to be encoded; and
- document data creation means for creating compressed document data for each document data item. This means replaces each word of the document data that was extracted by the word extraction means with its allocated code, and adds data indicating the number of encoded words at the head of the document data.

[0017] The invention according to claim 7 is the above document data compression apparatus in which the word extraction means comprises:

- first word extraction means for extracting the distinct words (words with different character strings) contained in the document data;
- word occurrence counting means for counting, for each extracted word, the number of its occurrences in the document data;
- first total word count calculation means for calculating, for each extracted word and on the basis of its number of occurrences, the first total word count that the word requires when it is encoded character by character;
- second total word count calculation means for calculating, for each extracted word and on the basis of its number of occurrences, the second total word count that the word requires when it is encoded as a whole word; and
- second word extraction means for comparing the first and second total word counts and extracting, as words to be encoded, the words whose second total word count is smaller than their first total word count.

[0018] The invention according to claim 8 is the above document data compression apparatus in which the two total word counts are calculated as follows. The first total word count calculation means calculates the first total word count of each extracted word as the word count of that word multiplied by its number of occurrences. The second total word count calculation means calculates the second total word count of each extracted word as the word's number of occurrences multiplied by the word count of the allocated code, plus the word count of the word itself.

[0019]

Operation: According to the inventions of claims 1 and 5, the words to be encoded as whole words are first extracted from the words contained in the document data. Each extracted word is then allocated a code of the code table that differs from the codes assigned to the character data. Next, a dictionary showing the correspondence between the encoded words and their allocated codes is created. Each word extracted for encoding is replaced with its allocated code wherever it occurs in the document data, so that the document data is compressed into document data containing codes. Finally, the compressed document data and the dictionary are written into the storage means.
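On the display side, the dictionary stored alongside the compressed data lets each code be expanded back into its word. The text here describes only how the data is written, so the expansion routine below is an assumed counterpart rather than the disclosed method. It models a compressed document as a list of int codes and str words, and the dictionary as a dict from code to word. `expand_document` is a hypothetical name.

```python
def expand_document(compressed_doc, dictionary):
    """Rebuild display text: codes are looked up in the dictionary, other words pass through."""
    words = []
    for item in compressed_doc:
        if isinstance(item, int):
            words.append(dictionary[item])  # one lookup yields a whole word
        else:
            words.append(item)              # word stored as character data
    return " ".join(words)


dictionary = {0: "LINE", 1: "TYPE"}
print(expand_document(["SET", 0, 1], dictionary))  # SET LINE TYPE
```

With this approach, a single lookup restores an entire encoded word, instead of one lookup per character.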

[0020] According to the inventions of claims 2 and 6, the number of words to be encoded is calculated for each document data item. In each document data item, the words extracted for encoding are replaced with their allocated codes, and data indicating the number of encoded words is added at the head of the document data. The result is the compressed document data.

[0021] According to the inventions of claims 3 and 7, the distinct words (words with different character strings) contained in the document data are extracted, and the number of occurrences Ni of each extracted word in the document data is counted. Two quantities are then calculated for each extracted word from its counted number of occurrences Ni: the first total word count W1, which the word requires when it is encoded character by character, and the second total word count W2, which it requires when it is encoded as a whole word. Finally, W1 and W2 are compared, and each word whose W2 is smaller than its W1 is extracted as a word to be encoded.

[0022] According to the inventions of claims 4 and 8, two totals are calculated for each extracted word. The first total word count is the word's word count Nw multiplied by its number of occurrences Ni (W1 = Nw × Ni). The second total word count is Ni multiplied by the word count Nc of the allocated code, plus Nw (W2 = Nc × Ni + Nw).

[0023]

Embodiment: FIG. 1 is a block diagram of a document data compression apparatus to which the document data compression method of the present invention is applied. Equipment such as facsimile machines and copiers often has a display device such as an LCD (Liquid Crystal Display). On such equipment, predetermined documents prepared in advance, such as comments and messages, are usually shown on the display device according to the operating state of the equipment and the operator's key operations. Data representing these documents (hereinafter, document data) is written in advance into a memory built into the equipment, such as a ROM (Read Only Memory). The appropriate document data is read out of this memory according to the operating state of the equipment or the operator's key operation, and the LCD is driven on the basis of that document data.

[0024] The document data compression apparatus shown in FIG. 1 compresses the document data created by an operator and writes it into the memory built into the equipment.

[0025] The document data compression apparatus consists of a compression device 1, an input device 2, a display device 3, and a data writing device 4. Memory 5 is a ROM or similar memory built into equipment such as a facsimile machine or copier. It stores the compressed document data created by the document data compression apparatus. Besides the document data and the dictionary, memory 5 also stores the various data and processing programs needed to control the operation of the equipment.

[0026] The input device 2 is the operating unit through which the operator enters the documents. It has keys for entering characters, numerals, and various symbols, a numeric keypad, and function keys. Documents are entered through the input device 2 one character at a time. A predetermined code table, such as the JIS code table or the ASCII code table, defines the code data for the characters, numerals, symbols, and functions assigned to the keys of the input device 2. Each character entered by a key operation is converted into the corresponding code data according to that table and input to the compression device 1.

[0027] The display device 3 is a CRT (Cathode Ray Tube), LCD, or similar device. It provides the displays needed for creating and compressing document data, as described later. For example, a document entered from the input device 2 is shown on the display device 3, so that the operator can check, correct, or delete the entered characters.

[0028] The compression device 1 compresses the document data entered from the input device 2 and creates the document data to be written into memory 5. The compression device 1 comprises:

- a data input unit 11;
- a document memory 12;
- a word occurrence counter 13 (first word extraction means and word occurrence counting means);
- a code allocation unit 14 (code allocation means);
- a total word count calculation unit 15 (first and second total word count calculation means);
- a compressed word extraction unit 16 (word count calculation means and second word extraction means);
- a document data compression unit 17 (compressed document data creation means and dictionary creation means); and
- a control unit 18.

[0029] The word occurrence counter 13, the total word count calculation unit 15, and the compressed word extraction unit 16 together constitute the word extraction means. This means extracts, from the words contained in all the document data, the words to be encoded as whole words.

[0030] The data input unit 11 is the interface that takes the document data entered from the input device 2 into the compression device 1. The input document data is sent to the display device 3, which shows the input document, and is also stored temporarily in the document memory 12.

[0031] The document memory 12 holds the document data entered from the input device 2 for the data compression processing described later. The word occurrence counter 13 extracts the distinct words (words with different character strings) contained in the input documents. It then counts the number of occurrences Ni of each word across all the documents.

[0032] The code allocation unit 14 allocates to each extracted word a code from a region of the code table that is not assigned to any character or other symbol.

[0033] For example, suppose the characters are defined by the JIS code table shown in Table 1. In that case, the extracted words are allocated codes from regions of that table that are not assigned to characters. One such region is the codes assigned to function characters, from "row 0 × column 00" to "row 15 × column 00" and from "row 0 × column 01" to "row 15 × column 01". Another is undefined codes, such as those from "row 0 × column 08" to "row 15 × column 08" and from "row 0 × column 09" to "row 15 × column 09".
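The code regions cited above can be enumerated directly. In a 16 × 16 JIS code table, the column is the high nibble of a code and the row is the low nibble. Columns 00 and 01 therefore cover 0x00 to 0x1F (function characters), and columns 08 and 09 cover 0x80 to 0x9F (undefined). This is a sketch that assumes single-byte codes. `unused_jis_codes` is a hypothetical helper. A real system would also have to exclude any codes it reserves for its own separators or control functions.

```python
def unused_jis_codes() -> list[int]:
    """Codes in columns 00, 01, 08 and 09 of the 16 x 16 JIS code table.

    code = column * 16 + row, and each column holds rows 0..15.
    """
    columns = [0x0, 0x1, 0x8, 0x9]
    return [column * 16 + row for column in columns for row in range(16)]


pool = unused_jis_codes()
print(len(pool))                    # 64
print(hex(pool[0]), hex(pool[-1]))  # 0x0 0x9f
```

With one-byte codes drawn from these four columns, at most 64 distinct words can be encoded.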

[0034]

[Table 1]

[0035] For each extracted word, the total word count calculation unit 15 calculates two values. The first total word count W1 (= Ni × Nw) is the word count (number of bytes) Nw of the word multiplied by its number of occurrences Ni. The second total word count W2 (= Nc × Ni + Nw) is the word count (number of bytes) Nc of the allocated code multiplied by Ni, plus Nw.

[0036] The first total word count W1 is the amount of data (in bytes) that the word occupies in the uncompressed document data. In other words, it is the total word count the word requires when it is encoded character by character. The second total word count W2 is the amount of data (in bytes) that the word occupies in the compressed document data, that is, the document data after the word has been replaced with its allocated code. In other words, it is the total word count the word requires when it is encoded as a whole word.

[0037] The compressed word extraction unit 16 selects, from the extracted words, those to be encoded. The document data compression method of the present invention reduces the total amount of document data by replacing some of the words in the document data with codes on a word-by-word basis. The compressed word extraction unit 16 determines which words those are, that is, which words are to be replaced with codes as whole words.

[0038] The compressed word extraction unit 16 extracts as words to be compressed those whose second total word count W2 is smaller than their first total word count W1. For such a word, replacing it with its allocated code reduces the amount of data the word occupies in the document data.

[0039] The compressed word extraction unit 16 also calculates, for each document data item, the number of words to be encoded. This word count data is added at the head of each document data item when the item is compressed.

[0040] The document data compression unit 17 compresses the document data and creates a dictionary showing the correspondence between the words extracted for compression and their allocated codes. For each document data item, the unit replaces every word extracted by the compressed word extraction unit 16 with its allocated code. It then adds the word count data at the head, which completes the compression of that document data.

[0041] The control unit 18 centrally controls the document data compression processing described later. It does so by controlling the operation of each unit from the data input unit 11 through the document data compression unit 17. The data writing device 4 writes the document data and the dictionary produced by the compression device 1 into memory 5.

[0042] Next, the document data compression processing performed by the document data compression apparatus is described with reference to the flowchart of FIG. 2.

[0043] In this embodiment, the specific processing is described using two example documents, "LINE TYPE" and "SET LINE TYPE". The character data forming the documents is assumed to be defined by the JIS code table.

[0044] First, when the documents "LINE TYPE" and "SET LINE TYPE" are input through the input device 2, they are temporarily stored in the document memory 12 of the compression device 1 in the document data format shown in FIG. 4 (S1).

[0045] Next, when input of the document data is complete, the distinct words (words with different character strings) contained in the document data, "LINE", "SET" and "TYPE", are extracted, and the counter 12 counts the number of extracted words, K = 3 (S2). Each word is bounded by blank data '□' or end data '0', so words are detected by recognizing these delimiters. Each detected word is compared with the words already extracted, so that only words with different character strings are extracted.
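As a rough illustration of step S2, the sketch below extracts the distinct words from the two example documents. It assumes each document is a Python `str`, models the blank data '□' as an ASCII space, and treats the end data as implied by the end of the string; the function name `extract_distinct_words` is hypothetical.

```python
def extract_distinct_words(documents):
    """Return the distinct words of all documents in first-seen order.

    Assumption: words are separated by a single ASCII space standing in
    for the blank data; the end-of-document marker is not stored.
    """
    distinct = []
    for doc in documents:
        for word in doc.split(" "):
            # Compare each detected word with the words already extracted.
            if word and word not in distinct:
                distinct.append(word)
    return distinct


docs = ["LINE TYPE", "SET LINE TYPE"]
words = extract_distinct_words(docs)
print(words, "K =", len(words))  # ['LINE', 'TYPE', 'SET'] K = 3
```

The extraction order differs from the order listed in the text, which does not affect the count K.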

[0046] Next, each extracted word is assigned a code different from the codes assigned to the character data that form the words (S3). That is, each word is assigned a code from an area of the JIS code table that is not used for character data. For example, the words "SET", "LINE" and "TYPE" are assigned the codes "0x00", "0x01" and "0x02", respectively, which the JIS code table allocates to function characters.
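A minimal sketch of step S3, assuming the pool of unused codes is just the function-character codes 0x00 to 0x02 from the example and that codes are handed out in the order the words are listed. `UNUSED_CODES` and `assign_codes` are hypothetical names.

```python
# Hypothetical pool of code values not used for printable characters;
# the example in the text uses the function-character codes 0x00-0x02.
UNUSED_CODES = [0x00, 0x01, 0x02]


def assign_codes(words, pool=UNUSED_CODES):
    """Map each word to its own code from the unused pool (step S3)."""
    if len(words) > len(pool):
        raise ValueError("not enough unused codes for every word")
    return dict(zip(words, pool))


codes = assign_codes(["SET", "LINE", "TYPE"])
print({w: hex(c) for w, c in codes.items()})
# {'SET': '0x0', 'LINE': '0x1', 'TYPE': '0x2'}
```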

[0047] Next, for each extracted word, the word occurrence counter 13 counts its number of occurrences Ni (times) in the document data (S4). In the above example, the occurrence counts Ni of the words "LINE", "SET" and "TYPE" are 2, 1 and 2, respectively.
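Step S4 amounts to a frequency count over all documents. The sketch below uses Python's `collections.Counter` and assumes, as before, that words are separated by single spaces; `count_occurrences` is a hypothetical name.

```python
from collections import Counter


def count_occurrences(documents):
    """Count how often each word occurs across all documents (step S4).

    Assumption: words are separated by a single ASCII space.
    """
    counts = Counter()
    for doc in documents:
        counts.update(word for word in doc.split(" ") if word)
    return counts


ni = count_occurrences(["LINE TYPE", "SET LINE TYPE"])
print(ni["LINE"], ni["SET"], ni["TYPE"])  # 2 1 2
```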

[0048] Next, for each extracted word, the first total word count W1 (= Ni × Nw) (bytes) is calculated (S5). In the above example, the word lengths Nw (bytes) of "LINE", "SET" and "TYPE" are 4, 3 and 4, respectively, so their first total word counts W1 are 8 (= 2 × 4), 3 (= 1 × 3) and 8 (= 2 × 4).
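The calculation in S5 is a single product. This sketch assumes one byte per character, so Nw = `len(word)`, which matches the example values; `first_total` is a hypothetical name.

```python
def first_total(ni, word):
    """W1 = Ni x Nw: bytes the word occupies when stored character by character.

    Assumption: each character is one byte, so Nw == len(word).
    """
    return ni * len(word)


for word, ni in [("LINE", 2), ("SET", 1), ("TYPE", 2)]:
    print(word, first_total(ni, word))
# LINE 8
# SET 3
# TYPE 8
```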

[0049] Next, for each extracted word, the second total word count W2 (= Ni × Nc + Nw) (bytes) is calculated (S6). In the above example, each of the assigned codes "0x00", "0x01" and "0x02" has a length Nc of 1 byte, so the second total word counts W2 of "LINE", "SET" and "TYPE" are 6 (= 1 × 2 + 4), 4 (= 1 × 1 + 3) and 6 (= 1 × 2 + 4), respectively.
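S6 can be sketched the same way. Reading the trailing + Nw term as the cost of keeping one copy of the word in the dictionary is an assumption (it fits the dictionary overhead discussed in [0064]); `second_total` is a hypothetical name, and one byte per character is again assumed.

```python
def second_total(ni, word, nc=1):
    """W2 = Ni x Nc + Nw: bytes needed when the word is replaced by a code.

    Ni codes of Nc bytes each, plus Nw bytes (assumed to be the single copy
    of the word stored in the dictionary). Assumes 1 byte per character.
    """
    return ni * nc + len(word)


for word, ni in [("LINE", 2), ("SET", 1), ("TYPE", 2)]:
    print(word, second_total(ni, word))
# LINE 6
# SET 4
# TYPE 6
```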

[0050] Next, the words whose second total word count W2 is smaller than their first total word count W1 are extracted as the words to be coded (S7). In the above example, those words are "LINE" and "TYPE", so these two words are extracted.
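Putting S4 to S7 together, the selection rule can be sketched as follows. It assumes space-separated words, one byte per character and a fixed code length `nc`; `select_words_to_code` is a hypothetical name.

```python
from collections import Counter


def select_words_to_code(documents, nc=1):
    """Return the words for which W2 < W1 (step S7).

    W1 = Ni * Nw and W2 = Ni * Nc + Nw, with Nw = len(word) on the
    assumption of one byte per character and words separated by spaces.
    """
    counts = Counter(w for doc in documents for w in doc.split(" ") if w)
    selected = []
    for word, ni in counts.items():
        w1 = ni * len(word)
        w2 = ni * nc + len(word)
        if w2 < w1:
            selected.append(word)
    return selected


print(select_words_to_code(["LINE TYPE", "SET LINE TYPE"]))  # ['LINE', 'TYPE']
```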

[0051] In addition, the number of words to be coded is calculated for each document data (S8). In the above example, a word count of 2 is calculated for each of the documents "LINE TYPE" and "SET LINE TYPE".
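A sketch of S8, assuming every occurrence of a coded word in a document counts once (the text does not distinguish occurrences from distinct words, and the example gives the same result either way); `coded_word_count` is a hypothetical name.

```python
def coded_word_count(doc, coded_words):
    """Number of word occurrences in one document replaced by codes (S8).

    Assumption: words are separated by a single space and each occurrence
    of a coded word counts once.
    """
    return sum(1 for word in doc.split(" ") if word in coded_words)


coded = {"LINE", "TYPE"}
for doc in ["LINE TYPE", "SET LINE TYPE"]:
    print(doc, coded_word_count(doc, coded))
# LINE TYPE 2
# SET LINE TYPE 2
```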

[0052] Next, as shown in Table 2, a dictionary showing the correspondence between the words to be coded and the codes assigned to them is created (S9), and the document data is then compressed (S10).

【００５３】[0053]

【表２】 [Table 2]

[0054] FIG. 3 shows an example of compressed document data. In the figure, the leading data "0xFB" of each document data is the word-count data for that document. "0xFB" indicates a word count of 2, meaning that, of the words making up each of the documents "LINE TYPE" and "SET LINE TYPE", two words (LINE, TYPE) have been replaced with codes (0x01, 0x02).

[0055] The number of code-replaced words is added at the head of each document, as described above, so that when the document data is displayed it is easy to confirm that all coded words have been converted into code data, which simplifies the conversion of each document into code data. That is, if the number of code-replaced words is known in advance, then once the number of words converted word by word through the dictionary reaches that number, the rest of the document only needs to be converted character by character using the JIS code table. Because there is no need to keep switching between the dictionary and the JIS code table, the conversion becomes simpler, which in turn improves display response.
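The count-based shortcut described above can be sketched as follows, reusing the assumed `(count, tokens)` representation (1-character strings for character data, `int`s for codes). Once `count` codes have been expanded, the remaining tokens are copied without any dictionary check. The text does not say how blanks next to coded words are restored for display, so this sketch returns the characters without them; `expand` is a hypothetical name.

```python
def expand(compressed, dictionary):
    """Expand a compressed document using the coded-word count at its head.

    `compressed` is (count, tokens). Until `count` codes have been expanded,
    int tokens are looked up in the dictionary; after that, the remaining
    tokens are known to be plain characters and are copied directly.
    """
    count, tokens = compressed
    out = []
    for pos, tok in enumerate(tokens):
        if count == 0:
            out.extend(tokens[pos:])  # rest is character data only
            break
        if isinstance(tok, int):
            out.append(dictionary[tok])  # whole word at once
            count -= 1
        else:
            out.append(tok)
    return "".join(out)


print(expand((2, ["S", "E", "T", 1, 2]), {1: "LINE", 2: "TYPE"}))  # SETLINETYPE
```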

[0056] The word "SET" is not replaced with a code and remains as character data, because its second total word count W2 is larger than its first total word count W1. Also, when a word is replaced with a code, the code itself stands for one whole word, so no blank data '□' for delimiting words is placed before or after the code.

[0057] When compression of the document data is complete, the document data and the created dictionary are written into the memory 5 (S11), and the document data compression process ends.

[0058] In the above embodiment, codes are assigned to all words before the words to be coded are extracted. However, when the length (word count Nc) of the codes to be assigned is fixed in advance, codes may instead be assigned only to the words to be coded, after those words have been extracted. In this case, the processing of S3 in FIG. 2 is moved to between S7 and S8, or between S8 and S9.

[0059] In equipment such as facsimiles and copiers, the compressed document data is displayed on a display device such as an LCD by the following procedure.

[0060] For example, to display the document "SET LINE TYPE" on the LCD, the document data "0xFB, 'S', 'E', 'T', 0x01, 0x02, '0'" is read sequentially from the memory 5. The character data 'S', 'E' and 'T' are converted by the JIS code table into the code data "3×05", "5×04" and "4×05", respectively, and output to the LCD. The codes "0x01" and "0x02" are converted in one batch, using the dictionary in the memory 5, into the code data strings of the characters making up the words "LINE" and "TYPE" ('L', 'I', 'N', 'E', 'T', 'Y', 'P', 'E'), and output to the LCD. The document "SET LINE TYPE" is then displayed by driving the LCD according to this code data.
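The display-side conversion can be sketched as a loop that handles character tokens one at a time and code tokens one whole word at a time. As an assumption, Python's `ord()` stands in for the JIS code table (it gives ASCII-compatible values for these letters), and the header count and end data are taken to be stripped already; `to_display_codes` is a hypothetical name.

```python
def to_display_codes(tokens, dictionary, char_code=ord):
    """Turn a compressed token stream into the code data sent to the display.

    Character tokens are converted one at a time with `char_code`; a code
    token is expanded through the dictionary into the codes of its whole
    word in one step.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, int):
            out.extend(char_code(c) for c in dictionary[tok])
        else:
            out.append(char_code(tok))
    return out


codes = to_display_codes(["S", "E", "T", 0x01, 0x02], {0x01: "LINE", 0x02: "TYPE"})
print([hex(c) for c in codes])
# ['0x53', '0x45', '0x54', '0x4c', '0x49', '0x4e', '0x45', '0x54', '0x59', '0x50', '0x45']
```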

[0061] As described above, of the words contained in the document data, only those whose second total word count W2 is smaller than their first total word count W1 (that is, words that take fewer bytes over the whole document data when coded than when handled purely as character data) are replaced with their assigned codes. The data amount is therefore smaller than that of conventional document data consisting only of character data. This lowers the share of the memory 5 occupied by document data and allows the memory 5 to be used effectively.

[0062] To illustrate with the above example: if the document "LINE TYPE" is composed of character data only, it is 10 words of data, i.e. 10 bytes, as shown in FIG. 4; if it is composed of character data and codes, it is 4 words of data, as shown in FIG. 3, so the data amount is reduced to 4 bytes.

[0063] Similarly, the document "SET LINE TYPE" is 14 bytes when composed of character data only, but 7 bytes when composed of character data and codes. Therefore, across both documents, composing the document data from character data and codes reduces the data amount by 13 bytes compared with using character data only.
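The byte counts in [0062] and [0063] can be checked with a short calculation. This sketch assumes one byte per character, blank and end data, a one-byte header for the coded-word count, one-byte codes, and (per [0056]) a blank only between two uncoded words; like the figures in the text, it does not count the dictionary. The function names are hypothetical.

```python
def raw_size(doc):
    """Bytes for character-only storage: characters, blanks, one end byte."""
    return len(doc) + 1


def compressed_size(doc, coded_words):
    """Bytes after compression: header + end byte + words + kept blanks.

    A coded word costs one byte (Nc = 1); a blank is kept only between two
    uncoded words. The dictionary itself is not counted here.
    """
    words = doc.split(" ")
    size = 2  # one header byte (coded-word count) + one end byte
    for i, word in enumerate(words):
        size += 1 if word in coded_words else len(word)
        if i > 0 and word not in coded_words and words[i - 1] not in coded_words:
            size += 1
    return size


coded = {"LINE", "TYPE"}
docs = ["LINE TYPE", "SET LINE TYPE"]
for doc in docs:
    print(doc, raw_size(doc), compressed_size(doc, coded))
print("saved:", sum(raw_size(d) - compressed_size(d, coded) for d in docs))
# LINE TYPE 10 4
# SET LINE TYPE 14 7
# saved: 13
```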

[0064] When document data is composed of word-count data, character data and codes, a dictionary showing the correspondence between words and codes is required in addition to the JIS code table, and this consumes additional capacity of the memory 5. However, the compression of the document data described above absorbs the increase due to the dictionary, so the share of the memory 5 occupied by the document data and dictionary together can be made smaller than that of conventional document data.

[0065] In the display process as well, words replaced with codes are converted into code data in one batch per word, so processing is faster than in the conventional example, where document data is converted into code data character by character, and display response improves.

[0066] The document data compression device may also be built into office equipment such as a facsimile or copier, giving the equipment itself a document data compression function. In this case, the operation panel and display device of the equipment serve as the input device 2 and the display device 3, respectively. The equipment can be set to a document data compression mode; when this mode is set, the control unit of the equipment executes the flowchart shown in FIG. 2 to compress document data input from the input device 2.

[0067] When the office equipment itself has the document data compression function in this way, document data can be changed, added and deleted at the user's request not only during manufacture but also after shipment, which makes the equipment more convenient.

【００６８】[0068]

[Effects of the Invention] As described above, according to the present invention, a plurality of document data created in advance using character data defined by a predetermined code table is compressed into document data containing codes, by replacing some of the words contained in all the document data with codes that belong to the code table but differ from the codes assigned to the character data. This reduces the amount of document data, lowers the share of the storage means occupied by document data, and improves the utilization efficiency of the storage means.

[0069] Further, when each document data is displayed on a display device or the like, the coded words are converted into code data in one batch, word by word, based on the dictionary, so the conversion of document data into code data is performed quickly and the display response improves.

[0070] Further, since data giving the number of coded words in each compressed document data is added at its head, completion of the conversion of all coded words into code data can be easily confirmed, which simplifies the conversion of each document data into code data.

[0071] Further, of the words contained in the document data, only those whose second total word count (required when the word is coded as a unit) is smaller than their first total word count (required when the word is coded character by character) are extracted and replaced with their assigned codes, so document data with high compression efficiency is obtained.

[0072] Further, for each extracted word, the first total word count is calculated by multiplying the word count of the word by its number of occurrences, and the second total word count is calculated by multiplying the number of occurrences by the word count of the assigned code and adding the word count of the word to the product, so both totals can be calculated easily.
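Writing the two totals from the paragraph above side by side gives a simple condition for when coding a word pays off. This is a short derivation from the stated formulas, not a statement made in the original text:

$$
W_1 = N_i N_w, \qquad W_2 = N_i N_c + N_w
$$

$$
W_2 < W_1 \iff N_i N_c + N_w < N_i N_w \iff N_w < N_i\,(N_w - N_c)
$$

With $N_c = 1$, "LINE" ($N_w = 4$, $N_i = 2$) satisfies $4 < 2 \cdot 3 = 6$, while "SET" ($N_w = 3$, $N_i = 1$) would need $3 < 2$, which fails. In particular, a word that occurs only once ($N_i = 1$) is never coded, since $N_w < N_w - N_c$ cannot hold for $N_c \ge 1$.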

[Brief description of drawings]

FIG. 1 is a block diagram of a document data compression apparatus according to the present invention.

FIG. 2 is a flowchart showing the document data compression process of the document data compression apparatus according to the present invention.

FIG. 3 is a diagram showing an example of document data compressed by the document data creation device according to the present invention.

FIG. 4 is a diagram showing an example of conventional document data.

[Explanation of symbols]

1 Compression device
11 Data input unit
12 Document memory
13 Word occurrence counter
14 Code allocation unit
15 Total word count calculation unit
16 Compressed word extraction unit
17 Document data compression unit
18 Control unit
2 Input device
3 Display device
4 Data writing device
5 Memory

Claims

[Claims]

1. A document data compression method for compressing a plurality of document data, created in advance from character data defined by a predetermined code table, by coding some of the words contained in the document data on a word-by-word basis, the method comprising: a word extracting step of extracting, from the words contained in all the document data, the words to be coded on a word-by-word basis; a code allocating step of allocating to each extracted word a code of the code table that differs from the codes assigned to the character data; a dictionary creating step of creating a dictionary showing the correspondence between each coded word and the code allocated to it; and a document data compressing step of replacing, among the words constituting each document data, the words extracted in the word extracting step with the codes allocated to them, thereby compressing each document data into document data containing codes.

2. The document data compression method according to claim 1, comprising, in place of the document data compressing step: a word count calculating step of calculating, for each document data, the number of words to be coded; and a document data creating step of creating compressed document data by replacing, for each document data, the words extracted in the word extracting step among the words constituting that document data with the codes allocated to them, and adding the data of the number of coded words at the head of each document data.

3. The document data compression method according to claim 1, wherein the word extracting step comprises: a first word extracting step of extracting words having different character strings contained in all the document data; a word occurrence counting step of counting, for each extracted word, its number of occurrences in all the document data; a first total word count calculating step of calculating, for each extracted word and based on its number of occurrences, the first total word count required for the word when it is coded character by character; a second total word count calculating step of calculating, for each extracted word and based on its number of occurrences, the second total word count required for the word when it is coded as a word unit; and a second word extracting step of comparing the first total word count with the second total word count and extracting, as a word to be coded, each word whose second total word count is smaller than its first total word count.

4. The document data compression method according to claim 3, wherein the first total word count calculating step calculates, for each extracted word, the first total word count by multiplying the word count of the word by its number of occurrences, and the second total word count calculating step calculates, for each extracted word, the second total word count by multiplying the number of occurrences of the word by the word count of the allocated code and adding the word count of the word to the product.

5. A document data compression apparatus for compressing document data, consisting of a plurality of documents created in advance from character data defined by a predetermined code table, by coding some of the words contained in the document data on a word-by-word basis, and storing the result in a storage means, the apparatus comprising: word extracting means for extracting, from the words contained in all the document data, the words to be coded on a word-by-word basis; code allocating means for allocating to each extracted word a code of the code table that differs from the codes assigned to the character data; dictionary creating means for creating a dictionary showing the correspondence between each coded word and the code allocated to it; document data compressing means for replacing, among the words constituting the document data, the words extracted by the word extracting means with the codes allocated to them, thereby compressing each document data into document data containing codes; and data writing means for writing the compressed document data and the created dictionary into the storage means.

6. The document data compression apparatus according to claim 5, comprising, in place of the document data compressing means: word count calculating means for calculating, for each document data, the number of words to be coded; and document data creating means for creating compressed document data by replacing, for each document data, the words extracted by the word extracting means among the words constituting that document data with the codes allocated to them, and adding the data of the number of coded words at the head of each document data.

7. The document data compression apparatus according to claim 5, wherein the word extracting means comprises: first word extracting means for extracting words having different character strings contained in the document data; word occurrence counting means for counting, for each extracted word, its number of occurrences in the document data; first total word count calculating means for calculating, for each extracted word and based on its number of occurrences, the first total word count required for the word when it is coded character by character; second total word count calculating means for calculating, for each extracted word and based on its number of occurrences, the second total word count required for the word when it is coded as a word unit; and second word extracting means for comparing the first total word count with the second total word count and extracting, as a word to be coded, each word whose second total word count is smaller than its first total word count.

8. The document data compression apparatus according to claim 7, wherein the first total word count calculating means calculates, for each extracted word, the first total word count by multiplying the word count of the word by its number of occurrences, and the second total word count calculating means calculates, for each extracted word, the second total word count by multiplying the number of occurrences of the word by the word count of the allocated code and adding the word count of the word to the product.