JPH1152973A

JPH1152973A - Document reading method

Info

Publication number: JPH1152973A
Application number: JP9213566A
Authority: JP
Inventors: Tetsuya Sakayori; 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1997-08-07
Filing date: 1997-08-07
Publication date: 1999-02-26

Abstract

(57) [Abstract] [Problem] To improve the listening comprehensibility of an electronic document when it is read aloud, and to enable random access to reading positions within the document. [Solution] So that logical features of the electronic document, such as titles and paragraphs, and/or visual features, such as character size, can be recognized, the contents of the document are hierarchized according to specific information in the document and/or its logical structure and converted into internal data, and the reading process is performed on the basis of that internal data. A document reading system is also provided in which the hierarchized internal data lets the user freely access the reading blocks of the document in response to the user's instructions.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD The present invention relates to a system that converts electronic structured documents, such as SGML and HTML documents, into speech by means of speech synthesis technology or the like. It is used, for example, when accessing a database by telephone or when a visually impaired person accesses the WWW (World Wide Web).

[0002]

DESCRIPTION OF THE RELATED ART Conventionally, the following methods exist for reading aloud text (documents) that was written on the premise of visual display.
(1) A method that, when converting text to speech, expresses character decoration by sound and expresses the reading position in the document by sound image localization (Prior Art 1; for example, Japanese Patent Laid-Open No. 8-263260, "Text reading method").
(2) A method that, when moving the reading position back, reads at high speed in reverse order phrase by phrase but in forward order within each phrase, while the user listens and searches for the position to be heard again (Prior Art 2; for example, Japanese Patent Laid-Open No. 6-308998, "Speech reading device").
(3) A method that, when reading document information classified into a hierarchical structure, specifies the reading position by the duration or number of switch presses (Prior Art 3; for example, Japanese Patent Laid-Open Nos. 5-281987, 5-281988, and 5-281992, "Portable document reading device").
(4) A method that reads an HTML document aloud with synthesized speech (Prior Art 4; commercialized as a software product, for example "おしゃべりさーふあー" from Ricoh Co., Ltd.).

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION However, as described below, none of these conventional document reading techniques is yet adequate in terms of listening comprehensibility and random access to the document being read. To explain with reference to the drawings: the HTML document shown in FIG. 1, for example, is displayed visually as shown in FIG. 2 by software commonly called a browser. As FIG. 2 makes clear, the tag information is converted into visual formatting information that reveals the hierarchical structure of the document, which improves its visual legibility. When the same document is converted to speech by Prior Art 4, however, the tags are ignored and only the Japanese text is read aloud, as shown in FIG. 3, so the result is a monotonous stream that the user finds hard to follow. Prior Art 1, by contrast, expresses character decoration and the reading position within the document when converting to speech, so it can convey visual features in sound to some extent. Even so, it is difficult to grasp the logical structure of the document, and the overall picture is conveyed only in the vague form of sound image localization.

[0004] Document reading also suffers from the problem that random access is difficult, because audio media cannot be taken in at a glance. Prior Art 3 achieves random access by button operation by adopting a hierarchical file structure. However, because Prior Art 3 describes its data in a dedicated structure, it cannot be applied to existing text. Prior Art 2, on the other hand, gains access speed by moving through the text at high speed, but when re-listening to a document, judging the reading position while hearing the phrases of the document in reverse order can hardly be called a light burden on the user.

[0005] Accordingly, the object of the invention of claim 1 is to improve listening comprehensibility by converting a document created mainly for visual use into speech while preserving its hierarchical structure. The object of the invention of claim 2 is, in addition to the object of claim 1, to make the invention applicable as-is to existing structured electronic text. The object of the invention of claim 3 is, in addition to the object of claim 1, to make the invention applicable also to electronic text without structuring tags. The object of the invention of claim 4 is to further improve the listening comprehensibility of the invention of claim 1. The object of the invention of claim 5 is, in addition to the object of claim 1, to make it easier to grasp where the passage being heard is positioned within the whole document.

[0006] The object of the invention of claim 6 is, in addition to the object of claim 1, to make random access to the document being read aloud quick.

[0007]

MEANS FOR SOLVING THE PROBLEMS The present invention converts existing structured electronic text (documents), written mainly for visual presentation, into speech together with the structural information of the text as the author intended it, aiming at ease of understanding close to that of a visual display and at improved accessibility.

[0008] The invention of claim 1 is a document reading system in which the character information of an electronic document is converted into speech by speech synthesis technology and output, wherein the contents of the document are grasped hierarchically from the format information and/or logical structure of the document, and the output order and/or voice attributes are changed accordingly when converting to speech.

[0009] The invention of claim 2 is the document reading system of claim 1, wherein the format information is treated as delimiters in the document whose strength differs by type, and the document is thereby hierarchically divided into blocks.

[0010] The invention of claim 3 is the document reading system of claim 1, wherein the first text unit of a block, such as the first paragraph in the document or the first sentence in a paragraph, is positioned at a higher level than the text units that follow it, and the contents of the document are thereby hierarchically divided into blocks.

[0011] The invention of claim 4 is the document reading system of claim 1, wherein the contents of the document are read aloud in order from the upper levels of the hierarchy to the lower levels.

[0012] The invention of claim 5 is the document reading system of claim 1, wherein titles, contents, and hierarchy levels are distinguished from one another using voice attributes, additional sounds, and the like.

[0013] The invention of claim 6 is the document reading system of claim 1, wherein the reading means can move freely between upper and lower levels of the document and within the same level in response to the user's request.
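The idea of claim 3, applied to text that carries no tags, can be sketched as follows. This is a minimal Python sketch and not the implementation described in the specification: the function names split_sentences, promote_first, and hierarchize, the blank-line paragraph rule, and the naive sentence splitter are all assumptions made for illustration.

```python
from __future__ import annotations
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter (assumed): break after ., !, ? or the Japanese full stop."""
    parts = re.split(r"(?<=[.!?。])\s*", paragraph.strip())
    return [p for p in parts if p]


def promote_first(units: list) -> dict | None:
    """Place the first unit one level above the units that follow it."""
    if not units:
        return None
    return {"head": units[0], "rest": units[1:]}


def hierarchize(text: str) -> dict | None:
    """First paragraph heads the document; first sentence heads each paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    para_nodes = [promote_first(split_sentences(p)) for p in paragraphs]
    return promote_first(para_nodes)


tree = hierarchize("Summary first. Detail one.\n\nSecond para. More.")
```

Here the "head" of the outer node is the first paragraph, and each paragraph node in turn has its first sentence as "head", which mirrors the two examples the claim gives.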

[0014]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described taking the HTML document shown in FIG. 1 as an example. FIG. 4 shows the internal data obtained when the structural information of this HTML document is extracted and converted into hierarchical internal data. FIG. 5 shows the processing flow of that conversion. The following first describes the conversion from source text to internal data, and then the process of reading the internal data aloud.

[0015] The source text is hierarchized mainly on the basis of tags. For this purpose, the tags are ranked into layers in advance, for example as shown in FIG. 6, from the virtual top-level tag at the top layer down to the forced line break <BR> at the bottom layer. This rank is the height of the layer that the tag represents, and it is decided by judging the tag's strength as a delimiter of the document from its logical meaning and its visual features. Logical meaning is the semantic structure of the text, such as titles and paragraphs, which is expressed mainly by logical tags. Visual features are structural information perceived visually, such as character size and ruled lines. In the blocking described below, these features are regarded as the delimiter strength of a tag, and the document is divided hierarchically. In this embodiment and the following description, start tags are mainly used as block delimiters and end tags are not; however, the invention is not limited to this, and the range enclosed by a start tag and its end tag may of course be treated as a block.
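The ranking of tags by delimiter strength can be sketched as a simple lookup table. This is a minimal Python sketch under stated assumptions: FIG. 6 is not reproduced in this text, so the particular tags and their order below are hypothetical, apart from the virtual top-level tag being highest and <BR> being lowest. The names TAG_RANK, is_lower, and strongest_below are likewise illustrative.

```python
from __future__ import annotations

# Assumed ordering; smaller number = higher layer = stronger delimiter.
TAG_RANK = {
    "ROOT": 0,  # virtual top-level tag wrapping the whole document
    "H1": 1,
    "H2": 2,
    "P": 3,
    "LI": 4,
    "BR": 5,    # forced line break, the weakest delimiter
}


def is_lower(tag: str, current: str) -> bool:
    """True if `tag` sits in a lower layer than `current`."""
    return TAG_RANK[tag] > TAG_RANK[current]


def strongest_below(tags: list[str], current: str) -> str | None:
    """Among `tags`, pick the one below `current` whose layer is closest to it."""
    candidates = [t for t in tags if t in TAG_RANK and is_lower(t, current)]
    return min(candidates, key=TAG_RANK.__getitem__) if candidates else None
```

Keeping the ranks in one table means the blocking logic never needs to know about individual tags; changing how a document type is segmented only requires editing the table.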

[0016] Referring to FIG. 5, the following describes how the block division process of FIG. 5 is applied recursively, taking as an example the conversion of the source text of FIG. 1 into the hierarchical data of FIG. 4. First, with the entire source text of FIG. 1 as the text to be processed, the root block is built with the virtual top-level tag as the current-layer tag (S101 to S111). The title and content extraction step (S102) differs depending on the type of the current-layer tag, and it is not performed for the virtual top-level tag. Next, in preparation for building the child blocks CHILD, the text to be processed is searched for the tag that lies below the current layer and is closest to it (S103 to S105). Here the first-level heading tag <H1> is found. The child block is therefore built for the tag <H1>: the blocking process is called recursively with the entire source text of FIG. 1 as the text to be processed and the tag <H1> as the current-layer tag (S101).

[0017] For the tag <H1>, the part enclosed between <H1> and </H1> is extracted as the title, and the first paragraph that follows is extracted as the content (S102). As a result, "音声WebブラウザTelMePage" ("Voice Web Browser TelMePage") is extracted as the title, and no content is extracted. Next, the tags of the child blocks CHILD are searched for among the candidate tags below <H1>, and two second-level heading tags <H2> are found (S103 to S105). First, the part between the first <H2> tag and the next <H2> tag is taken as the text to be processed, the blocking process is called recursively with <H2> as the current-layer tag (S101 to S107), and the result becomes a child block. The part from the next <H2> tag to the end is then processed in the same way (S101 to S107).

[0018] Hierarchical blocking is carried out recursively in this way (S101 to S111). Since everything other than the extraction of the title and content is a repetition of this process, only the title and content extraction is described below. For the tag <H2>, title and content extraction is performed as for <H1>, and each <P> tag that follows becomes a child block without a title. This is because the first paragraph usually gives an overview of what follows. The tag <UL> is ignored here, and each <LI> part is processed as a block without a title. The source text is converted into internal data in this way.
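The recursive blocking described in paragraphs [0016] to [0018] can be sketched roughly as follows. This is a minimal Python sketch, not the flow of FIG. 5 itself: the Block class, the RANK table, the helper names, and the sample markup are all assumptions for illustration, and the sample is not the document of FIG. 1. Start tags act as delimiters, heading tags yield a title, and the first paragraph after a heading is promoted to that heading's content.

```python
from __future__ import annotations
import re
from dataclasses import dataclass, field

RANK = {"ROOT": 0, "H1": 1, "H2": 2, "P": 3, "LI": 4}  # assumed ordering
HEADINGS = ("H1", "H2")


@dataclass
class Block:
    tag: str
    title: str = ""
    content: str = ""
    children: list["Block"] = field(default_factory=list)


def strip_tags(s: str) -> str:
    return re.sub(r"<[^>]*>", "", s).strip()


def starts(text: str, tag: str) -> list[int]:
    """Offsets of every start tag `<tag ...>` in `text`."""
    return [m.start() for m in re.finditer(rf"<{tag}\b[^>]*>", text, re.I)]


def build(text: str, tag: str = "ROOT") -> Block:
    block = Block(tag)
    if tag in HEADINGS:  # title extraction; skipped for the virtual root
        m = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", text, re.I | re.S)
        if m:
            block.title = strip_tags(m.group(1))
            text = text[m.end():]
    # Strongest delimiter below the current layer that occurs in the text.
    lower = [t for t in RANK if RANK[t] > RANK[tag] and starts(text, t)]
    if not lower:
        block.content = strip_tags(text)
        return block
    child_tag = min(lower, key=RANK.get)
    pos = starts(text, child_tag)
    block.content = strip_tags(text[:pos[0]])
    for a, b in zip(pos, pos[1:] + [len(text)]):
        block.children.append(build(text[a:b], child_tag))
    # The first paragraph after a heading becomes the heading's content.
    first = block.children[0]
    if tag in HEADINGS and not block.content and first.tag == "P" and not first.children:
        block.content = block.children.pop(0).content
    return block


SAMPLE = """<H1>TelMePage</H1>
<H2>Background</H2>
<P>Intro paragraph.
<P>More detail.
<H2>Features</H2>
<UL>
<LI>Reads structure
<LI>Random access
</UL>
"""
root = build(SAMPLE)
```

In this sketch <UL> is ignored simply because it has no entry in RANK, so it is stripped like any other unranked markup, while each <LI> becomes a block without a title.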

[0019] Next, an embodiment of the process of reading the converted internal data aloud will be described with reference to FIG. 7. When there is no operation from the user, the title, the content, and the titles of the child blocks are read aloud in that order (S201 to S210); reading then moves to a child block, which is read in the same way (title, content, titles of its child blocks) (S212 to S214). At this time, a different sound effect is added before each of the title, the content, and the child-block titles, and/or the voice type is changed, to help the listener tell them apart. This is repeated down to the lowest-level block (S201 to S215), after which reading returns to the unread blocks and continues (S202 to S215). As a result, the listener learns at the very start of the reading that this text as a whole consists of "Background" and "Features", which improves listening comprehensibility.
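The default reading order can be sketched as a depth-first traversal that previews the child titles before descending. This is a minimal Python sketch and not the flowchart of FIG. 7: the Block class, the read_aloud generator, and the cue labels "title", "content", and "child-title" are assumptions standing in for the sound effects or voice changes mentioned above.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Block:
    title: str = ""
    content: str = ""
    children: list["Block"] = field(default_factory=list)


def read_aloud(block: Block) -> Iterator[tuple[str, str]]:
    """Yield (cue, text) pairs: title, content, child titles, then each child."""
    if block.title:
        yield ("title", block.title)
    if block.content:
        yield ("content", block.content)
    for child in block.children:      # preview of the next layer
        if child.title:
            yield ("child-title", child.title)
    for child in block.children:      # then descend, depth first
        yield from read_aloud(child)


doc = Block("Doc", "", [
    Block("Background", "Intro"),
    Block("Features", "", [Block("", "Item")]),
])
events = list(read_aloud(doc))
```

A speech front end would map each cue to a distinct earcon or voice before passing the text to a synthesizer; the traversal itself does not depend on how the cue is rendered.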

[0020] Furthermore, a location specified by a user interrupt can be accepted during reading, which makes random access possible. That is, by giving the instruction "again" (S301), the user can return to the beginning of the block currently being read and hear it again (S203 to S214). Also, for example, if the instruction "down" is given (S302) while the child block title "Features" of the <H1> block is being read, the reading location jumps directly to the body of "Features" (S212). If the instruction "next" is given (S303) while the content of the <H2> block "Features" is being read, reading jumps to the next child block of the parent block, that is, the <H2> block "Background". Likewise, for the (U) blocks of the bulleted list, the user can jump to the next item during reading. If the instruction "up" is given (S304), reading returns to the beginning of the parent block.
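The four instructions can be sketched as operations on a cursor into the block tree. This is a minimal Python sketch under stated assumptions: the Cursor class, its method names, and the choice to stay in place when a move is impossible are illustrative and are not taken from FIG. 7.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Block:
    title: str
    children: list["Block"] = field(default_factory=list)


class Cursor:
    """Tracks the block being read as a path of child indices from the root."""

    def __init__(self, root: Block) -> None:
        self.root = root
        self.path: list[int] = []

    def _node(self, path: list[int]) -> Block:
        node = self.root
        for i in path:
            node = node.children[i]
        return node

    def current(self) -> Block:
        return self._node(self.path)

    def again(self) -> Block:
        """Restart the current block from its beginning."""
        return self.current()

    def down(self, i: int = 0) -> Block:
        """Jump into the i-th child, e.g. the one whose title is being read."""
        if 0 <= i < len(self.current().children):
            self.path.append(i)
        return self.current()

    def next(self) -> Block:
        """Move to the next sibling under the same parent, if there is one."""
        if self.path and self.path[-1] + 1 < len(self._node(self.path[:-1]).children):
            self.path[-1] += 1
        return self.current()

    def up(self) -> Block:
        """Return to the beginning of the parent block."""
        if self.path:
            self.path.pop()
        return self.current()


doc = Block("Doc", [
    Block("Background"),
    Block("Features", [Block("Item 1"), Block("Item 2")]),
])
cursor = Cursor(doc)
```

Because every move is a constant number of list operations on the path, the response to a user instruction does not depend on how much of the document has already been read.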

[0021] The reading operation described above can be realized by using the process shown in the flowchart of FIG. 7 (S201 to S215) recursively. In the figure, blockTITLE denotes the title of the block block, blockCONTENT the content of block, blockNCHILD the number of child blocks of block, and blockCHILD[i] the i-th child block of block. The text-to-speech conversion itself can use existing text-to-speech synthesis technology, so its description is omitted here.

[0022]

EFFECTS OF THE INVENTION
Effect corresponding to claim 1: Because the information in a document can be mapped onto a spoken-language representation and output with its hierarchical structure preserved, listening comprehensibility can be improved.
Effect corresponding to claim 2: In addition to the effect corresponding to claim 1, the invention can be applied as-is to existing structured electronic text.
Effect corresponding to claim 3: In addition to the effect corresponding to claim 1, the invention can also be applied to existing electronic text without structuring tags.
Effect corresponding to claim 4: In addition to the effect corresponding to claim 1, the higher-level concepts, which are prerequisite knowledge for understanding the lower-level concepts, are always obtained beforehand, so overall listening comprehensibility improves.
Effect corresponding to claim 5: In addition to the effect corresponding to claim 1, the position of the passage currently being heard within the whole document is easy to grasp, which improves listening comprehensibility.
Effect corresponding to claim 6: In addition to the effect corresponding to claim 1, the difficulty of random access, a drawback of audio media, is compensated for, and the speed of response to user requests can be improved.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a document to be read aloud.

FIG. 2 is a diagram showing an example of how the document is displayed by a browser.

FIG. 3 is a diagram showing an example of the read-aloud output when the document is converted to speech by a conventional method.

FIG. 4 is a diagram showing the hierarchical internal data of the document.

FIG. 5 is a diagram showing the processing flow for converting the structure of the document into hierarchical internal data.

FIG. 6 is a diagram showing an example of the tags applied to the document.

FIG. 7 is a diagram showing the processing flow for reading a document aloud based on the hierarchical internal data.

Claims

[Claims]

1. A document reading system in which character information of an electronic document is converted into speech by speech synthesis technology and output, characterized in that the contents of the document are grasped hierarchically from the format information and/or logical structure of the document, and the output order and/or voice attributes are changed accordingly when converting to speech.

2. The document reading system according to claim 1, wherein the format information is treated as delimiters in the document whose strength differs by type, and the document is thereby hierarchically divided into blocks.

3. The document reading system according to claim 1, wherein the contents of the document are hierarchically divided into blocks by positioning the first text unit of a block, such as the first paragraph in the document or the first sentence in a paragraph, at a higher level than the text units that follow it.

4. The document reading system according to claim 1, wherein the contents of the document are read out in order from a higher layer to a lower layer.

5. The document reading system according to claim 1, wherein discrimination between the title, the content, and the hierarchy is performed using a voice attribute, an additional sound, or the like.

6. The document reading system according to claim 1, wherein the reading means is movable between upper and lower layers of the document and between the same layers according to a user's request.