JPH11161298A

JPH11161298A - Speech synthesis method and apparatus

Info

Publication number: JPH11161298A
Application number: JP9328640A
Authority: JP
Inventors: Hideki Shiina; 秀樹椎名
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-11-28
Filing date: 1997-11-28
Publication date: 1999-06-18

Abstract

(57) [Abstract] [Problem] To enable speech synthesis suited to the time and occasion, even for identical text, without adding information on volume, speech rate, and the like to the text. [Solution] Text entered through an input unit 101 is converted by language processing in a language processing unit 103 into a symbol string consisting of a phoneme sequence and prosodic information. From this phoneme sequence and prosodic information, a synthesis parameter generation unit 106 generates the synthesis parameters used for speech synthesis in a speech synthesis unit 107. In doing so, the synthesis parameter generation unit 106 uses the measurement of an utterance time measurement timer 104 to detect the time elapsed since the speech synthesis unit 107 began outputting the synthesized sound for the input text, and generates the synthesis parameters so that the characteristics of the synthesized sound vary with that elapsed time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesis method and apparatus that convert input text data (hereinafter simply "text") by language processing into a symbol string consisting of a phoneme sequence and prosodic information and generate synthesized sound from that symbol string, and more particularly to a speech synthesis method and apparatus that generate synthesized sound whose characteristics suit the time or the occasion.

[0002]

[Prior Art] For synthesizing speech from text such as sentences written in mixed kanji and kana, a technique has been proposed, as described in Japanese Patent Laid-Open No. 6-337876 (hereinafter, the first technique), in which information on volume, speech rate, and sound effects is added to the text in advance and then used to control the volume, speech rate, and sound effects during synthesis, so that the synthesized sound approaches natural human speech.

[0003] A technique has also been proposed, as described in Japanese Patent Laid-Open No. 6-83381 (hereinafter, the second technique), in which speech is synthesized while the speech rate, pitch, voice quality, and volume are controlled using the meanings of the compound words and kanji contained in the text.

[0004]

[Problems to Be Solved by the Invention] The first technique, however, requires information to be added to the text, which is cumbersome. Moreover, with both the first and second techniques, the same text is always synthesized at the same volume, speech rate, and so on. Observation of human speech, by contrast, shows that people change characteristics such as loudness and speech rate according to the time and circumstances of speaking, even when saying the same sentence.

[0005] The present invention has been made in view of these circumstances, and its object is to provide a speech synthesis method and apparatus that can perform speech synthesis suited to the time or the occasion, even for identical text, without adding information on volume, speech rate, and the like to the text.

[0006]

[Means for Solving the Problems] The present invention is characterized in that, when synthesized sound is generated and output from a symbol string consisting of a phoneme sequence and prosodic information obtained by language processing of input text (text data), the time elapsed since the start of utterance of the synthesized sound corresponding to that input text is measured, and the characteristics of the synthesized sound are changed according to the measured elapsed time. At least one of parameters such as speech rate, pitch, volume, and voice quality may be used as the parameter (control parameter) that changes the characteristics of the synthesized sound.

[0007] In this configuration, the characteristics of the synthesized sound vary automatically with the time elapsed since the start of utterance, which draws the listener's attention. In particular, it is advisable to take the speech rate as the characteristic to be varied and to raise it gradually as time elapses from the start of utterance, that is, as the listener becomes accustomed to the synthesized sound. The sound is then output slowly while the listener is still unaccustomed to it, so the listener can correctly perceive its content by ear, and output quickly once the listener is accustomed to it, so the listener is not irritated. Besides the speech rate, pitch, volume, voice quality, and the like can also be used as the characteristic to be varied.

[0008] The present invention is also characterized in that, instead of the elapsed time from the start of utterance, at least one of various kinds of timekeeping information, including the time of day, the date, and the day of the week, is used as the information that triggers the automatic change in the characteristics of the synthesized sound.

[0009] Because the characteristics of the synthesized sound change according to the timekeeping information, they can convey the date and time or information personal to the listener. The present invention is further characterized in that the number of occurrences of each word in all texts that have been output as synthesized sound is held and managed and, when synthesized sound is generated from input text, the occurrence count of each word in that text is checked, and the characteristics of the synthesized sound are made to differ at least between words appearing for the first time and words that have appeared two or more times.

[0010] Changing the characteristics of the synthesized sound between first-time words and words that have appeared two or more times makes it possible to output first-time words with characteristics that are easy to hear, and to output words on their second and later appearances with characteristics that do not irritate the listener.

[0011] The present invention is further characterized in that, when output of synthesized sound for the next text is requested before output of the synthesized sound for the current input text has finished, the text portion corresponding to the not-yet-output synthesized sound is simplified, and synthesized sound for the simplified text portion is generated and output in place of that for the original text portion. In other words, the input of the next text before the end of speech output for the text currently being processed serves as a trigger for changing how the not-yet-output text portion is read, so that speech output of that portion finishes in a short time.

[0012] Thus, when output of the next text is requested before output of the synthesized sound for the input text currently being processed has finished, the text portion corresponding to the not-yet-output synthesized sound is simplified and synthesized sound for the simplified text portion (for example, a summary) is output. Speech output of the not-yet-output text portion therefore finishes in a short time, output can move immediately to the newly requested content, and the output stays synchronized in time. Instead of simplifying only the text portion corresponding to the not-yet-output synthesized sound, that text portion may be joined with at least the first sentence of the next text and the joined passage simplified.

[0013] Likewise, when output of the next text is requested before output of the synthesized sound for the current input text has finished, raising the speech rate of the not-yet-output synthesized sound also finishes speech output of the not-yet-output text portion in a short time and lets output move immediately to the newly requested content, so that the output stays synchronized in time.

[0014] Alternatively, when output of the next text is requested before output of the synthesized sound for the current input text has finished, the not-yet-output synthesized sound may be output while its volume is gradually lowered, with output switching to the synthesized sound for the next text once the volume reaches a predetermined level. This, too, finishes speech output of the not-yet-output text portion in a short time and lets output move immediately to the newly requested content, so that the output stays synchronized in time.
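The fade-out behavior of paragraph [0014] can be illustrated with a minimal Python sketch. The function name, the per-frame attenuation factor, and the switch level are all assumptions made for illustration; the patent only states that the volume is lowered gradually and that output switches at a predetermined level.

```python
def fade_out_gains(start_gain=1.0, factor=0.7, switch_level=0.2):
    """Yield successive gains for the remaining audio of the current text.

    All three defaults are hypothetical. Iteration stops once the gain has
    dropped to the switch level, at which point the caller would start
    outputting the synthesized sound for the next text.
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    gain = start_gain
    while gain > switch_level:
        yield gain
        gain *= factor  # lower the volume a little for the next frame
```

A caller would multiply each remaining audio frame by the next gain from this generator and begin the next text when the generator is exhausted.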

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment] FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to a first embodiment of the present invention.

[0016] In FIG. 1, an input unit 101 inputs the text to be synthesized, and is, for example, a keyboard, an OCR (optical character reader), a mass storage device such as a floppy disk drive or magnetic disk drive, or a communication interface (which handles input of text sent over a communication medium).

[0017] A language processing unit 103 performs linguistic analysis of the text input from the input unit 101, for example a sentence in mixed kanji and kana, using a word dictionary 102 in which readings, accent types, and the like are registered in addition to grammatical information such as part-of-speech information. That is, the language processing unit 103 matches the input text against the word dictionary 102 to obtain the accent type, reading, and part-of-speech information of the words, phrases, and so on in the text, and determines accent types and boundaries in accordance with the part-of-speech information. It thereby converts the mixed kanji-kana sentence into its reading and generates a phoneme sequence and prosodic information such as accent types (a phonetic symbol string consisting of the two). The phoneme sequence and prosodic information generated by the language processing unit 103 are passed to a synthesis parameter generation unit 106.

[0018] On receiving the phoneme sequence generated by the language processing unit 103, the synthesis parameter generation unit 106 refers to a speech unit file 105. The speech unit file 105 stores a large number of speech units prepared in advance. The speech units are created by analyzing speech uttered by an announcer or the like to obtain predetermined speech feature parameters, and then cutting out of those feature parameters, in a predetermined synthesis unit such as the Japanese syllable, every syllable that occurs in spoken Japanese.

[0019] In doing so, the synthesis parameter generation unit 106 independently controls the various parameters that determine the characteristics of the synthesized speech, such as speech rate, pitch, volume, and voice quality, as described later.

[0020] A speech synthesis unit 107 generates a sound source and performs digital filtering based on the phoneme parameters and prosody parameters generated by the synthesis parameter generation unit 106 and the volume parameter described later, thereby generating synthesized speech for the input text, and outputs it to a speaker 108. When output of the synthesized speech for the input text starts (at the start of utterance), the speech synthesis unit 107 starts a time measurement timer (hereinafter, the utterance time measurement timer) 104 so that it measures the time elapsed since the start of utterance. When output of the synthesized speech for the input text ends, the speech synthesis unit 107 stops the timer 104 and clears the measured value.
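The start, measure, and stop-and-clear life cycle of the utterance time measurement timer 104 might look like the following Python sketch. The class and method names are hypothetical, and the injectable clock is an assumption added only so that the behavior can be checked without waiting in real time.

```python
import time


class UtteranceTimer:
    """Hypothetical stand-in for the utterance time measurement timer 104."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (an assumption, for testing)
        self._started_at = None  # None means stopped and cleared

    def start(self):
        # Called when output of the synthesized speech begins.
        self._started_at = self._clock()

    def elapsed(self):
        # Seconds since the start of utterance; 0.0 while stopped.
        if self._started_at is None:
            return 0.0
        return self._clock() - self._started_at

    def stop_and_clear(self):
        # Called when output of the synthesized speech ends.
        self._started_at = None
```

In this sketch the synthesis side would call `start()` and `stop_and_clear()`, while the parameter generation side would only read `elapsed()`.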

[0021] When generating the phoneme parameters and prosody parameters for the input text, the synthesis parameter generation unit 106 refers to the value of the utterance time measurement timer 104. According to that value, that is, the time elapsed since the speech synthesis unit 107 started outputting the synthesized speech for the input text, it independently controls the speech rate, pitch, voice quality, and other parameters and reflects them in the phoneme parameters and prosody parameters (pitch is reflected in the prosody parameters only). The synthesis parameter generation unit 106 also generates a volume parameter corresponding to the elapsed time and sets it in association with the phoneme parameters and prosody parameters.

[0022] Specifically, the speech rate is raised in steps as time elapses from the start of utterance, up to a predetermined upper limit. In this embodiment, the listener is assumed to be unaccustomed to the synthesized sound in the initial stage after output for the input text begins, so speech is produced at a relatively slow rate, and the rate is then raised in steps (gradually) as the listener becomes accustomed to the synthesized sound over time. Because speech is relatively slow while the listener is unaccustomed to the synthesized sound, the listener can perceive it correctly even without being used to it; because speech is relatively fast once the listener is accustomed to it, the listener is not irritated. The variation in the speech rate (a characteristic of the synthesized sound) also draws the listener's attention.
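The stepwise rate increase with an upper limit can be sketched in Python as follows. The patent gives no numbers, so the starting rate, step size, step interval, and ceiling below are assumed values, and the rate is expressed as a percentage of a nominal rate purely for convenience.

```python
def speech_rate_percent(elapsed_s, base=80, step=10, step_every_s=30.0, ceiling=120):
    """Return the speech rate as a percentage of a nominal rate.

    The rate starts at `base`, rises by `step` every `step_every_s` seconds
    of utterance, and never exceeds `ceiling`. All defaults are hypothetical.
    """
    steps = int(max(elapsed_s, 0.0) // step_every_s)
    return min(base + steps * step, ceiling)
```

With these assumed defaults the listener hears 80% of the nominal rate for the first 30 seconds, and the rate levels off at 120% after two minutes.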

[0023] Parameters other than the speech rate, such as pitch, volume, and voice quality, are varied over time, for example at random. This keeps the listener from tiring of the synthesized sound and draws the listener's attention still further.
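One way to picture the random variation is the Python sketch below, which jitters pitch and volume around nominal values. The nominal values, the spread, and the function name are assumptions; the patent does not say how the random variation is bounded or how often it is applied.

```python
import random


def vary_parameters(rng, pitch_hz=120.0, volume=0.7, spread=0.1):
    """Return randomly varied pitch and volume (hypothetical ranges).

    Each value is moved by at most +/- `spread` (a fraction of its nominal
    value); volume is additionally capped at 1.0.
    """
    def jitter(value):
        return value * (1.0 + rng.uniform(-spread, spread))

    return {"pitch_hz": jitter(pitch_hz), "volume": min(jitter(volume), 1.0)}
```

A caller would invoke this at whatever interval it chooses, passing a `random.Random` instance so that the variation can be made reproducible.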

[0024] The ways of controlling the parameters described above, and the kinds of parameters themselves, are only examples and are not limiting. The point is that the characteristics of the synthesized speech need only be changed by controlling at least one of the parameters such as speech rate, pitch, volume, and voice quality according to the time elapsed since the start of utterance, which draws the listener's attention.

[0025] The techniques for changing the characteristics of synthesized speech by controlling parameters such as speech rate, pitch, volume, and voice quality are well known, as described in the above-mentioned Japanese Patent Laid-Open Nos. 6-337876 and 6-83381, and are not described here.

[Second Embodiment] FIG. 2 is a block diagram showing the schematic configuration of a speech synthesizer according to a second embodiment of the present invention; parts identical to those in FIG. 1 bear the same reference numerals.

[0026] The speech synthesizer of FIG. 2 differs from that of FIG. 1 in three respects. In place of the utterance time measurement timer 104 of FIG. 1, it has a timekeeping unit 204 with calendar and clock functions. In place of the synthesis parameter generation unit 106 of FIG. 1, it has a synthesis parameter generation unit 206 that varies the characteristics of the synthesized speech by controlling at least one of the parameters such as speech rate, pitch, volume, and voice quality on the basis of the timekeeping data (calendar data and time data) from the timekeeping unit 204. And it newly has a condition table 209 in which various conditions for varying the characteristics of the synthesized speech are registered.

[0027] In the configuration of FIG. 2, when text to be synthesized, for example news or mail written in mixed kanji and kana, is input from the input unit 101, the language processing unit 103 matches the input text against the word dictionary 102, obtains the accent type, reading, and part-of-speech information of the words, phrases, and so on in the text, and determines accent types and boundaries in accordance with the part-of-speech information. It thereby converts the mixed kanji-kana sentence into its reading and generates a phoneme sequence and prosodic information.

[0028] The synthesis parameter generation unit 206 reads the corresponding speech units one after another from the speech unit file 105 according to the phoneme sequence generated by the language processing unit 103, and concatenates them to generate the phoneme parameters for that phoneme sequence. From the prosodic information generated by the language processing unit 103, the synthesis parameter generation unit 206 also generates prosody parameters consisting of a pitch pattern and voiced/unvoiced information.
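The read-and-concatenate step for phoneme parameters can be illustrated with a toy Python sketch. The dictionary standing in for the speech unit file 105, the syllable keys, and the two-element feature frames are all invented for illustration; the patent does not specify the stored format.

```python
# Hypothetical stand-in for the speech unit file 105: each syllable maps to
# a list of feature-parameter frames (the numbers are placeholders).
SPEECH_UNITS = {
    "ko": [[0.1, 0.2], [0.2, 0.2]],
    "n": [[0.3, 0.1]],
    "ni": [[0.4, 0.5]],
}


def phoneme_parameters(syllables, units=SPEECH_UNITS):
    """Concatenate the stored frames of each syllable, in sequence order."""
    frames = []
    for syllable in syllables:
        frames.extend(units[syllable])  # raises KeyError for an unknown unit
    return frames
```

A real concatenative synthesizer would normally also smooth the joins between units, which this sketch leaves out.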

[0029] In doing so, the synthesis parameter generation unit 206 refers to the condition table 209 on the basis of the timekeeping data measured by the timekeeping unit 204, for example the time of day at the moment of utterance and calendar information such as the date and day of the week. Suppose, for example, that the 24 hours of a day are divided into morning, daytime, evening, and night time ranges. For each of these ranges, the condition table 209 holds the values to apply to parameters such as speech rate, pitch, volume, and voice quality, registered separately for each day of the week (or for dates classified as holidays and non-holidays). The condition table 209 also holds the values to apply to these parameters on predetermined specific dates such as birthdays.

[0030] The synthesis parameter generation unit 206 therefore looks up the condition table 209 using the time of utterance indicated by the timekeeping unit 204 and calendar information such as the date and day of the week, and retrieves from the table 209 the values of parameters such as speech rate, pitch, volume, and voice quality that correspond to the time of utterance (the time range containing it) and the day of the week (or the date's classification as holiday or non-holiday). If the date of utterance matches a specific date (such as a birthday) registered in the table 209, the synthesis parameter generation unit 206 instead retrieves the parameter values registered for that specific date. The synthesis parameter generation unit 206 then generates the phoneme parameters and prosody parameters for the input text while controlling the corresponding parameters with the values retrieved from the condition table 209. For volume, it sets the value retrieved from the condition table 209 in association with the generated phoneme parameters and prosody parameters.
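The lookup order described above, with specific dates taking precedence over the time-range and day-type entries, can be sketched in Python as follows. The time-range boundaries, the table contents, the registered date, and the rule that Saturday and Sunday count as holidays are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical stand-in for the condition table 209; every value is a placeholder.
TIME_RANGES = [(5, 11, "morning"), (11, 17, "daytime"), (17, 21, "evening")]
CONDITION_TABLE = {
    ("morning", "weekday"): {"rate": "slow", "pitch": "high", "volume": "loud", "voice": "female"},
    ("night", "weekday"): {"rate": "slow", "pitch": "low", "volume": "soft", "voice": "male"},
}
DEFAULT = {"rate": "normal", "pitch": "normal", "volume": "normal", "voice": "female"}
# Specific dates keyed by (month, day), e.g. a listener's birthday.
SPECIAL_DATES = {(6, 18): {"rate": "normal", "pitch": "high", "volume": "loud", "voice": "female"}}


def time_range(hour):
    """Map an hour (0-23) to a named time range; hours outside the list are night."""
    for start, end, name in TIME_RANGES:
        if start <= hour < end:
            return name
    return "night"


def lookup(now):
    """Return the parameter values to apply for an utterance at `now`."""
    special = SPECIAL_DATES.get((now.month, now.day))
    if special is not None:
        return special  # a registered specific date overrides everything else
    day_type = "weekday" if now.weekday() < 5 else "holiday"  # assumed rule
    return CONDITION_TABLE.get((time_range(now.hour), day_type), DEFAULT)
```

Falling back to `DEFAULT` for unregistered combinations is a choice of this sketch; the patent does not say what happens when no entry matches.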

[0031] The speech synthesis unit 107 generates a sound source and performs digital filtering based on the phoneme parameters and prosody parameters generated by the synthesis parameter generation unit 206 and the volume parameter, thereby generating synthesized speech for the input text, and outputs it to the speaker 108.

[0032] In this embodiment, the characteristics of the synthesized speech can thus be changed according to the time range and the day of the week. On weekday mornings, for example, a slower speech rate, higher pitch (treble), louder volume, and female voice quality can make for a better awakening, while at night a slower speech rate, lower pitch (bass), quieter volume, and male voice quality can give a sense of calm. Characteristics suited to a specific date, such as the listener's birthday, can also be set. In other words, changing the characteristics of the synthesized speech can convey the date and time or information personal to the listener.

[Third Embodiment] FIG. 3 is a block diagram showing the schematic configuration of a speech synthesizer according to a third embodiment of the present invention; parts identical to those in FIG. 1 bear the same reference numerals.

[0033] The speech synthesizer of FIG. 3 differs from that of FIG. 1 in the following respects. In place of the utterance time measurement timer 104, it has an appearing-word management unit 304 that holds and manages the number of occurrences of the words (contained in the text) processed by a language processing unit 303, described below. In place of the synthesis parameter generation unit 106 of FIG. 1, it has a synthesis parameter generation unit 306 that varies the characteristics of the synthesized speech by controlling at least one of the parameters such as speech rate, pitch, volume, and voice quality on the basis of the word occurrence counts managed by the appearing-word management unit 304. The language processing unit 303 has a function of notifying the appearing-word management unit 304 of information on each word contained in the text input from the input unit 101.

[0034] In the configuration of FIG. 3, when text to be synthesized, for example news or mail written in mixed kanji and kana, is input from the input unit 101, the language processing unit 303 matches the input text against the word dictionary 102, obtains the accent type, reading, and part-of-speech information of the words, phrases, and so on in the text, and determines accent types and boundaries in accordance with the part-of-speech information. It thereby converts the mixed kanji-kana sentence into its reading and generates a phoneme sequence and prosodic information. At the same time, the language processing unit 303 notifies the appearing-word storage management unit 304 of information on the words contained in the text. On receiving a word information notification from the language processing unit 303, the appearing-word storage management unit 304 increments by one the occurrence count that it holds and manages for the corresponding word.
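The notify-and-increment bookkeeping of unit 304 can be sketched in a few lines of Python. The class and method names are hypothetical, and keeping the counts in an in-memory `collections.Counter` is an assumption; the patent does not describe the storage.

```python
from collections import Counter


class AppearingWordManager:
    """Hypothetical stand-in for unit 304: occurrence counts across all texts."""

    def __init__(self):
        self._counts = Counter()

    def notify(self, word):
        # Called once per word by the language processing step;
        # increments the stored count and returns the new value.
        self._counts[word] += 1
        return self._counts[word]

    def count(self, word):
        # Occurrence count so far; 0 for a word that has never been notified.
        return self._counts[word]
```

Because the counts persist across calls, a word first seen in one text is already "known" when it appears again in a later text.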

[0035] The synthesis parameter generation unit 306 reads the corresponding speech units one after another from the speech unit file 105 according to the phoneme sequence generated by the language processing unit 303, and concatenates them to generate the phoneme parameters for that phoneme sequence. From the prosodic information generated by the language processing unit 303, the synthesis parameter generation unit 306 also generates prosody parameters consisting of a pitch pattern and voiced/unvoiced information.

【００３６】In doing so, the synthesis parameter generation unit 306 refers to the appearing-word storage management unit 304 and, based on the appearance count held and managed there for each word to be synthesized, generates the phoneme parameters and prosodic parameters for the input text while independently controlling parameters such as speech rate, pitch, volume, and voice quality. For volume, the synthesis parameter generation unit 306 generates, for each word, a volume parameter whose value corresponds to the appearance count, and sets it in association with the corresponding phoneme and prosodic parameters.
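
One possible mapping from appearance count to speech rate is sketched below in Python. The function name speech_rate_for and every numeric default are assumptions made for illustration; the specification does not give concrete values. The count is assumed to be read after it has been incremented for the current occurrence, so a word seen for the first time has a count of 1.

```python
def speech_rate_for(count, slow=0.8, standard=1.0, step=0.0, cap=1.3):
    """Return a relative speech rate for a word given its appearance count.

    All numeric defaults are illustrative assumptions, not values from the text.
    """
    if count <= 1:
        return slow  # first appearance: speak slowly
    # Second and later appearances use the standard rate, optionally rising
    # with the count (step > 0) but never beyond the predetermined cap.
    return min(cap, standard + step * (count - 2))
```

With step left at 0 every repeated word is spoken at the standard rate; a positive step gives the variant in which frequently repeated words are spoken faster, up to the cap.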

【００３７】The speech synthesis unit 107 generates a sound source and performs digital filtering based on the phoneme parameters and prosodic parameters generated by the synthesis parameter generation unit 306, thereby producing synthesized speech for the input text, and outputs it to the speaker 108.

【００３８】As described above, in this embodiment the characteristics of the synthesized speech can be changed based on the appearance count that the appearing-word storage management unit 304 holds and manages for each word to be synthesized. For example, a word appearing for the first time is uttered slowly, so that the listener can perceive even an unfamiliar word correctly, while a word that has appeared two or more times is uttered at the standard rate, so that the listener is not irritated. A configuration in which the speech rate is raised as the appearance count increases, up to a predetermined maximum rate, is also acceptable. Alternatively, non-standard parameter values may be applied to a word appearing for the first time and predetermined standard values to a word that has appeared two or more times; uttering a first-time word with a speech rate, pitch, volume, or voice quality different from the rest draws the listener's attention to it.

[Fourth Embodiment] FIG. 4 is a block diagram showing the schematic configuration of a speech synthesizer according to the fourth embodiment of the present invention; parts identical to those in FIG. 1 are given the same reference numerals.

【００３９】The speech synthesizer of FIG. 4 differs from that of FIG. 1 in two respects. First, in place of the speech synthesis unit 107 of FIG. 1, it has a speech synthesis unit 407 containing a buffer (output waiting buffer) 407a that holds the phoneme parameters and prosodic parameters used for synthesis together with the corresponding not-yet-uttered text. Second, in place of the utterance time measurement timer 104 of FIG. 1, it has a sentence conversion unit 404. When utterance of the next text is required before utterance based on the phoneme and prosodic parameters in the buffer 407a has finished, the sentence conversion unit 404 can, according to the user's selection, create a summary from the unuttered word string in the buffer 407a. According to the user's selection, it can also have the speech synthesis unit 407 rewrite the unuttered phoneme and prosodic parameters in the buffer 407a so that the speech rate increases, or instruct the speech synthesis unit 407 to fade out the synthesized sound of the unuttered word string (a control function). Finally, again according to the user's selection, it can combine the unuttered word string in the buffer 407a with the next text (sentence) and simplify the result.

【００４０】The speech synthesizer of FIG. 4 further differs from that of FIG. 1 in the following respects. In place of the input unit 101 of FIG. 1, it has an input unit 401 that also notifies the sentence conversion unit 404 of a new utterance request (for the next text). In place of the language processing unit 103 of FIG. 1, it has a language processing unit 403 that generates a phoneme sequence and prosodic information both for the text input from the input unit 401 and for the summary supplied by the sentence conversion unit 404. In place of the synthesis parameter generation unit 106 of FIG. 1, it has a synthesis parameter generation unit 406 that generates the corresponding phoneme parameters and prosodic parameters from the phoneme sequence and prosodic information generated by the language processing unit 403.

【００４１】In the configuration of FIG. 4, when an utterance request is issued from the input unit 401 and the kanji-kana mixed text to be synthesized is input, the language processing unit 403 collates the input text against the word dictionary 102 and obtains the accent type, reading, and part-of-speech information for the words and phrases in the text. It then determines the accent types and boundaries implied by the part-of-speech information, converts the kanji-kana mixed text into its reading form, and generates a phoneme sequence and prosodic information.

【００４２】The synthesis parameter generation unit 406 sequentially reads the speech units corresponding to the phoneme sequence generated by the language processing unit 403 from the speech unit file 105 and concatenates them to generate the phoneme parameters for that sequence. The synthesis parameter generation unit 406 also generates prosodic parameters, consisting of a pitch pattern and voiced/unvoiced information, from the prosodic information generated by the language processing unit 403.

【００４３】The speech synthesis unit 407 receives the phoneme parameters and prosodic parameters generated by the synthesis parameter generation unit 406 and holds them temporarily in the buffer 407a. Then, in step with the speech rate of the synthesized speech, it generates a sound source and performs digital filtering based on the phoneme and prosodic parameters in the buffer 407a, producing synthesized speech for the input text, and outputs it to the speaker 108.

【００４４】Suppose that, while synthesized speech for the input text is being output in this way, that is, before utterance for the parameters in the buffer 407a has finished, the next utterance request is input from the input unit 401.

【００４５】In this case, the sentence conversion unit 404 queries the speech synthesis unit 407 to find the word string in the input text that corresponds to the unuttered parameters in the buffer 407a (that is, the unuttered word string), and then executes one of the following four processes: (1) summary creation, which creates a summary from the unuttered word string; (2) sentence combination and simplification, which combines the unuttered word string with the text (sentence) newly requested from the input unit 401 and simplifies the result; (3) instructing the speech synthesis unit 407 to increase the speech rate of the unuttered word string; or (4) instructing the speech synthesis unit 407 to fade out the synthesized sound of the unuttered word string. Which process is executed can be selected in advance by user operation.
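
The selection among the four processes can be pictured as a small dispatcher. The following Python sketch is an assumption-laden illustration: the names InterruptMode and plan_interrupt, the instruction strings, and the space-joined word representation (suited to space-delimited text, not to Japanese) are all hypothetical, and the summarizer is supplied by the caller because no particular one is fixed here.

```python
from enum import Enum


class InterruptMode(Enum):
    SUMMARIZE = 1
    COMBINE_AND_SIMPLIFY = 2
    SPEED_UP = 3
    FADE_OUT = 4


def plan_interrupt(mode, unuttered_words, new_sentences, summarize):
    """Decide what to send for language processing and what to tell the synthesizer.

    summarize is a caller-supplied function (str -> str).
    Returns (texts_for_language_processing, synthesizer_instruction).
    """
    pending = " ".join(unuttered_words)
    if mode is InterruptMode.SUMMARIZE:
        # Summary of the unuttered part first, then the new text; buffer is cleared.
        return [summarize(pending)] + list(new_sentences), "clear_buffer"
    if mode is InterruptMode.COMBINE_AND_SIMPLIFY:
        # Merge the unuttered part with the first new sentence, then simplify.
        head = " ".join([pending] + list(new_sentences[:1]))
        return [summarize(head)] + list(new_sentences[1:]), "clear_buffer"
    if mode is InterruptMode.SPEED_UP:
        return list(new_sentences), "speed_up"
    if mode is InterruptMode.FADE_OUT:
        return list(new_sentences), "fade_out"
    raise ValueError(f"unknown mode: {mode!r}")
```

In the first two modes the buffered parameters are discarded and replaced by newly processed text; in the last two the buffered parameters are still spoken, only faster or with falling volume.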

【００４６】First, the case where summary creation is specified is described. In this case, the sentence conversion unit 404 creates a summary from the unuttered word string held in the buffer 407a of the speech synthesis unit 407, using a summarization technique such as that described in Funasaka et al., "Summarization of Related Newspaper Articles by Redundancy Reduction," IEICE Technical Report, NLC96-15, pp. 39-46, 1996-07. The sentence conversion unit 404 then passes the summary to the language processing unit 403 and requests language processing. At the same time, it has the speech synthesis unit 407 invalidate (for example, clear) the unuttered word string in the buffer 407a and the corresponding phoneme and prosodic parameters.

【００４７】On receiving the summary from the sentence conversion unit 404, the language processing unit 403 performs language processing on the summary before processing the new text input from the input unit 401, generates the corresponding phoneme sequence and prosodic information, and outputs them to the synthesis parameter generation unit 406. It then starts language processing on the new text.

【００４８】On receiving the phoneme sequence and prosodic information for the summary from the language processing unit 403, the synthesis parameter generation unit 406 generates the corresponding phoneme parameters from the phoneme sequence and the corresponding prosodic parameters from the prosodic information.

【００４９】The speech synthesis unit 407 receives the phoneme and prosodic parameters for the summary generated by the synthesis parameter generation unit 406, holds them temporarily in the buffer 407a, generates a sound source and performs digital filtering based on them to produce synthesized speech for the summary, and outputs it to the speaker 108. Thus, when the next output request arrives before the speech currently being output has finished, the current content can be finished in a short time and processing can move promptly to the next utterance request.

【００５０】Instead of creating a summary as above, predetermined keywords may simply be extracted from the unuttered word string, and synthesized sound generated for the extracted keyword (or keyword sequence). To do this, candidate keywords are registered in advance in a table or the like, and the unuttered word string is searched for words matching the registered ones. Creating a summary from the unuttered word string and extracting keywords from it are alike in that both simplify the unuttered word string (the sentence it forms).
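
The keyword-table variant is simple enough to sketch directly. In the Python below, the function name extract_keywords and the use of a plain collection of strings as the keyword table are assumptions for illustration.

```python
def extract_keywords(unuttered_words, keyword_table):
    """Keep only the unuttered words registered as keyword candidates, in order."""
    registered = set(keyword_table)  # set membership keeps each lookup cheap
    return [word for word in unuttered_words if word in registered]
```

For a navigation prompt, a table holding direction and distance words would reduce a full sentence to just those words.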

【００５１】Next, the case where sentence combination and simplification is specified is described. In this case, the sentence conversion unit 404 combines the unuttered word string held in the buffer 407a of the speech synthesis unit 407 with, for example, the first sentence of the new text input from the input unit 401, and creates a summary of the combination. The sentence conversion unit 404 then passes the summary and the remaining character string of the new text to the language processing unit 403 and requests language processing. At the same time, it has the speech synthesis unit 407 clear (invalidate) the unuttered word string in the buffer 407a and the corresponding phoneme and prosodic parameters. Alternatively, the unuttered word string may be combined with the whole of the new text and a summary created from that.

【００５２】On receiving the summary and the remaining character string of the new text from the sentence conversion unit 404, the language processing unit 403 performs language processing on the received text, generates the corresponding phoneme sequence and prosodic information, and outputs them to the synthesis parameter generation unit 406.

【００５３】On receiving the phoneme sequence and prosodic information from the language processing unit 403, the synthesis parameter generation unit 406 generates the corresponding phoneme parameters from the phoneme sequence and the corresponding prosodic parameters from the prosodic information.

【００５４】The speech synthesis unit 407 receives the phoneme and prosodic parameters generated by the synthesis parameter generation unit 406, holds them temporarily in the buffer 407a, generates a sound source and performs digital filtering based on them to produce synthesized speech, and outputs it to the speaker 108. Thus, when the next output request arrives before the speech currently being output has finished, the current content can be finished in a short time and processing can move promptly to the next utterance request.

【００５５】Next, the case where an increase in speech rate is specified is described. In this case, the sentence conversion unit 404 instructs the speech synthesis unit 407 to increase the speech rate of the unuttered word string and has the language processing unit 403 start language processing on the new text.

【００５６】On receiving this instruction from the sentence conversion unit 404, the speech synthesis unit 407 modifies at least one of the phoneme parameters and the prosodic parameters corresponding to the unuttered word string held in the buffer 407a so that the speech rate becomes faster. It then generates a sound source and performs digital filtering based on the modified parameters to produce synthesized speech, and outputs it to the speaker 108. Thus, when the next output request arrives before the speech currently being output has finished, the current content can be finished in a short time and processing can move promptly to the next utterance request.
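
How the buffered parameters are rewritten is not detailed in the text. As one hedged possibility, the Python sketch below assumes each buffered frame carries a duration in milliseconds and shortens it by a constant factor; the frame layout, the key name duration_ms, and the function name speed_up_pending are all hypothetical.

```python
def speed_up_pending(pending_frames, factor):
    """Return copies of the pending frames with durations shortened by factor.

    pending_frames is a list of dicts with a "duration_ms" key; this frame
    layout is an assumption made for the sketch.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1 to speed up")
    # Copy each frame so the original buffer contents are left untouched.
    return [dict(frame, duration_ms=frame["duration_ms"] / factor)
            for frame in pending_frames]
```

Only frames that have not yet been spoken would be passed in, so speech already output is unaffected.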

【００５７】Next, the case where fade-out is specified is described. In this case, the sentence conversion unit 404 instructs the speech synthesis unit 407 to fade out utterance of the unuttered word string and has the language processing unit 403 start language processing on the new text.

【００５８】On receiving this instruction from the sentence conversion unit 404, the speech synthesis unit 407 generates a sound source and performs digital filtering based on the phoneme and prosodic parameters corresponding to the unuttered word string held in the buffer 407a to produce synthesized speech. While doing so, it outputs the speech to the speaker 108 with the volume gradually reduced so that it falls to or below a predetermined level after a fixed time T. Once the volume is at or below the predetermined level, the speech synthesis unit 407 forcibly ends output of the synthesized speech and clears (invalidates) the unuttered word string in the buffer 407a and the corresponding phoneme and prosodic parameters. Thus, when the next output request arrives before the speech currently being output has finished, the current content can be forcibly ended in a short time and processing can move promptly to the next utterance request. Moreover, because the volume is lowered gradually and output is ended only once it is at or below the predetermined level, processing can proceed to output of the next requested content without the listener perceiving anything unnatural.
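
The fade-out behaviour can be sketched as a per-frame gain schedule. The Python below assumes a linear ramp from full volume down to a floor level over the time T (fade_ms); the linear shape, the floor value, and the name fade_out_plan are assumptions of this sketch, since the text only requires that the volume fall gradually to the predetermined level within T.

```python
def fade_out_plan(pending_frames, frame_ms, fade_ms, floor=0.1):
    """Return (gains, discarded) for a fade-out of the unuttered frames.

    gains holds one relative gain per frame actually output during the fade;
    discarded is the number of pending frames dropped when output is forcibly
    ended. The linear ramp and the floor value are assumptions of this sketch.
    """
    if frame_ms <= 0 or fade_ms <= 0:
        raise ValueError("frame_ms and fade_ms must be positive")
    gains = []
    for index in range(pending_frames):
        elapsed = index * frame_ms
        if elapsed >= fade_ms:
            break  # volume has reached the floor: end output here
        gains.append(1.0 - (1.0 - floor) * elapsed / fade_ms)
    return gains, pending_frames - len(gains)
```

The discarded count corresponds to the buffered parameters that are cleared once output is forcibly ended.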

【００５９】As described above, in this embodiment, when the next output request arrives before the speech currently being output has finished, the current content can be finished in a short time without seeming unnatural to the listener, and processing can move promptly to the next utterance request. The embodiment is therefore suited, for example, to a car navigation device in which a voice output request for left-turn or right-turn guidance at the next intersection arises before voice output of the guidance for the current intersection has finished.

【００６０】In the fourth embodiment above, one of summary creation, sentence combination and simplification, the speech-rate increase instruction, and the fade-out instruction is described as selectable by user operation; however, the device may instead be fixed to perform only one of these processes.

【００６１】Each of the speech synthesizers configured as in FIGS. 1 to 4 can also be realized on a computer. For example, a recording medium such as the CD-ROM 502, on which a program for the text-to-speech synthesis processing (text-to-speech conversion) used in the first to fourth embodiments is recorded, is loaded into a personal computer 501 with a built-in speaker 108 as shown in FIG. 5, and the personal computer 501 reads and executes the program recorded on the CD-ROM 502. Besides the CD-ROM 502, the program can also be supplied on a recording medium such as a floppy disk or memory card, or over a communication medium such as a network.

【００６２】[0062]

[Effects of the Invention] As described in detail above, according to the present invention, the characteristics of the synthesized sound can be changed automatically, even for the same sentence, according to the time elapsed since the start of utterance. Also, according to the present invention, the characteristics of the synthesized sound can be changed automatically, even for the same sentence, according to timekeeping information at the time of utterance, such as the time of day and calendar information.

【００６３】Further, according to the present invention, the characteristics of the synthesized sound can be changed automatically, even for the same word, according to its appearance count. Also, when the next output request arrives before the speech currently being output has finished, the current content can be finished in a short time without seeming unnatural to the listener, and processing can move promptly to the next utterance request. In this way, according to the present invention, the way the same sentence is read can be changed automatically according to the time and circumstances.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to a first embodiment of the present invention.

【図２】FIG. 2 is a block diagram showing the schematic configuration of a speech synthesizer according to a second embodiment of the present invention.

【図３】FIG. 3 is a block diagram showing the schematic configuration of a speech synthesizer according to a third embodiment of the present invention.

【図４】FIG. 4 is a block diagram showing the schematic configuration of a speech synthesizer according to a fourth embodiment of the present invention.

【図５】FIG. 5 is a view showing the external appearance of a personal computer on which the speech synthesizers of FIGS. 1 to 4 can be realized.

[Explanation of reference numerals]
101, 401: input unit
102: word dictionary
103, 303, 403: language processing unit
104: utterance time measurement timer
105: speech unit file
106, 206, 306, 406: synthesis parameter generation unit
107, 407: speech synthesis unit
108: speaker
204: timekeeping unit
209: condition table
304: appearing-word storage management unit
404: sentence conversion unit (control means)
407a: buffer

Claims

[Claims]

1. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized by measuring the elapsed time from the start of utterance and changing the characteristics of the synthesized sound according to the measured elapsed time.

2. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized by acquiring at least one piece of timekeeping information and changing the characteristics of the synthesized sound according to the acquired timekeeping information.

3. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized in that an appearance count is held and managed for each word in all text data for which synthesized sound has been output, and, when synthesized sound is generated from input text data, the appearance count is checked for each word in that text data and the characteristics of the synthesized sound are made to differ between a word appearing for the first time and a word that has appeared two or more times.

4. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized in that, when output of synthesized sound for the next text data is requested before output is completed, the text data portion corresponding to the not-yet-output synthesized sound is simplified, and synthesized sound for the simplified text data portion is generated and output in place of that for the original text data portion.

5. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized in that, when output of synthesized sound for the next text data is requested before output is completed, the text data portion corresponding to the not-yet-output synthesized sound is combined with at least the first sentence of the next text data, the combined sentence portion is simplified, and synthesized sound for the simplified sentence portion is generated and output in place of that for the original sentence portion.

6. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized in that, when output of synthesized sound for the next text data is requested before output is completed, control is performed so as to increase the speech rate of the not-yet-output synthesized sound.

7. A speech synthesis method for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the method characterized in that, when output of synthesized sound for the next text data is requested before output is completed, the not-yet-output synthesized sound is output while its volume is gradually lowered, and output is then switched to the synthesized sound for the next text data.

8. A speech synthesizer for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the speech synthesizer comprising: utterance time measurement means for measuring the elapsed time from the start of utterance; synthesis parameter generation means for generating, from the phoneme sequence and prosodic information corresponding to the text data, synthesis parameters to be used for speech synthesis, the synthesis parameters being generated so that the characteristics of the synthesized sound vary according to the elapsed time measured by the utterance time measurement means; and speech synthesis means for generating synthesized sound in accordance with the synthesis parameters generated by the synthesis parameter generation means.

9. A speech synthesizer for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the speech synthesizer comprising: timekeeping means for acquiring at least one piece of timekeeping information; synthesis parameter generation means for generating, from the phoneme sequence and prosodic information corresponding to the text data, synthesis parameters to be used for speech synthesis, the synthesis parameters being generated so that the characteristics of the synthesized sound vary according to the timekeeping information at the time of utterance of the synthesized sound corresponding to the text data; and speech synthesis means for generating synthesized sound in accordance with the synthesis parameters generated by the synthesis parameter generation means.

10. A speech synthesizer for converting input text data into a symbol string composed of a phoneme sequence and prosodic information by language processing and generating synthesized sound from the symbol string, the speech synthesizer comprising: appearing-word management means for holding and managing an appearance count for each word in all text data for which synthesized sound has been output; synthesis parameter generation means for generating, from the phoneme sequence and prosodic information corresponding to the text data, synthesis parameters to be used for speech synthesis, the synthesis parameter generation means checking the appearance count of each word in the text data against the content managed by the appearing-word management means and generating the synthesis parameters so that the characteristics of the synthesized sound differ at least between a word appearing for the first time and a word that has appeared two or more times; and speech synthesis means for generating synthesized sound in accordance with the synthesis parameters generated by the synthesis parameter generation means.

11. A speech synthesizer comprising: language processing means for converting input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing; synthesis parameter generating means for generating, from the phoneme sequence and prosodic information corresponding to the text data converted by the language processing means, synthesis parameters to be used for speech synthesis; buffer means for holding the synthesis parameters generated by the synthesis parameter generating means and the text data corresponding to those parameters; speech synthesizing means for generating and outputting a synthesized sound in accordance with the synthesis parameters held in the buffer means; and text converting means for, when output of a synthesized sound for the next text data is requested before the generation and output of the synthesized sound by the speech synthesizing means are completed, simplifying the text data portion in the buffer means that corresponds to the not-yet-output synthesized sound, causing the language processing means to perform language processing on the simplified portion, and invalidating the corresponding contents held in the buffer means.

12. A speech synthesizer comprising: language processing means for converting input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing; synthesis parameter generating means for generating, from the phoneme sequence and prosodic information corresponding to the text data converted by the language processing means, synthesis parameters to be used for speech synthesis; buffer means for holding the synthesis parameters generated by the synthesis parameter generating means and the text data corresponding to those parameters; speech synthesizing means for generating and outputting a synthesized sound in accordance with the synthesis parameters held in the buffer means; and sentence converting means for, when output of a synthesized sound for the next text data is requested before the generation and output of the synthesized sound by the speech synthesizing means are completed, combining the text data portion in the buffer means that corresponds to the not-yet-output synthesized sound with at least the first sentence of the next text data, simplifying the combined sentence portion, causing the language processing means to perform language processing on it, and invalidating the corresponding contents held in the buffer means.

13. A speech synthesizer that converts input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing and generates a synthesized sound from the symbol string, the speech synthesizer comprising: synthesis parameter generating means for generating, from the phoneme sequence and prosodic information corresponding to the text data, synthesis parameters to be used for speech synthesis; buffer means for holding the synthesis parameters generated by the synthesis parameter generating means and the text data corresponding to those parameters; speech synthesizing means for generating and outputting a synthesized sound in accordance with the synthesis parameters held in the buffer means; and control means for, when output of a synthesized sound for the next text data is requested before the generation and output of the synthesized sound by the speech synthesizing means are completed, rewriting the synthesis parameters in the buffer means that correspond to the not-yet-output synthesized sound so that the utterance speed is increased.
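
The parameter rewriting in claim 13 might look like the sketch below. The record layout (`SynthesisParams` with a `duration_ms` field and an `output_done` flag) and the fixed speed-up `factor` are hypothetical; the claim only requires that the buffered parameters for not-yet-output sound be rewritten so that speech becomes faster.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SynthesisParams:
    """Hypothetical buffer entry: one unit of text and its duration."""
    text: str
    duration_ms: float
    output_done: bool = False


def speed_up_pending(buffer, factor=2.0):
    """Return a new buffer in which not-yet-output entries are shortened.

    Dividing the duration by `factor` (> 1) raises the utterance speed.
    Entries that were already output are left untouched.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [
        entry if entry.output_done
        else replace(entry, duration_ms=entry.duration_ms / factor)
        for entry in buffer
    ]
```

A control routine would call `speed_up_pending` at the moment the request for the next text arrives, then let the synthesizer continue from the rewritten buffer.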

14. A speech synthesizer that converts input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing and generates a synthesized sound from the symbol string, the speech synthesizer comprising: synthesis parameter generating means for generating, from the phoneme sequence and prosodic information corresponding to the text data, synthesis parameters to be used for speech synthesis; buffer means for holding the synthesis parameters generated by the synthesis parameter generating means and the text data corresponding to those parameters; and speech synthesizing means for generating and outputting a synthesized sound in accordance with the synthesis parameters held in the buffer means, wherein, when output of a synthesized sound for the next text data is requested before the generation and output of the synthesized sound are completed, the speech synthesizing means generates and outputs a synthesized sound whose volume gradually decreases in accordance with the synthesis parameters in the buffer means that correspond to the not-yet-output synthesized sound, and, once the volume falls below a predetermined level, generates and outputs a synthesized sound in accordance with the synthesis parameters corresponding to the next text data.
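
The fade-out and switch-over behaviour of claim 14 can be sketched as a simple scheduling function. The geometric `decay`, the `threshold` value, and the representation of sound as a list of opaque frames are assumptions for illustration; the claim does not specify how the volume is lowered.

```python
def fade_out_then_switch(pending_frames, next_frames, decay=0.5, threshold=0.1):
    """Return a playback plan as (source, gain, frame) tuples.

    Pending frames are played with a gain that shrinks by `decay` per frame.
    As soon as the gain drops below `threshold`, the remaining pending frames
    are discarded and the frames for the next text are played at full gain.
    """
    plan = []
    gain = 1.0
    for frame in pending_frames:
        gain *= decay
        if gain < threshold:
            break  # volume is below the predetermined level: switch over
        plan.append(("pending", gain, frame))
    plan.extend(("next", 1.0, frame) for frame in next_frames)
    return plan
```

With the default values the gains applied to the pending frames are 0.5, 0.25 and 0.125, after which playback moves to the next text.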

15. A computer-readable recording medium on which is recorded a program for converting input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing and generating a synthesized sound from the symbol string, wherein the program monitors the elapsed time from the start of utterance of the synthesized sound corresponding to the text data and varies a characteristic of the synthesized sound according to the elapsed time.
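
A minimal sketch of the elapsed-time monitoring in claim 15 follows. The names `UtteranceTimer` and `rate_for_elapsed`, the choice of speaking rate as the varied characteristic, and the step schedule (faster by 0.1 every 10 seconds, capped at 1.5) are assumptions; the claim states only that a characteristic changes with the time elapsed since utterance began.

```python
import time


class UtteranceTimer:
    """Hypothetical utterance timer; `clock` is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = None

    def start(self):
        """Mark the moment output of the synthesized sound begins."""
        self._start = self._clock()

    def elapsed(self):
        """Seconds since start(), or 0.0 if the timer was never started."""
        if self._start is None:
            return 0.0
        return self._clock() - self._start


def rate_for_elapsed(elapsed_s, base_rate=1.0, step=0.1,
                     interval_s=10.0, max_rate=1.5):
    """Assumed schedule: raise the rate by `step` per full `interval_s`."""
    steps = int(elapsed_s // interval_s)
    return min(base_rate + step * steps, max_rate)
```

The parameter generator would call `rate_for_elapsed(timer.elapsed())` each time it produces parameters for the next portion of the text.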

16. A computer-readable recording medium on which is recorded a program for converting input text data into a symbol string comprising a phoneme sequence and prosodic information by language processing and generating a synthesized sound from the symbol string, wherein, when output of a synthesized sound for the next text data is requested before output of the synthesized sound corresponding to the text data is completed, the program simplifies the text data portion corresponding to the not-yet-output synthesized sound, and generates and outputs a synthesized sound for the simplified text data portion in place of the original text data portion.