JPH08146991A

JPH08146991A - Information processing apparatus and control method thereof

Info

Publication number: JPH08146991A
Application number: JP6283259A
Authority: JP
Inventors: Katsuhiko Kawasaki; 勝彦川崎; Yasuhiro Komori; 康弘小森; Yasunori Ohora; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-11-17
Filing date: 1994-11-17
Publication date: 1996-06-07

Abstract

(57) [Abstract]

[Purpose] To make dialogue by voice input natural while also allowing the user to obtain necessary information quickly.

[Structure] Voice input from a microphone 1 is converted into a digital signal by an A/D conversion unit 2 and then recognized by a voice recognition unit 3. A response sentence is created based on the recognition result, converted into a voice signal by a voice synthesis unit 6, and also displayed on a display unit 5. If the user speaks during this output, the output is temporarily suspended, and whether it is resumed or terminated is decided according to the content of the utterance.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information processing apparatus and a control method thereof, and more particularly to an information processing apparatus that responds to utterances and a control method thereof.

[0002]

[Prior Art] Voice dialogue devices have recently come into use. A conventional voice dialogue method is described with reference to FIG. 3. First, suppose the user speaks into a microphone or the like as in FIG. 3 (1). The voice input to the microphone is converted from an analog signal into a digital signal by the A/D conversion unit 2, and the digital signal is converted by the voice recognition unit 3 into a Japanese sentence such as that of FIG. 3 (1). This sentence is interpreted by the dialogue management unit 4, which creates a corresponding response sentence such as that of FIG. 3 (2). The voice synthesis unit 6 divides the response sentence into words, adds readings and accents, and converts it into a digital signal of phonological and prosodic parameters. The D/A conversion unit 7 converts this into an analog signal, which the speaker 8 outputs as voice. Dialogue between the user and the system has been carried out by repeating this cycle of voice input and voice output.

[0003]

[Problems to be Solved by the Invention] In the conventional voice dialogue method described above, when the user spoke during voice output, synthesis either continued, ignoring the user's input, or was simply stopped by it, so the dialogue lacked naturalness.

[0004]

[Means for Solving the Problems and Operation] The present invention has been made in view of the above problem. Its object is to provide an information processing apparatus and a control method thereof that make dialogue by voice input natural while also allowing the user to obtain necessary information quickly.

[0005] To solve this problem, an information processing apparatus of the present invention has, for example, the following configuration. Namely, an information processing apparatus having voice input means, voice recognition means, and output means for responding to the voice input means comprises: sentence generation means for recognizing voice input from the voice input means and generating a sentence corresponding to the voice input; output control means for composing an output sentence corresponding to the sentence generated by the sentence generation means and controlling the output means to output it externally; first control means for monitoring input from the voice input means during output by the output control means and temporarily stopping the output when a voice input occurs; and second control means for controlling, according to the input voice, whether to resume or discontinue the response sentence that was being output.

[0006] According to an embodiment of the present invention, the output means preferably includes both voice output means and display means for displaying the dialogue sentences. Because both the input sentence and the response sentence are then displayed, the user can reliably check what was output by voice.

[0007] The second control means may perform its control according to the length of the voice information input from the voice input means. Because only the length of the voice is needed, the control can be kept simple.

[0008] Alternatively, the second control means may perform its control according to the recognition result of the voice information input from the voice input means. This makes it possible to detect the user's intention and reliably perform the corresponding processing.

[0009] When the second control means decides to resume output of the response sentence, it preferably searches backward from the interrupted position for a delimiter position and resumes output from the position found. In particular, choosing the delimiter position so that one phrase, one accent phrase, one breath group, or the beginning of the sentence is included allows the output sentence to be delivered correctly and naturally even though the output was interrupted partway through.

[0010]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0011] FIG. 2 is a block diagram showing the configuration of the voice dialogue device of this embodiment. The device consists of a microphone 1, an A/D conversion unit 2, a voice recognition unit 3, a dialogue management unit 4, a display unit 5 that displays the conversation contents and the like, a voice synthesis unit 6, a D/A conversion unit 7, and a speaker 8. A CPU 9 controls each of these components according to a program stored in its internal main memory.

[0012] The processing procedure of the CPU 9 is shown in FIG. 1. It consists of step S11, which starts the voice synthesis output; step S12, which determines whether a voice input is present; step S13, which continues the voice synthesis output; step S14, which determines whether the voice input has ended; step S15, which temporarily suspends the voice synthesis output; step S16, which determines whether the voice input duration is larger or smaller than a threshold; and step S17, which terminates the voice synthesis output.

[0013] Next, the operation of the voice dialogue method of this embodiment, configured as above, is described with reference to FIGS. 1 and 4.

[0014] First, suppose the user speaks as in procedure (1) of FIG. 4. Step S11 then starts the voice synthesis output as in procedure (2) of FIG. 4. The voice recognition unit 3 accepts input even during voice synthesis output, and its results are sent to the dialogue management unit 4 as they become available. In step S12 the dialogue management unit 4 determines whether a voice input is present; if none is present, the voice synthesis output is continued in step S13 and the process returns to step S12. If step S14 determines that the voice input has ended, the process proceeds to step S16. If step S16 determines that the voice input duration is smaller than a threshold Th, the voice synthesis output is continued in step S13 and the process returns to step S12. If step S16 determines that the voice input duration is larger than the threshold Th, the process proceeds to step S17 and the voice synthesis output is terminated.
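The flow of steps S11 to S17 can be sketched as a small simulation. This is only an illustrative sketch, not the implementation described in the patent: the names `Utterance` and `run_output`, and the idea of modeling the response as a list of output units, are assumptions made for the example. Treating a duration exactly equal to Th as "continue" is also an assumption, since the description covers only the smaller and larger cases.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    at_unit: int     # index of the output unit at which the user starts speaking
    duration: float  # how long the user's speech lasts, in seconds


def run_output(units, utterances, th):
    """Simulate steps S11 to S17 over a list of output units.

    Returns the units actually output and a log of the decisions taken.
    """
    pending = {u.at_unit: u for u in utterances}
    spoken, log = [], []
    i = 0                                  # S11: start the synthesized output
    while i < len(units):
        u = pending.pop(i, None)           # S12: is a voice input present?
        if u is None:
            spoken.append(units[i])        # S13: continue the output
            i += 1
            continue
        log.append(f"S15 suspend before unit {i}")  # S15: suspend the output
        # S14: wait for the input to end (the duration is already known here)
        if u.duration > th:                # S16: compare the duration with Th
            log.append("S17 terminate")    # S17: end the output
            break
        log.append("S13 continue")         # short input: carry on from unit i
    return spoken, log


units = ["A", "B", "C", "D", "E"]
spoken, log = run_output(units, [Utterance(2, 0.3), Utterance(4, 1.2)], th=0.5)
print(spoken)
print(log)
```

In this run the short input at unit 2 only pauses the output, while the longer input at unit 4 ends it, which mirrors the two cases distinguished in step S16.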

[0015] Now suppose that, as in procedure (2) of FIG. 4, the system has output synthesized voice up to 「神戸市立須磨海浜水族館の説明は、２４０００平方ｍの敷」 (roughly "The description of Kobe City Suma Seaside Aquarium: on a 24,000 square meter", breaking off partway through the word 敷地, "site") when the user says 「えっ」 ("Huh?") as in procedure (3) of FIG. 4. Step S12 then determines that a voice input is present, and steps S14 and S15 temporarily suspend the voice synthesis output.

[0016] Suppose here that step S16 determines that the duration of 「えっ」 is smaller than Th. Then, in step S13, the voice synthesis output continues as in procedure (4) of FIG. 4 with 「地に水族園本館、ラッコ館など…」 (roughly "site, the aquarium main building, the sea otter building, and so on ..."). Suppose further that when the system has output synthesized voice up to 「…の大群が泳ぎ、ラッ」 (roughly "... a large school of ... swims, and the sea ot", again cut off mid-word), as in procedure (4) of FIG. 4, the user says 「ありがとう」 ("Thank you") as in procedure (5) of FIG. 4. Step S12 then determines that a voice input is present, and steps S14 and S15 temporarily suspend the voice synthesis output. Suppose here that step S16 determines that the duration of 「ありがとう」 is larger than Th. The process then proceeds to step S17 and the voice synthesis output is terminated. The dialogue above is displayed on the screen of the display unit 5 as shown in FIG. 4.

[0017] As described above, according to this embodiment, when an utterance from the user is detected during voice synthesis output based on a response sentence, the voice synthesis output is temporarily stopped, and whether to continue or discontinue it is decided according to the content of the utterance (its duration, in the above embodiment). Thus, when the user judges that no further information is needed, saying "Thank you" or the like stops the subsequent voice synthesis, and in some cases the user can make the next inquiry immediately.

[0018] [Description of the Second Embodiment] Next, a second embodiment of the present invention is described.

[0019] Referring to FIG. 5, the processing in the voice dialogue method of this embodiment consists of step S51, which starts the voice synthesis output; step S52, which determines whether a voice input is present; step S53, which continues the voice synthesis output; step S54, which determines whether the voice input has ended; step S55, which temporarily suspends the voice synthesis output; step S56, which determines whether the voice input duration is larger or smaller than a threshold; step S57, which determines the position from which the voice synthesis output continues; and step S58, which terminates the voice synthesis output. The steps of this second embodiment are stored in advance in the main memory in the CPU 9.

[0020] Next, the operation of the voice dialogue method of this embodiment, configured as above, is described with reference to FIGS. 5 and 6.

[0021] First, suppose the user speaks as in procedure (1) of FIG. 6. Suppose that when the system has, in response, output synthesized voice up to 「神戸市立須磨海浜水族館の説明は、２４０００平方ｍの敷」 (roughly "The description of Kobe City Suma Seaside Aquarium: on a 24,000 square meter", cut off mid-word) as in procedure (2) of FIG. 6, the user says 「えっ」 ("Huh?") as in procedure (3) of FIG. 6, and its duration is determined to be smaller than Th. Step S57 then sets the resume position of the voice synthesis output to the beginning of the phrase one phrase before the interrupted phrase (the resume position may, however, be any well-delimited position, such as one accent phrase back or one breath group back).
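The choice of resume position in step S57 can be illustrated with a short sketch. It assumes the response sentence has already been split into phrases; the patent does not say how that segmentation is obtained, and the split used below is only a plausible one for this example. The function name `resume_offset` and its parameters are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def resume_offset(phrases, interrupted_at, back=1):
    """Return the character offset from which output should resume.

    phrases        -- the response sentence, pre-split into phrases (assumed given)
    interrupted_at -- number of characters already output when the user spoke
    back           -- how many phrases to go back from the interrupted phrase

    An interruption exactly on a phrase boundary is treated as belonging to
    the phrase that starts there (an assumption of this sketch).
    """
    # starts[k] is the offset of the first character of phrases[k]
    starts = [0] + list(accumulate(len(p) for p in phrases))[:-1]
    # index of the phrase that contains the interruption point
    current = bisect_right(starts, interrupted_at) - 1
    # step back, but never before the beginning of the sentence
    return starts[max(current - back, 0)]


# Assumed segmentation of the example sentence from the description.
phrases = ["神戸市立須磨海浜水族館の", "説明は、", "２４０００平方ｍの", "敷地に", "水族園本館"]
sentence = "".join(phrases)

# Output was interrupted after 26 characters, partway through 「敷地に」.
offset = resume_offset(phrases, interrupted_at=26)
print(sentence[offset:])
```

With this segmentation the output restarts at 「２４０００平方ｍの」, the phrase before the interrupted one. Passing a list of accent phrases or breath groups instead of phrases would give the other delimiter choices mentioned in the text.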

[0022] Next, the process proceeds to step S53, and the system continues the voice synthesis output with 「２４０００平方ｍの敷地に水族園本館…」 (roughly "on a 24,000 square meter site, the aquarium main building ...") as in procedure (4) of FIG. 6. Suppose further that when the system has output synthesized voice up to 「…の大群が泳ぎ」 (roughly "... a large school of ... swims"), as in procedure (4) of FIG. 6, the user says 「ありがとう」 ("Thank you") as in FIG. 6 (5). Step S52 then determines that a voice input is present, and steps S54 and S55 temporarily suspend the voice synthesis output. Suppose here that step S56 determines that the duration of 「ありがとう」 is larger than Th. The process then proceeds to step S58 and the voice synthesis output is terminated. The dialogue above is displayed on the screen of the display unit 5 as shown in FIG. 6.

[0023] As a result, according to this second embodiment, when the voice synthesis output has been temporarily suspended, it resumes not from the point of interruption but from a position that is easy to follow, such as the immediately preceding phrase or accent phrase. This makes it more user-friendly than the first embodiment.

[0024] [Description of the Third Embodiment] Next, a third embodiment of the present invention is described.

[0025] Referring to FIG. 7, the processing in the voice dialogue method of this embodiment consists of step S71, which starts the voice synthesis output; step S72, which determines whether a voice input is present; step S73, which continues the voice synthesis output; step S74, which determines whether the voice input has ended; step S75, which temporarily suspends the voice synthesis output; step S76, which determines whether the voice recognition result is affirmative or negative; step S77, which determines the position from which the voice synthesis output continues; and step S78, which terminates the voice synthesis output.

[0026] Next, the operation of the voice dialogue method of this embodiment, configured as above, is described with reference to FIGS. 7 and 8.

[0027] First, suppose the user speaks as in procedure (1) of FIG. 8, and the system responds with 「神戸市立…など７館が点在」 (roughly "Kobe City ... seven buildings, including ..., are scattered") as in procedure (2) of FIG. 8. If the user responds at this point with 「はい」 ("Yes") as in procedure (3) of FIG. 8, step S72 determines that a voice input is present, step S75 temporarily suspends the voice synthesis output, and step S76 determines whether the recognition result of the voice input 「はい」 is affirmative or negative. Here, 「はい」 ("Yes"), 「いいえ」 ("No"), 「わかりました」 ("Understood"), and the like are regarded as affirmative, so the process proceeds to step S77, where the continuation position of the voice synthesis output is determined. Because the interruption falls at the end of a sentence, the break is regarded as a good one. The process proceeds to step S73 and the voice synthesis output continues with 「間口２５ｍ…の大群が泳ぎ、ラッコの」 (roughly "25 m wide ... a large school of ... swims, and the sea otters' ...") as in procedure (4) of FIG. 8. If the user at this point makes a non-affirmative utterance, 「終了してください」 ("Please stop"), as in procedure (5) of FIG. 8, step S76 determines that the recognition result is negative, and the process proceeds to step S78 to terminate the voice synthesis output. The dialogue above is displayed on the screen of the display unit 5 as shown in FIG. 8.
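The decision made in step S76 can be sketched as a lookup on the recognized text. This is a minimal illustration under stated assumptions, not the patent's recognizer: the names `Decision`, `AFFIRMATIVE`, and `decide` are hypothetical, and the set contains only the three utterances the description lists as affirmative, although the description says "and the like" and so implies a longer list.

```python
from enum import Enum


class Decision(Enum):
    RESUME = "resume"        # proceed to S77, then continue output in S73
    TERMINATE = "terminate"  # proceed to S78 and end the output


# Utterances the description of the third embodiment lists as affirmative.
AFFIRMATIVE = {"はい", "いいえ", "わかりました"}


def decide(recognized: str) -> Decision:
    """Step S76: classify the recognition result and choose the next step."""
    if recognized.strip() in AFFIRMATIVE:
        return Decision.RESUME
    # Anything not recognized as affirmative ends the output.
    return Decision.TERMINATE


print(decide("はい"))
print(decide("終了してください"))
```

Unlike the duration test of the first two embodiments, this rule depends on what was said rather than how long it lasted, so a short "Please stop" can end the output and a long acknowledgement need not.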

[0028] As described above, according to this embodiment, the naturalness of the dialogue between the user and the system is improved by providing voice recognition means that accepts input at any time, voice synthesis means that outputs a response to the input as voice, dialogue management means that manages input and output, and means that controls the synthesized-voice response of the generated response sentence according to the user's input.

[0029] The present invention may be applied either to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable where it is achieved by supplying a program to a system or apparatus.

[0030]

[Effects of the Invention] As described above, the present invention makes dialogue by voice input natural while also allowing the user to obtain necessary information quickly.

[0031]

[Brief Description of the Drawings]

FIG. 1 is a flowchart showing the processing of the voice dialogue method of the first embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the configuration of a voice dialogue device to which the voice dialogue method of the embodiments is applied.

FIG. 3 is a diagram showing an operation example of a conventional voice dialogue method.

FIG. 4 is a diagram showing an operation example of the first embodiment of the present invention.

FIG. 5 is a flowchart showing the processing of the second embodiment of the present invention.

FIG. 6 is a diagram showing an operation example of the second embodiment of the present invention.

FIG. 7 is a flowchart showing the processing of the third embodiment of the present invention.

FIG. 8 is a diagram showing an operation example of the third embodiment of the present invention.

[Explanation of symbols]

1 Microphone
2 A/D conversion unit
3 Voice recognition unit
4 Dialogue management unit
5 Display unit
6 Voice synthesis unit
7 D/A conversion unit
8 Speaker

Claims

[Claims]

1. An information processing apparatus having voice input means, voice recognition means, and output means for responding to the voice input means, the apparatus comprising: sentence generation means for recognizing voice input from the voice input means and generating a sentence corresponding to the voice input; output control means for composing an output sentence corresponding to the sentence generated by the sentence generation means and controlling the output means to output it externally; first control means for monitoring input from the voice input means during output by the output control means and temporarily stopping the output when a voice input occurs; and second control means for controlling, according to the input voice, whether to resume or discontinue the response sentence that was being output.

2. The information processing apparatus according to claim 1, wherein the output means includes both voice output means and display means for displaying dialogue sentences.

3. The information processing apparatus according to claim 1, wherein the second control means performs control according to the length of the voice information input from the voice input means.

4. The information processing apparatus according to claim 1, wherein the second control means performs control according to the recognition result of the voice information input from the voice input means.

5. The information processing apparatus according to claim 1, wherein the second control means, when it decides to resume output of the response sentence, searches backward from the interrupted position for a delimiter position and resumes output from the position found.

6. The information processing apparatus according to claim 1, wherein the delimiter position is chosen so that one phrase, one accent phrase, one breath group, or the beginning of the sentence is included.

7. A method of controlling an information processing apparatus having voice input means, voice recognition means, and output means for responding to the voice input means, the method comprising: a sentence generation step of recognizing voice input from the voice input means and generating a sentence corresponding to the voice input; an output control step of composing an output sentence corresponding to the sentence generated in the sentence generation step and controlling the output means to output it externally; a first control step of monitoring input from the voice input means during output in the output control step and temporarily stopping the output when a voice input occurs; and a second control step of controlling, according to the input voice, whether to resume or discontinue the response sentence that was being output.

8. The method of controlling an information processing apparatus according to claim 7, wherein the output means includes both voice output means and display means for displaying dialogue sentences.

9. The method of controlling an information processing apparatus according to claim 7, wherein the second control step performs control according to the length of the voice information input from the voice input means.

10. The method of controlling an information processing apparatus according to claim 7, wherein the second control step performs control according to the recognition result of the voice information input from the voice input means.

11. The method of controlling an information processing apparatus according to claim 7, wherein the second control step, when it decides to resume output of the response sentence, searches backward from the interrupted position for a delimiter position and resumes output from the position found.

12. The method of controlling an information processing apparatus according to claim 7, wherein the delimiter position is chosen so that one phrase, one accent phrase, one breath group, or the beginning of the sentence is included.