JP2024087465A

JP2024087465A - Voice processing device

Info

Publication number: JP2024087465A
Application number: JP2022202302A
Authority: JP
Inventors: 岳志阿久津; Takashi Akutsu
Original assignee: Subaru Corp
Current assignee: Subaru Corp
Priority date: 2022-12-19
Filing date: 2022-12-19
Publication date: 2024-07-01

Abstract

To provide a voice processing device that can appropriately perform processing corresponding to voice uttered by a user. SOLUTION: A voice processing device includes: a voice analysis unit that detects, based on voice data, a predetermined voice command included in the voice data; and a control signal generation unit that generates, based on the voice command detected by the voice analysis unit, a control signal instructing the processing indicated by the voice command. The voice command includes: a first portion; a second portion that is arranged after the first portion and contains a verb; and a third portion that is arranged after the second portion and is the same as the first portion. SELECTED DRAWING: Figure 4

Description

The present disclosure relates to a voice processing device that performs processing based on voice.

In recent years, voice recognition technology that recognizes voice uttered by a user has come into frequent use. For example, Patent Document 1 discloses a voice recognition device that recognizes voice uttered by a user.

Patent Document 1: JP 2018-045127 A

A device that performs processing based on voice uttered by a user is expected to perform the processing that corresponds to that voice appropriately.

It is desirable to provide a voice processing device that can appropriately perform processing corresponding to voice uttered by a user.

A voice processing device according to an embodiment of the present disclosure includes a voice analysis unit and a control signal generation unit. The voice analysis unit detects, based on voice data, a predetermined voice command included in the voice data. The control signal generation unit generates, based on the voice command detected by the voice analysis unit, a control signal instructing the processing indicated by that voice command. The voice command includes a first portion, a second portion that is arranged after the first portion and includes a verb, and a third portion that is arranged after the second portion and is the same as the first portion.

The voice processing device according to an embodiment of the present disclosure can appropriately perform processing corresponding to voice uttered by a user.

FIG. 1 is a configuration diagram illustrating an example of a configuration of a voice processing system according to an embodiment of the present disclosure.
FIG. 2 is an explanatory diagram illustrating an example of a voice command.
FIG. 3 is an explanatory diagram illustrating another example of a voice command.
FIG. 4 is an explanatory diagram illustrating another example of a voice command.

Embodiments of the present disclosure are described in detail below with reference to the drawings.

<Embodiment>
[Configuration example]
FIG. 1 shows an example of a configuration of a voice processing system 1 that includes a voice processing device according to an embodiment. In this example, the voice processing system 1 is applied to a vehicle 10, such as an automobile. The vehicle 10 includes a user interface 11, a microphone 12, a processing unit 20, a communication unit 13, a navigation processing unit 14, a headlamp control unit 15, and a door lock control unit 16.

The user interface 11 includes, for example, a display panel, a touch panel, and various buttons, and is configured to accept operations by an occupant of the vehicle 10 and to present various information to the occupant.

The microphone 12 is configured to convert sound waves corresponding to voice uttered by an occupant of the vehicle 10 into an electrical signal (voice signal).

The processing unit 20 includes, for example, one or more processors and one or more memories. The processing unit 20 has a voice analysis unit 21 and a control signal generation unit 22.

The voice analysis unit 21 is configured to analyze voice based on the voice signal supplied from the microphone 12. Specifically, the voice analysis unit 21 first generates voice data by performing AD (Analog to Digital) conversion on the voice signal at a predetermined sampling rate. The voice analysis unit 21 then generates spectrum data by performing a Fourier transform on the voice data. More specifically, it generates a series of spectrum data by generating spectrum data sequentially at predetermined time intervals. Based on each item in the series of spectrum data, the voice analysis unit 21 extracts the characters and words contained in the voice data and thereby generates sentence data containing the sentence that the voice data represents. Based on the sentence data, the voice analysis unit 21 then detects a voice command indicating the processing instructed by the occupant of the vehicle 10. The voice commands usable in the voice processing system 1 are predetermined. In this example the voice analysis unit 21 detects the voice command based on the sentence data, but the detection is not limited to this; the voice command may instead be detected based on, for example, the voice data or the spectrum data. The voice analysis unit 21 supplies the detection result of the voice command to the control signal generation unit 22.
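The step of generating a series of spectrum data at predetermined time intervals can be sketched as follows. This is only an illustrative sketch, not the implementation of the voice analysis unit 21: the function name `frame_spectra`, the frame length, the hop size, and the naive discrete Fourier transform are all assumptions introduced here, and a practical system would typically use a windowed fast Fourier transform.

```python
import cmath
import math


def frame_spectra(samples, frame_len=8, hop=4):
    """Return one magnitude spectrum per frame of the sampled voice data.

    frame_len and hop are hypothetical values chosen for illustration.
    """
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        spectrum = []
        # Naive discrete Fourier transform of one frame (O(n^2)).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


# Hypothetical voice data: a sine wave with 2 cycles per 8-sample frame.
samples = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spectra = frame_spectra(samples)
```

Each entry of `spectra` corresponds to one time step, so the list as a whole plays the role of the series of spectrum data from which characters and words would be extracted.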

The control signal generation unit 22 is configured to generate, based on the detection result of the voice command by the voice analysis unit 21, a control signal corresponding to the processing indicated by the voice command. The processing unit 20 supplies this control signal to the communication unit 13, the navigation processing unit 14, the headlamp control unit 15, and the door lock control unit 16.

The communication unit 13 is configured to communicate with external devices by wireless communication such as a wireless LAN (Local Area Network) or Bluetooth (registered trademark). In this example, the communication unit 13 can communicate with a smartphone 100 of an occupant of the vehicle 10.

The navigation processing unit 14 is configured to determine a route to the destination to which the vehicle 10 should travel (planned driving route) and to guide the vehicle 10 along the determined planned driving route by providing route information about it to the occupant of the vehicle 10. The navigation processing unit 14 acquires the position of the vehicle 10 on the ground using a GNSS (Global Navigation Satellite System) such as GPS (Global Positioning System), and determines the planned driving route of the vehicle 10 using a map information database that includes information on road maps. The navigation processing unit 14 may, for example, store the map information database and determine the planned driving route using the stored database, or it may determine the planned driving route by communicating with a network server on which the map information database is stored. For example, the navigation processing unit 14 determines the planned driving route to a destination based on destination information that the occupant of the vehicle 10 enters by operating the user interface 11, and provides route information about the determined route to the occupant through the user interface 11.

The headlamp control unit 15 is configured to control the turning on and off of the headlamps that emit light ahead of the vehicle 10.

The door lock control unit 16 is configured to control the locking and unlocking of the doors of the vehicle 10 through which occupants get in and out.

In this example, the smartphone 100 is a high-function mobile phone owned by an occupant of the vehicle 10. The smartphone 100 stores various data including, for example, phone book data. The smartphone 100 can communicate with the communication unit 13 of the vehicle 10.

With this configuration, in the voice processing system 1, the processing unit 20 detects a voice command based on the voice signal supplied from the microphone 12, thereby identifying the voice instruction of the occupant of the vehicle 10, and generates a control signal. The communication unit 13, the navigation processing unit 14, the headlamp control unit 15, and the door lock control unit 16 then perform processing according to the occupant's voice instruction based on this control signal.

Here, the processing unit 20 corresponds to a specific example of the "voice processing device" in an embodiment of the present disclosure. The voice analysis unit 21 corresponds to a specific example of the "voice analysis unit" in an embodiment of the present disclosure. The control signal generation unit 22 corresponds to a specific example of the "control signal generation unit" in an embodiment of the present disclosure. The vehicle 10 corresponds to a specific example of the "vehicle" in an embodiment of the present disclosure.

[Operation and action]
Next, the operation and action of the voice processing system 1 of the present embodiment are described.

(Overall operation overview)
First, the operation of the voice processing system 1 is described with reference to FIGS. 1 to 3. The user interface 11 accepts operations by an occupant of the vehicle 10 and presents various information to the occupant. The microphone 12 converts sound waves corresponding to voice uttered by the occupant of the vehicle 10 into an electrical signal (voice signal). The voice analysis unit 21 of the processing unit 20 detects a voice command based on the voice signal supplied from the microphone 12. The control signal generation unit 22 generates, based on the detection result of the voice command by the voice analysis unit 21, a control signal corresponding to the processing indicated by the voice command. The communication unit 13 communicates with the smartphone 100 of the occupant of the vehicle 10. The navigation processing unit 14 determines a route to the destination to which the vehicle 10 should travel (planned driving route) and guides the vehicle 10 along the determined route by providing route information about it to the occupant. The headlamp control unit 15 controls the turning on and off of the headlamps that emit light ahead of the vehicle 10. The door lock control unit 16 controls the locking and unlocking of the doors of the vehicle 10 through which occupants get in and out.

(Detailed operation)
FIG. 2 shows examples of voice commands. In this example, by uttering voice that includes a voice command, an occupant of the vehicle 10 can cause the communication unit 13, the navigation processing unit 14, the headlamp control unit 15, and the door lock control unit 16 to perform processing.

For example, as shown in FIG. 2(A), when the sentence included in the voice data is "Call A," the voice analysis unit 21 detects this voice command instructing a call to A, and the control signal generation unit 22 generates a control signal that causes the smartphone 100 to call A. The processing unit 20 supplies this control signal to the communication unit 13, and the communication unit 13 transmits it to the smartphone 100. Based on this control signal, the smartphone 100 identifies A's phone number using the phone book data and places a call to A using that number.

As another example, as shown in FIG. 2(B), when the sentence included in the voice data is "I want to go to City B," the voice analysis unit 21 detects this voice command instructing the provision of a route to City B, and the control signal generation unit 22 generates, based on the information supplied from the voice analysis unit 21, a control signal that causes the navigation processing unit 14 to determine a planned driving route to City B. The processing unit 20 supplies this control signal to the navigation processing unit 14. Based on this control signal, the navigation processing unit 14 determines the planned driving route to City B and causes the user interface 11 to display it.

As another example, as shown in FIG. 2(C), when the sentence included in the voice data is "Turn off the headlamps," the voice analysis unit 21 detects this voice command instructing that the headlamps be turned off, and the control signal generation unit 22 generates a control signal that causes the headlamp control unit 15 to turn off the headlamps. The processing unit 20 supplies this control signal to the headlamp control unit 15. Based on this control signal, the headlamp control unit 15 controls the headlamps so that they turn off.

As another example, as shown in FIG. 2(D), when the sentence included in the voice data is "Lock the door," the voice analysis unit 21 detects this voice command instructing that the door be locked, and the control signal generation unit 22 generates a control signal that causes the door lock control unit 16 to lock the door. The processing unit 20 supplies this control signal to the door lock control unit 16. Based on this control signal, the door lock control unit 16 locks the door.
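The mapping from the four example commands of FIG. 2 to control signals for the respective units can be sketched as a small lookup table. The following is a hedged illustration rather than the described implementation: the `ControlSignal` type, the table `COMMANDS`, the unit and action names, and the matching on English verb phrases (taken from the translated examples) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    target: str         # unit that should act on the signal (hypothetical name)
    action: str         # processing requested (hypothetical name)
    argument: str = ""  # e.g. the callee or the destination


# Hypothetical table of predetermined commands: (phrase, target unit, action).
COMMANDS = [
    ("call", "communication_unit_13", "place_call"),
    ("i want to go to", "navigation_processing_unit_14", "plan_route"),
    ("turn off the headlamps", "headlamp_control_unit_15", "headlamps_off"),
    ("lock the door", "door_lock_control_unit_16", "lock_doors"),
]


def generate_control_signal(sentence):
    """Return a ControlSignal for a recognized sentence, or None if no command matches."""
    clean = sentence.strip().rstrip(".")
    text = clean.lower()
    for phrase, target, action in COMMANDS:
        if text == phrase or text.startswith(phrase + " "):
            # Whatever follows the command phrase is treated as its argument.
            argument = clean[len(phrase):].strip()
            return ControlSignal(target, action, argument)
    return None


signal = generate_control_signal("I want to go to City B")
```

Here `signal` names the navigation processing unit as its target and carries "City B" as the argument, while a sentence that matches none of the predetermined commands yields `None`.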

Note that the name "A" in FIG. 2(A) may consist of only a few characters, and the same may be true of "City B" in FIG. 2(B). The voice analysis unit 21 may have difficulty recognizing words with so few characters. In addition, because the vehicle 10 is subject to a lot of noise, such as driving sounds, the voice analysis unit 21 may have further difficulty recognizing such short words.

FIG. 3 shows more specific examples of voice commands. For example, 「ママ」 ("Mama") is short, having only two characters, and is shorter than, for example, 「電話して」 ("call"). 「ママ」 is also a repetition of the same sound. The voice analysis unit 21 may therefore have difficulty recognizing the 「ママ」 portion. Similarly, 「津市」 ("Tsu City") is short, having only two characters, and is shorter than 「行きたい」 ("want to go"). The voice analysis unit 21 may therefore have difficulty recognizing the 「津市」 portion.

To make words with so few characters easier to recognize, the voice analysis unit 21 can therefore also analyze voice commands such as those described below.

FIG. 4 shows other examples of voice commands. These voice commands include three parts P1 to P3. The sentence formed by parts P1 and P2 is the same as the sentence in FIG. 3, and part P3 is the same as part P1. For example, part P2 includes a verb, and parts P1 and P3 include the object of that verb. The object may be, for example, a common noun, a proper noun, a pronoun, or an abbreviation.
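The three-part structure can be sketched as a simple check on the recognized sentence. This sketch rests on assumptions that the description does not state: that the sentence data contains the separator 「、」 before part P3, and that P3 must match P1 exactly. The function names are hypothetical, and the sketch does not verify that P2 actually contains a verb.

```python
def split_command(sentence, separator="、"):
    """Split a sentence into (P1, P2, P3) when P3 repeats P1; otherwise return None."""
    head, sep, p3 = sentence.partition(separator)
    if not sep or not p3:
        return None  # no repeated part after the separator
    if not head.startswith(p3) or len(head) == len(p3):
        return None  # P3 does not repeat P1, or nothing is left for P2
    p1, p2 = head[:len(p3)], head[len(p3):]
    return p1, p2, p3


def first_part_is_shorter(parts):
    """Check that P1 has fewer characters than P2."""
    p1, p2, _ = parts
    return len(p1) < len(p2)


parts = split_command("ママに電話して、ママに")
```

For the utterance of FIG. 4(A), `parts` holds 「ママに」 as P1, 「電話して」 as P2, and 「ママに」 as P3, and `first_part_is_shorter(parts)` is true because P1 has three characters and P2 has four.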

For example, as shown in FIG. 4(A), when the sentence included in the voice data is 「ママに電話して、ママに」 ("Call Mama, Mama"), the voice analysis unit 21 detects that part P1 (「ママに」) and part P3 (「ママに」) are highly similar. The occupant has uttered the 「ママに」 portion twice. Because this increases the number of phonemes available for analysis, the voice analysis unit 21 can recognize 「ママに」. The control signal generation unit 22 then generates a control signal that causes the smartphone 100 to call Mama. The processing unit 20 supplies this control signal to the communication unit 13, and the communication unit 13 transmits it to the smartphone 100. Based on this control signal, the smartphone 100 identifies the phone number using the phone book data and places a call using that number.

As another example, as shown in FIG. 4(B), when the sentence included in the voice data is 「津市に行きたい、三重県津市に」 ("I want to go to Tsu City, Tsu City in Mie Prefecture"), the voice analysis unit 21 detects that part P1 (「津市に」) and part P3 (「津市に」) are highly similar. The occupant has uttered the 「津市に」 portion twice. Because this increases the number of phonemes available for analysis, the voice analysis unit 21 can recognize 「津市に」. Based on the information supplied from the voice analysis unit 21, the control signal generation unit 22 then generates a control signal that causes the navigation processing unit 14 to determine a planned driving route to Tsu City, Mie Prefecture. The processing unit 20 supplies this control signal to the navigation processing unit 14. Based on this control signal, the navigation processing unit 14 determines the planned driving route to Tsu City, Mie Prefecture, and causes the user interface 11 to display it.

A person may find it difficult to catch a short word spoken by someone else, but usually catches it more easily when the word is spoken again. Similarly, when the voice analysis unit 21 detects parts P1 and P3 that are highly similar to each other, it can recognize those parts accurately. The voice processing system 1 can thereby improve the recognition accuracy of sentences that contain short words and can appropriately perform processing corresponding to the voice uttered by the user.
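One way to picture how a repeated part could improve recognition is to pool the evidence from both utterances. The description does not specify how the voice analysis unit 21 measures similarity or combines the two parts, so the following is purely an assumed sketch: the similarity measure (Python's `difflib.SequenceMatcher`), the candidate scores, and the rule of summing scores are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for an acoustic comparison."""
    return SequenceMatcher(None, a, b).ratio()


def combine_hypotheses(p1_hyps, p3_hyps):
    """Pick the candidate with the highest summed score over P1 and P3.

    Each argument maps candidate text to a hypothetical recognition score.
    """
    candidates = sorted(set(p1_hyps) | set(p3_hyps))
    return max(
        candidates,
        key=lambda text: p1_hyps.get(text, 0.0) + p3_hyps.get(text, 0.0),
    )


# Hypothetical scores: P1 alone would be misrecognized because of noise.
p1_hyps = {"ママに": 0.45, "パパに": 0.50}
p3_hyps = {"ママに": 0.60, "パパに": 0.30}
best = combine_hypotheses(p1_hyps, p3_hyps)
```

With these made-up scores, the first utterance alone would favor 「パパに」, but the pooled evidence selects 「ママに」, which mirrors the idea that hearing the short word a second time makes it easier to recognize.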

As described above, the voice processing system 1 is provided with the voice analysis unit 21, which detects, based on voice data, a predetermined voice command included in the voice data, and the control signal generation unit 22, which generates, based on the voice command detected by the voice analysis unit 21, a control signal instructing the processing indicated by that voice command. The voice command includes a first part (part P1), a second part (part P2) that is arranged after the first part and includes a verb, and a third part (part P3) that is arranged after the second part and is the same as the first part. This allows the voice analysis unit 21 to recognize parts P1 and P3 accurately. As a result, the voice processing system 1 can, for example, improve the recognition accuracy of sentences that contain short words, and can therefore appropriately perform processing corresponding to the voice uttered by the user.

Furthermore, in the voice processing system 1, the number of characters in the first part (part P1) is smaller than the number of characters in the second part (part P2). Even in this case, because part P3 is the same as part P1, the part with few characters is repeated and can therefore be recognized accurately. As a result, the voice processing system 1 can appropriately perform processing corresponding to the voice uttered by the user.

In addition, in the voice processing system 1, the first part includes the object of the verb included in the second part. Because the object may be not only a common noun but also, for example, a proper noun, a pronoun, or an abbreviation, it can be difficult to recognize. Even in such a case, because part P3 is the same as part P1, the object is repeated and can therefore be recognized accurately. As a result, the voice processing system 1 can appropriately perform processing corresponding to the voice uttered by the user.

In addition, in the voice processing system 1, the processing unit 20 is provided in the vehicle 10, and the vehicle 10 performs processing according to the voice command based on the control signal. The voice processing system 1 can thereby appropriately perform processing corresponding to the voice uttered by the user even inside a vehicle, where there is a lot of noise such as driving sounds.

[Effects]
As described above, the present embodiment provides a voice analysis unit that detects, based on voice data, a predetermined voice command included in the voice data, and a control signal generation unit that generates, based on the voice command detected by the voice analysis unit, a control signal instructing the processing indicated by that voice command. The voice command includes a first part, a second part that is arranged after the first part and includes a verb, and a third part that is arranged after the second part and is the same as the first part. This makes it possible to appropriately perform processing corresponding to the voice uttered by the user.

In the present embodiment, the number of characters in the first part is smaller than the number of characters in the second part, so processing corresponding to the voice uttered by the user can be performed appropriately.

In the present embodiment, the first part includes the object of the verb included in the second part, so processing corresponding to the voice uttered by the user can be performed appropriately.

In the present embodiment, the processing unit is provided in the vehicle, and the vehicle performs processing according to the voice command based on the control signal. This makes it possible to appropriately perform processing corresponding to the voice uttered by the user.

The present invention has been described above with reference to an embodiment, but the present invention is not limited to this embodiment, and various modifications are possible.

For example, although the present technology is applied to the vehicle 10 in the above embodiment, it is not limited to this and can instead be applied to various devices. Specifically, the present technology may be applied to, for example, a smartphone or a smart speaker.

In addition, the voice commands shown in FIGS. 2 to 4 are merely examples, and other voice commands may be used.

The effects described in this specification are merely examples, and the effects of the present disclosure are not limited to them. Other effects may therefore be obtained with the present disclosure.

Furthermore, the present disclosure may take the following aspects.

(1)
A voice processing device including:
a voice analysis unit that detects, based on voice data, a predetermined voice command included in the voice data; and
a control signal generation unit that generates, based on the voice command detected by the voice analysis unit, a control signal instructing the processing indicated by the voice command,
wherein the voice command includes a first portion, a second portion that is arranged after the first portion and includes a verb, and a third portion that is arranged after the second portion and is the same as the first portion.
(2)
The voice processing device according to (1), wherein the number of characters in the first portion is smaller than the number of characters in the second portion.
(3)
The voice processing device according to (1) or (2), wherein the first portion includes an object of the verb included in the second portion.
(4)
The voice processing device according to any one of (1) to (3), wherein the voice processing device is provided in a vehicle, and the vehicle performs processing according to the voice command based on the control signal.

1: voice processing system, 10: vehicle, 11: user interface, 12: microphone, 13: communication unit, 14: navigation processing unit, 15: headlamp control unit, 16: door lock control unit, 20: processing unit, 21: voice analysis unit, 22: control signal generation unit, 100: smartphone.

Claims

1. A voice processing device comprising:
a voice analysis unit that detects, based on voice data, a predetermined voice command included in the voice data; and
a control signal generation unit that generates, based on the voice command detected by the voice analysis unit, a control signal instructing the processing indicated by the voice command,
wherein the voice command includes a first portion, a second portion that is arranged after the first portion and includes a verb, and a third portion that is arranged after the second portion and is identical to the first portion.

2. The voice processing device according to claim 1, wherein the number of characters in the first portion is smaller than the number of characters in the second portion.

3. The voice processing device according to claim 1, wherein the first portion includes an object of the verb included in the second portion.

4. The voice processing device according to claim 1, wherein the voice processing device is provided in a vehicle, and the vehicle performs processing according to the voice command based on the control signal.