
WO2015098306A1 - Response control device and control program - Google Patents

Info

Publication number
WO2015098306A1
Authority
WO
WIPO (PCT)
Prior art keywords
phrase
additional
response
candidate
unit
Application number
PCT/JP2014/079411
Other languages
French (fr)
Japanese (ja)
Inventor
正徳 荻野
暁 本村
Original Assignee
シャープ株式会社 (Sharp Corporation)
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2015098306A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to a response control device that responds to a user's voice.
  • Patent Document 1 discloses a technique in which a request is forwarded to a specific server in response to a user request. When information that is not in the local storage system is requested, the server searches the information space on the Internet and returns the search result to the robot for display.
  • Patent literature: Japanese Unexamined Patent Application Publication No. 2003-111981 (published April 15, 2003); Japanese Unexamined Patent Application Publication No. 2006-106761 (published April 20, 2006); Japanese Unexamined Patent Application Publication No. 2009-265219 (published November 12, 2009).
  • The conventional technology described above has a problem: the waiting time from when a user speaks until a response to the utterance is obtained is likely to be long. In the robot and terminal described above, the search on the Internet and the processing at the server run only after the search in the local storage area and the processing at the terminal have finished. The time from when the robot or terminal acquires the user's utterance until it outputs a response is therefore likely to be long.
  • To shorten this waiting time, the terminal and the server may each execute voice processing in parallel.
  • The present invention has been made in view of the above problems. Its object is to realize a response control device or the like that responds appropriately to a user's utterance, based on a plurality of candidates obtained by a plurality of voice processes executed in parallel.
  • To solve the above problems, a response control device according to one aspect of the present invention controls a response to speech. It includes a candidate phrase acquisition unit and a selection unit. The candidate phrase acquisition unit acquires a plurality of candidate phrases, each generated based on the speech by one of a plurality of response generation units. The selection unit selects, as a response phrase, the candidate phrase whose information has the highest importance among the candidate phrases acquired by the candidate phrase acquisition unit.
  • Similarly, a control method for a response control device according to one aspect of the present invention controls a response to voice. It includes a candidate phrase acquisition step and a selection step. The candidate phrase acquisition step acquires a plurality of candidate phrases, each generated based on the voice by one of a plurality of response generation units. The selection step selects, as a response phrase, the candidate phrase whose information has the highest importance among the candidate phrases acquired in the candidate phrase acquisition step.
  • According to one aspect of the present invention, an appropriate response to the voice can be made based on the candidate phrases generated by the plurality of response generation units.
  • Embodiment 1: Hereinafter, an embodiment of the present invention will be described with reference to FIGS.
  • The response control device according to one embodiment of the present invention is realized as the mobile terminal 1 (hereinafter abbreviated as the terminal 1).
  • FIG. 2 is a diagram showing an outline of the voice response system 100.
  • The voice response system 100 includes the terminal 1 and a voice processing server 2 (hereinafter abbreviated as the “server 2”), which can communicate with each other.
  • The terminal 1 itself performs a process for generating a response candidate phrase (hereinafter abbreviated as a candidate phrase) for the user's voice, called the response generation process. The server 2 performs the response generation process in parallel with the terminal 1.
  • Conventional voice processing executes the processing at the server after the processing at the terminal. Compared with that, this arrangement shortens the waiting time from when the user speaks until the terminal 1 obtains a response to the utterance.
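The latency gain can be made concrete with a simple idealized model. This model is an assumption, not part of the description. Let t_A and t_B be the times the terminal 1 and the server 2 need for their response generation processes, and ignore communication and selection overhead. Running the two one after the other then costs their sum, while running them in parallel costs only the slower of the two:

```latex
T_{\mathrm{seq}} = t_A + t_B, \qquad
T_{\mathrm{par}} = \max(t_A,\, t_B) \;\le\; t_A + t_B, \qquad
T_{\mathrm{seq}} - T_{\mathrm{par}} = \min(t_A,\, t_B)
```

Take the hypothetical values t_A = 0.3 s and t_B = 1.2 s. The sequential arrangement takes 1.5 s and the parallel one takes 1.2 s, a saving of 0.3 s.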
  • The candidate phrase generated by the terminal 1 is called “candidate phrase (A)”, and the candidate phrase generated by the server 2 is called “candidate phrase (B)”.
  • The terminal 1 acquires both the candidate phrase (A) and the candidate phrase (B).
  • The terminal 1 selects whichever of these two candidate phrases has the higher information importance (response level) as the selected response phrase to be output (hereinafter abbreviated as the selected phrase). It then outputs the selected phrase.
  • The terminal 1 and the server 2 acquire first and second external information from the external information providing servers 98 and 99, respectively, and use it in their respective response generation processes. The results of the response generation process may differ between the terminal 1 and the server 2, for example because of differences in their information search capability and vocabulary.
  • For example, suppose the terminal 1 acquires weather information and maximum temperature information as first external information. It then generates the candidate phrase (A) “It's sunny. The maximum temperature is XX degrees.”
  • Suppose the server 2 obtains three items from the external information providing server 99 as second external information: the weather information “Today's weather is sunny”, the weather cause information “It is sunny because the area is covered by high pressure”, and the information “The maximum temperature is XX degrees”. It then generates the candidate phrase (B) “It's sunny. This is because the area is covered by high pressure. The maximum temperature is XX degrees.” The server 2 notifies the terminal 1 of the candidate phrase (B). The terminal 1 compares the candidate phrase (A) with the candidate phrase (B) and selects the one whose information has the higher importance as the selected phrase to be output.
  • In this example, the terminal 1 selects the candidate phrase (B). It contains the weather cause information in addition to the weather information and maximum temperature information that the candidate phrase (A) also contains. The terminal 1 therefore outputs “It's sunny. This is because the area is covered by high pressure. The maximum temperature is XX degrees.”
  • In other words, the terminal 1 is a response control device that controls a response to voice. It includes the candidate acquisition unit 141 (candidate phrase acquisition unit) and the response selection unit 142 (selection unit). The candidate acquisition unit 141 acquires the candidate phrases generated based on the voice by the first response generation unit 13 and the second response generation unit 22 (a plurality of response generation units). The response selection unit 142 selects the candidate phrase with the highest response level (importance of information) as the selected phrase (response phrase).
  • The terminal 1 can therefore respond appropriately, based on the candidate phrases generated for the voice by the plurality of response generation units. That is, the first response generation unit 13 and the second response generation unit 22 generate candidate phrases in parallel. The terminal 1 selects the candidate phrase whose information has the highest importance as the selected phrase and outputs it.
  • In this way, the terminal 1 runs a plurality of response generation processes in parallel to shorten the time from utterance acquisition to response output, and selects one response to be output from their results.
  • Each candidate phrase includes one or more reference phrases and zero or more additional phrases. The response selection unit 142 treats a candidate phrase that includes an additional phrase as having a higher response level than one that does not.
  • The terminal 1 thus chooses the phrase to be output from the candidate phrases generated in parallel by the first response generation unit 13 and the second response generation unit 22, according to whether each contains an additional phrase. As a result, the terminal 1 can output not only the reference phrase, which directly answers the user's call, but also an additional phrase, which adds further information.
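The selection rule above can be sketched in Python as follows. It is a minimal illustration. The `CandidatePhrase` type and the `select_response` function are hypothetical names, not taken from the description. The response level follows the binary scheme the description assigns in S206: 1 if the candidate has at least one additional phrase, 0 otherwise. Resolving ties in favour of the earlier candidate is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CandidatePhrase:
    """A candidate phrase: one or more reference phrases plus zero or more additional phrases."""
    source: str  # hypothetical label, e.g. "A" for the terminal 1, "B" for the server 2
    reference_phrases: list[str]
    additional_phrases: list[str] = field(default_factory=list)

    @property
    def response_level(self) -> int:
        # A candidate containing an additional phrase outranks one that does not.
        return 1 if self.additional_phrases else 0

    def text(self) -> str:
        # Simplification: reference phrases first, then additional phrases.
        return " ".join(self.reference_phrases + self.additional_phrases)


def select_response(candidates: list[CandidatePhrase]) -> CandidatePhrase:
    """Return the candidate with the highest response level.

    max() returns the first maximal element, so ties go to the earlier
    candidate (an assumed tie-breaking rule).
    """
    if not candidates:
        raise ValueError("no candidate phrases to select from")
    return max(candidates, key=lambda c: c.response_level)
```

Consider a candidate (A) “It's sunny.” with no additional phrase and a candidate (B) that adds “The maximum temperature is XX degrees.”. For these, `select_response` returns (B), whichever order the two are passed in.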
  • The terminal 1 includes the first response generation unit 13 (response generation unit) and executes the response generation process by itself. This lets the terminal 1 use information that the server 2 cannot acquire, such as the current position of the user carrying the terminal 1.
  • However, what matters is that a plurality of response generation processes run in parallel and one response to be output is selected from their results. The first response generation unit 13 is therefore not essential.
  • In parallel with its own response generation process, the terminal 1 has an external device run a response generation process and acquires the candidate phrase it generates. The external device includes a response generation unit other than the first response generation unit 13; an example is the server 2, which includes the second response generation unit 22. Details are described later.
  • The “voice processing” executed by the voice response system 100 consists of a voice recognition process, a response generation process, and a voice synthesis process.
  • The “voice recognition process” converts the user's calling voice data, acquired by the microphone 17, into a calling phrase, which is the corresponding character data. It may be similar to any known voice recognition process that converts voice data into character data.
  • The “response generation process” generates a candidate phrase, which is character data corresponding to the calling phrase.
  • The “voice synthesis process” generates voice data corresponding to a candidate phrase, which is character data. It may be similar to any known voice synthesis process that converts character data into voice data.
  • The voice data generated by the voice synthesis process is output from the speaker 192.
  • In addition to the voice recognition process, the response generation process, and the voice synthesis process, the terminal 1 executes a response selection process.
  • The “response selection process” is described in detail later. It takes the candidate phrases generated by the response generation processes performed in parallel and selects the one whose information has the highest importance as the selected phrase.
  • The “calling phrase” is the character data obtained when the voice recognition unit 12 executes the voice recognition process on a calling voice acquired by the microphone 17.
  • The responses generated by the first response generation unit 13 and the second response generation unit 22 are called “candidate phrases”.
  • A candidate phrase contains a “reference phrase”, which directly answers the calling phrase. It may also contain an “additional phrase”, which gives a supplementary answer or information related to the calling phrase. A given calling phrase may have more than one reference phrase, more than one additional phrase, or both.
  • In other words, a candidate phrase is either “a reference phrase only” or “a combination of a reference phrase and one or more additional phrases”.
  • For example, suppose the reference phrase “It's sunny.” can be combined with the additional phrase (A-1) “This is because the area is covered by high pressure.” and the additional phrase (A-2) “The maximum temperature is XX degrees.”. The following candidate phrases are then possible:
    “It's sunny.” (reference phrase only);
    “It's sunny. This is because the area is covered by high pressure.” (reference phrase plus additional phrase (A-1));
    “It's sunny. The maximum temperature is XX degrees.” (reference phrase plus additional phrase (A-2));
    “It's sunny. This is because the area is covered by high pressure. The maximum temperature is XX degrees.” (reference phrase plus additional phrases (A-1) and (A-2)).
  • An additional phrase may be placed either after or before the reference phrase.
  • A candidate phrase may also place the reference phrase between two or more additional phrases.
  • Two or more additional phrases may appear in any order. Both “It's sunny. This is because the area is covered by high pressure. The maximum temperature is XX degrees.” and “It's sunny. The maximum temperature is XX degrees. This is because the area is covered by high pressure.” are possible.
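Additional phrases may be chosen in any combination and placed before, after, or around the reference phrase in any order. The number of possible candidate phrases therefore grows quickly.

The sketch below enumerates that space in Python. It assumes a single reference phrase and treats every ordering as allowed; `enumerate_candidates` is a hypothetical name. For k chosen additional phrases there are (k + 1)! orderings. Two additional phrases therefore yield 1 + 2·2! + 3! = 11 candidates.

```python
from itertools import combinations, permutations


def enumerate_candidates(reference: str, additional: list[str]) -> list[str]:
    """Enumerate candidate phrases built from one reference phrase plus any
    subset of the additional phrases, in every possible order.

    The reference phrase is treated as one more item to place, so it can end
    up before, after, or between additional phrases.
    """
    candidates: list[str] = []
    for k in range(len(additional) + 1):            # k = number of additional phrases used
        for chosen in combinations(additional, k):  # which additional phrases
            for order in permutations((reference, *chosen)):  # how they are arranged
                candidates.append(" ".join(order))
    return candidates
```

In practice a generator would likely restrict or rank these orderings; the description leaves that choice open.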
  • FIG. 1 is a block diagram illustrating a main configuration of the terminal 1 and the server 2. As illustrated, the terminal 1 includes a first control unit 10, a microphone 17, a first storage unit 18, and an output unit 19.
  • The microphone 17 converts voice and other sounds into an electrical signal and passes it to the voice recognition unit 12.
  • The output unit 19 includes a display unit 191 and a speaker 192.
  • The display unit 191 displays, as an image, the selected phrase that the selection result output unit 143 passes to it as character data.
  • The speaker 192 outputs, as sound, the voice data passed to it by the speech synthesis unit 15.
  • The first storage unit 18 stores various data used by the terminal 1.
  • Specifically, the first storage unit 18 stores:
    (1) a control program executed by the first control unit 10 of the terminal 1;
    (2) an OS program;
    (3) application programs for executing various functions;
    (4) various data read when the application programs are executed.
  • The data (1) to (4) are stored in a non-volatile storage device such as a ROM (read-only memory), flash memory, EPROM (Erasable Programmable ROM), EEPROM (registered trademark) (Electrically Erasable PROM), or HDD (hard disk drive).
  • The first storage unit 18 also stores a first reference phrase table 181 and a first additional phrase table 182.
  • The first control unit 10 controls the functions of the terminal 1: the voice recognition process, the response generation process, the response selection process, and the voice synthesis process. It includes a first communication unit 11, a voice recognition unit 12, a first response generation unit 13, a response control unit 14, a speech synthesis unit 15, and a first external information acquisition unit 16.
  • The first communication unit 11 communicates with the server 2 and other devices. More specifically, it does the following:
    (1) It acquires two items from the voice recognition unit 12: the calling phrase that the voice recognition unit 12 produced from the calling voice acquired by the microphone 17, and a request to generate a candidate phrase for that calling phrase (response generation process request). It transmits both to the server 2.
    (2) It receives from the server 2 the candidate phrase (B), which is the result of the response generation process of the second response generation unit 22, and passes it to the candidate acquisition unit 141.
  • (3) The first response generation unit 13 may need first external information, that is, information the terminal 1 does not hold, to execute the response generation process. In that case, the first communication unit 11 receives the first external information from the external information providing server 98 or the like and passes it to the first external information acquisition unit 16.
  • The voice recognition unit 12 executes the voice recognition process. It first converts the calling voice data from the microphone 17 into a calling phrase, which is character data. It then passes the calling phrase, together with the request for the response generation process, to the first response generation unit 13 and the first communication unit 11.
  • The voice recognition unit 12 may use any known voice recognition technique that converts voice data into character data. The voice recognition process itself can be performed with conventional techniques, so its details are omitted.
  • The first response generation unit 13 executes the response generation process: it generates the candidate phrase (A) for the calling phrase received from the voice recognition unit 12.
  • The first response generation unit 13 may use the first external information received from the first external information acquisition unit 16 to generate the candidate phrase (A). Details are described later.
  • The response control unit 14 includes a candidate acquisition unit 141, a response selection unit 142, and a selection result output unit 143.
  • The candidate acquisition unit 141 acquires the candidate phrase (A) from the first response generation unit 13. It acquires the candidate phrase (B), generated by the second response generation unit 22, from the first communication unit 11. It passes both candidate phrases to the response selection unit 142.
  • The response selection unit 142 executes the response selection process. It compares the candidate phrases (A) and (B) received from the candidate acquisition unit 141 and selects the one whose information has the higher importance (response level) as the selected phrase to be output. Details are described later. The response selection unit 142 passes the selected phrase to the selection result output unit 143.
  • The selection result output unit 143 passes the selected phrase received from the response selection unit 142 to the speech synthesis unit 15 and the display unit 191.
  • The speech synthesis unit 15 executes the voice synthesis process. It converts the selected phrase, received as character data from the selection result output unit 143, into voice data and has the speaker 192 output it.
  • The speech synthesis unit 15 may use any known voice synthesis technique that converts character data into voice data. The voice synthesis process itself can be performed with conventional techniques, so its details are omitted.
  • The first external information acquisition unit 16 acquires first external information, that is, information the terminal 1 does not hold, from the external information providing server 98. It passes this information to the first response generation unit 13.
  • The first external information acquisition unit 16 may acquire the first external information in response to a request from the first response generation unit 13.
  • The server 2 includes a second control unit 20 and a second storage unit 24.
  • The second storage unit 24 stores a second reference phrase table 241 and a second additional phrase table 242, described in detail later.
  • The second control unit 20 includes a second communication unit 21, a second response generation unit 22, and a second external information acquisition unit 23.
  • The second communication unit 21 does the following:
    (1) It receives from the terminal 1 the calling phrase produced by the voice recognition unit 12, together with the response generation process request, and passes both to the second response generation unit 22.
    (2) It acquires from the second response generation unit 22 the candidate phrase (B), which is the result of its response generation process, and transmits it to the terminal 1.
    (3) When second external information is needed, it receives that information from the external information providing server 99 or the like and passes it to the second external information acquisition unit 23.
  • The second response generation unit 22 executes the response generation process: it generates the candidate phrase (B) for the calling phrase received from the second communication unit 21.
  • The second response generation unit 22 may use the second external information received from the second external information acquisition unit 23 to generate the candidate phrase (B). Details are described later.
  • The second external information acquisition unit 23 acquires second external information, that is, information the server 2 does not hold, from the external information providing server 99. It passes this information to the second response generation unit 22.
  • The second external information acquisition unit 23 may acquire the second external information in response to a request from the second response generation unit 22.
  • FIG. 3 is a diagram illustrating an example of the first reference phrase table 181 and the second reference phrase table 241 stored in the terminal 1 and the server 2.
  • FIG. 4 is a diagram illustrating an example of the first additional phrase table 182 and the second additional phrase table 242 stored in the terminal 1 and the server 2.
  • The first additional phrase table 182 and the second additional phrase table 242 are collectively called the “additional phrase table”.
  • As shown in FIG. 3, the reference phrase table associates calling phrases with reference phrases.
  • In the reference phrase table, each calling phrase has a “call ID” that identifies it, and each reference phrase has a “reference ID” that identifies it.
  • In the additional phrase table, reference IDs are associated with additional phrases, and each additional phrase has an “additional ID” that identifies it.
  • Each additional phrase also has an “additional condition”, which determines when it is added to the reference phrase. During the response generation process, if the first response generation unit 13 or the second response generation unit 22 finds an additional phrase whose additional condition is satisfied, it adds that phrase to the reference phrase.
  • For example, the additional phrase with additional ID “3” has the additional condition “weather cause information successfully acquired”, and its content is “(according to the acquired weather cause information)”. So when information on the cause of the weather (sunny, cloudy, rain, etc.) is acquired from the external information providing server 98/99 (for example, a weather information server), that information is used as the additional phrase.
  • The weather has many possible causes, but they need not all be stored in the additional phrase table in advance. For example, the weather cause information acquired from the weather information server can be used directly as the additional phrase when the candidate phrase is generated.
  • The additional phrases with additional IDs “4” and “5” work the same way. When information such as the maximum temperature or the probability of precipitation is acquired from a weather information server or the like, the “XX” placeholder in the additional phrase is replaced with the acquired value.
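One way to represent the additional phrase table and its additional conditions is sketched below in Python. It is an illustration under stated assumptions:
- The class and function names, the field names such as `max_temp`, and the table contents are hypothetical.
- The “XX” placeholder is modelled as a `str.format` field.
- An additional condition is reduced to “every value the phrase needs was acquired”.

The conditions in the description can be richer than this.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionalPhrase:
    """One row of a hypothetical additional phrase table."""
    additional_id: int
    related_reference_id: int
    template: str  # "{name}" fields play the role of the "XX" placeholder

    def required_fields(self) -> set[str]:
        # Names of the placeholder fields in the template, e.g. {"max_temp"}.
        return {name for _, name, _, _ in string.Formatter().parse(self.template) if name}

    def condition_met(self, info: dict[str, str]) -> bool:
        # Assumed additional condition: every value the phrase needs was acquired.
        return self.required_fields().issubset(info)

    def render(self, info: dict[str, str]) -> str:
        # Replace each placeholder with the acquired value.
        return self.template.format(**info)


# Hypothetical table contents; reference ID 1 stands for "It's sunny."
ADDITIONAL_PHRASE_TABLE = [
    AdditionalPhrase(3, 1, "{weather_cause}"),
    AdditionalPhrase(4, 1, "The maximum temperature is {max_temp} degrees."),
    AdditionalPhrase(5, 1, "The probability of precipitation is {precip_prob} percent."),
]


def additional_phrases_for(reference_id: int, info: dict[str, str]) -> list[str]:
    """Return the rendered additional phrases whose conditions hold for reference_id."""
    return [
        row.render(info)
        for row in ADDITIONAL_PHRASE_TABLE
        if row.related_reference_id == reference_id and row.condition_met(info)
    ]
```

For example, `additional_phrases_for(1, {"max_temp": "25"})` returns `["The maximum temperature is 25 degrees."]`. Rows 3 and 5 are skipped because their values were not acquired. Additional ID 3 stores the acquired cause text itself as its value, so the table does not have to list every possible cause in advance.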
  • The terminal 1 and the server 2 each store a reference phrase table and an additional phrase table.
  • The tables stored in the terminal 1 and in the server 2 may have the same contents or different contents.
  • The first response generation unit 13 and the second response generation unit 22 do not necessarily generate the same candidate phrase for a given calling phrase. The two reference phrase tables (181 and 241) may have the same contents, and so may the two additional phrase tables (182 and 242). Even then, the generated candidate phrases can differ, for example because the terminal 1 and the server 2 differ in information search capability.
  • The tables stored in the terminal 1 and in the server 2 may also differ, for example as follows.
  • The second reference phrase table 241 and the second additional phrase table 242 of the server 2 may set, as additional conditions and the like, conditions for acquiring and analyzing various information on the Internet.
  • The first reference phrase table 181 and the first additional phrase table 182 of the terminal 1 may set, as additional conditions and the like, conditions that only the terminal 1 can evaluate. Examples are the current date and the current location of the user carrying the terminal 1, such as position information acquired by the GPS (Global Positioning System) of the terminal 1.
  • FIG. 5 is a sequence diagram illustrating an outline of processing performed by the terminal 1 and the server 2.
  • The basic flow of voice processing executed by the terminal 1 is as follows. When the microphone 17 of the terminal 1 acquires the user's calling voice (S101), it converts the calling voice into voice data and passes the data to the voice recognition unit 12.
  • The voice recognition unit 12 executes the voice recognition process on the voice data to obtain a calling phrase (S102). It passes the calling phrase, together with a request for the response generation process, to the first response generation unit 13 and the first communication unit 11 (S103).
  • On receiving the calling phrase and the request from the voice recognition unit 12, the first response generation unit 13 performs the response generation process (S104). It passes the generated candidate phrase (A) to the candidate acquisition unit 141. Meanwhile, the first communication unit 11 transmits the calling phrase and the request to the server 2.
  • The second communication unit 21 of the server 2 passes the calling phrase and the request received from the terminal 1 to the second response generation unit 22.
  • The second response generation unit 22 performs the response generation process (S104').
  • The second response generation unit 22 passes the generated candidate phrase (B) to the second communication unit 21, which transmits it to the terminal 1.
  • The first communication unit 11 of the terminal 1 passes the candidate phrase (B) received from the server 2 to the candidate acquisition unit 141.
  • The candidate acquisition unit 141 now has the results of both response generation processes: the candidate phrase (A) from the first response generation unit 13 and the candidate phrase (B) from the first communication unit 11 (S105).
  • The candidate acquisition unit 141 passes the candidate phrases (A) and (B) to the response selection unit 142.
  • The response selection unit 142 executes the response selection process, selecting either the candidate phrase (A) or the candidate phrase (B) as the selected phrase (S106).
  • The response selection unit 142 passes the selected phrase to the selection result output unit 143.
  • The selection result output unit 143 passes the selected phrase to the display unit 191 and the speech synthesis unit 15.
  • The speech synthesis unit 15 performs the voice synthesis process on the selected phrase received from the selection result output unit 143 and outputs the response to the user as voice (S107). The response generation process and the response selection process are described in detail next.
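The S101 to S107 sequence above can be condensed into the following Python sketch. It is only an illustration. The recognizer, the two generators, and the synthesizer are hypothetical callables. They stand in for the voice recognition unit 12, the first response generation unit 13, the server 2 (reached through the communication units in the real system), and the speech synthesis unit 15. Each generator is assumed to return a (candidate phrase, response level) pair. A thread pool runs S104 and S104' concurrently, so the wait is bounded by the slower of the two rather than their sum.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A response generator maps a calling phrase to (candidate phrase, response level).
ResponseGenerator = Callable[[str], tuple[str, int]]


def handle_utterance(
    voice_data: bytes,
    recognize: Callable[[bytes], str],
    local_generate: ResponseGenerator,
    remote_generate: ResponseGenerator,
    synthesize: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Run S102-S107 for one utterance and return (selected phrase, voice data)."""
    # S102-S103: voice recognition yields the calling phrase.
    calling_phrase = recognize(voice_data)
    # S104 / S104': terminal-side and server-side generation run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(local_generate, calling_phrase)
        future_b = pool.submit(remote_generate, calling_phrase)
        # S105: collect candidate phrases (A) and (B); result() blocks until each is done.
        candidates = [future_a.result(), future_b.result()]
    # S106: keep the candidate with the highest response level (ties favour (A)).
    selected_phrase, _level = max(candidates, key=lambda c: c[1])
    # S107: synthesize speech for the selected phrase.
    return selected_phrase, synthesize(selected_phrase)
```

The sketch waits indefinitely for the server. A deployed terminal would presumably need a timeout or fallback for an unreachable server 2, which the description does not cover.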
  • FIG. 6 is a diagram illustrating a flow of response generation processing executed by the first response generation unit 13 and the second response generation unit 22.
  • response generation unit When it is not necessary to distinguish between the first response generation unit 13 and the second response generation unit 22, both are collectively referred to as a “response generation unit”.
  • the response generation unit when notified of the calling phrase, the response generation unit first selects a reference phrase corresponding to the calling phrase with reference to the reference phrase table (S201). When there are a plurality of reference phrases corresponding to the calling phrase, the response generation unit selects a reference phrase that matches the condition.
  • the response generator selects a reference phrase that meets the conditions from the above three reference phrases.
  • the response generation unit refers to the additional phrase table illustrated in FIG. 4 and selects an additional ID associated with (related to) the reference ID selected in S201 (S202).
  • the response generation unit refers to the additional phrase table and checks whether there is any other additional ID corresponding to (related to) the above-mentioned reference ID (S205). That is, the response generation unit confirms whether there is an additional ID whose “related reference ID” is the reference ID selected in S201 and for which an additional condition has not been confirmed yet. . If there is an additional ID that has not been confirmed whether the additional condition is satisfied (No in S205), the process returns to S202, selects an additional ID that has not been confirmed whether the additional condition is satisfied (S202), and performs the processing after S203. repeat.
  • because the response generation unit may execute S204 multiple times by looping at S205, it does not overwrite the additional phrase each time the loop is performed; instead, it appends each new additional phrase to those already added to the reference phrase.
  • the response generation unit repeats the determination in S205 until there is no additional ID for which the additional condition has not been confirmed.
  • the process may transition to S206 without making the determination of S205. Further, when the number of additional phrases to be added to the reference phrase exceeds a predetermined number, the process may proceed to S206 without performing the determination of S205.
  • the response generation unit fixes the candidate phrase, generated by adding the additional phrase(s) from S204 to the reference phrase selected in S201, as the candidate phrase to be notified to the candidate acquisition unit 141 or the second communication unit 21, and then assigns a response level to the candidate phrase (S206). That is, the response level is "1" when the candidate phrase includes an additional phrase and "0" when it does not.
  • for example, if the reference phrase is "It's sunny." and the additional phrase that is related to the reference phrase and satisfies the additional condition is "The maximum temperature is XX degrees.", the response generation unit generates the candidate phrase "It's sunny. The maximum temperature is XX degrees." and sets its response level to "1". On the other hand, when the reference phrase is "It's sunny." and there is no additional phrase that is related to the reference phrase and satisfies the additional condition, the response generation unit generates the candidate phrase "It's sunny." and sets its response level to "0".
  • the response generation unit determines the additional phrase after determining the reference phrase, but the procedure of the response generation process is not limited to this.
  • the processing may be executed in the following order.
  • any reference ID corresponding to the calling phrase is acquired. Thereafter, with reference to the additional phrase table, the additional condition of the additional ID corresponding to (related to) the acquired reference ID is confirmed.
  • filled it determines as a reference
  • the response generation unit determines whether to select the reference ID and the additional ID in ascending order.
  • in the example above, the reference ID and the additional ID are selected in ascending order, but they may also be determined in another order, such as an arbitrary order.
  • the response generation unit gives a response level to the candidate phrase, but the response generation unit may not give the response level to the candidate phrase.
  • the response selection unit 142 may instead assign a response level to each candidate phrase by analyzing the candidate phrases (A) and (B) notified from the candidate acquisition unit 141, and then select the candidate phrase with the higher assigned response level as the selected phrase. Specifically, the response selection unit 142 may assign a response level to each candidate phrase (for example, in S211) and then select the candidate phrase with the higher response level as the selected phrase.
  • FIG. 7 is a diagram illustrating the response selection process performed by the response selection unit 142.
  • the response selection unit 142 selects, from the candidate phrase (A) generated by the terminal 1 and the candidate phrase (B) generated by the server 2, the candidate phrase with the higher response level as the selected phrase to be output to the user (S211). That is, the response selection unit 142 selects, from the candidate phrases (A) and (B), the candidate phrase that includes an additional phrase as the selected phrase.
  • when the response levels are equal, either candidate phrase may be selected. For example, a rule such as "if candidate phrases (A) and (B) have the same response level, select candidate phrase (A)" may be determined in advance, or, conversely, "select candidate phrase (B)".
  • alternatively, the rule may be "select the candidate phrase acquired first in time" or, conversely, "select the candidate phrase acquired later".
  • the process flow of the terminal 1, which is a response control device that controls responses to voice, can be organized as follows. That is, the flow includes S105 (candidate phrase acquisition step), in which a plurality of candidate phrases generated on the basis of the voice by the first response generation unit 13 and the second response generation unit 22 (a plurality of response generation units) are acquired, and a selection step.
  • in S106 or S211 (selection step), the response selection unit 142 selects, from the plurality of candidate phrases acquired in S105, the candidate phrase with the highest response level (importance of information) as the selected phrase (response phrase).
  • the voice response system 100, which outputs a response to the voice as voice or as a character image, uses the voice processing of both the terminal 1 and the server 2 so that the output response phrase best meets the user's expectations.
  • because the terminal 1 and the server 2 perform the response generation process in parallel, the waiting time from acquisition of the calling voice to the response can be shortened compared with a configuration in which the processing on the server is performed after the processing on the terminal.
  • the terminal 1 can select a candidate phrase having the highest importance of information from the plurality of candidate phrases as a selected phrase and output the selected phrase.
  • the outline of the mobile terminal 1A according to the present embodiment (hereinafter abbreviated as terminal 1A) is as follows. The terminal 1A stores, in the first storage unit 18, a first additional phrase table 182A in which an additional point is associated with each additional phrase. Further, the response selection unit 142A (selection unit) of the terminal 1A uses the total of the additional points set for the additional phrases included in a candidate phrase as the response level (importance of information) of that candidate phrase. Whereas the terminal 1 assigns a response level to a candidate phrase according to whether it contains an additional phrase, the terminal 1A assigns the response level according to the additional points set for the additional phrases included in the candidate phrase. In other respects, the basic configuration of the terminal 1A is the same as that of the terminal 1.
  • the terminal 1A can output a candidate phrase with high importance of information by selecting a selected phrase to be output from a plurality of candidate phrases according to a total value of additional points of the additional phrases included in each candidate phrase.
  • the server 2A stores, in the second storage unit 24, a second additional phrase table 242A in which an additional point is associated with each additional phrase.
  • the basic configuration of the server 2A is the same as the configuration of the server 2.
  • FIG. 1 is a block diagram showing the main configuration of the terminal 1 and the server 2, and also shows the main configuration of the terminal 1A, which has the same configuration as the terminal 1, and the server 2A, which has the same configuration as the server 2.
  • the first response generation unit 13A and the second response generation unit 22A are collectively referred to as the "response generation unit".
  • the first additional phrase table 182A and the second additional phrase table 242A are collectively referred to as the "additional phrase table".
  • the response level may instead be given to the candidate phrase by the response selection unit 142A summing the additional points of the additional phrases included in the candidate phrase.
  • the response selection unit 142A only needs to take the total of the additional points set for the additional phrases included in each candidate phrase as that candidate phrase's response level, and to select the candidate phrase with the highest response level as the selected phrase.
  • the response level may be assigned by either the response generation unit or the response selection unit 142A.
  • the terminal 1A selects the phrase to be output according to the additional points of the additional phrases included in each candidate phrase, that is, according to the importance of the information in those additional phrases. Therefore, the terminal 1A can output, from a plurality of candidate phrases, the candidate phrase whose information has the highest importance.
  • FIG. 8 is a diagram illustrating an example of the first additional phrase table 182A stored in the terminal 1A and the second additional phrase table 242A stored in the server 2A.
  • additional points are set for the additional phrases.
  • “Additional point” is a point set for each additional ID, and indicates the importance of information included in each additional phrase.
  • the response level of each candidate phrase is the total value of the additional points set for the additional phrases included in each candidate phrase. Therefore, the response level of the candidate phrase with only the reference phrase, to which no additional phrase is added, is “0”.
  • the response generation unit adds the additional point set for the added additional phrase to the response level of the candidate phrase including the reference phrase.
  • the additional points may be the same for all additional phrases, or may differ for each additional phrase.
  • the response generation unit or the response selection unit 142A sets the response level of the candidate phrase according to the number of additional phrases included in the candidate phrase.
  • the response generation unit or the response selection unit 142A determines the response level of the candidate phrase by weighting the additional phrases included in the candidate phrase by their respective additional points.
  • an additional point may be set according to the difficulty of satisfying the additional condition.
  • FIG. 9 is a sequence diagram showing the flow of the response generation process of the terminal 1A and the server 2A. The response generation process of FIG. 9 differs from that of FIG. 6 in that the process of S301 is added between S204 and S205. That is, in S301, the response generation unit adds the additional point of the additional phrase (additional ID) added to the reference phrase in S204 to the response level.
  • the response generation unit determines the candidate phrase generated by adding the additional phrase in S204 to the reference phrase selected in S201 as the candidate phrase to be notified to the candidate acquisition unit 141A or the second communication unit 21.
  • the response generation unit determines the total value of the additional points of the additional phrases included in the candidate phrase as the response level of the candidate phrase.
  • the response generation unit notifies the candidate acquisition unit 141A or the second communication unit 21 of the confirmed candidate phrase and the response level of the candidate phrase.
  • the candidate acquisition unit 141A acquires the candidate phrase (A) generated by the first response generation unit 13A, together with its response level, from the first response generation unit 13A, and acquires the candidate phrase (B) generated by the second response generation unit 22A, together with its response level, from the first communication unit 11. The candidate acquisition unit 141A then notifies the response selection unit 142A of them.
  • the response selection unit 142A takes the totals of the additional points set for the additional phrases included in the candidate phrases (A) and (B) as their respective response levels (importance), and selects the candidate phrase with the higher response level as the selected phrase.
  • FIG. 10 is a block diagram showing a main configuration of a voice response system 300 including a mobile terminal 3 (hereinafter abbreviated as terminal 3) which is a response control apparatus according to the present embodiment.
  • the outline of the terminal 3 will be described as follows.
  • the terminal 3 stores a first additional phrase table 183 in which a category is set for each additional phrase in the first storage unit 18.
  • the terminal 3 includes a phrase adding unit 341 that, when there is a candidate phrase that has not been selected by the response selection unit 142A (selection unit) and that includes a reference phrase with the same content as the reference phrase included in the selected phrase (response phrase) selected by the response selection unit 142A, adds an additional phrase included in that candidate phrase to the selected phrase.
  • both are collectively referred to as the "additional phrase table".
  • the terminal 3 can add an additional phrase included in the candidate phrase that has not been selected as the selected phrase by the response selecting unit 142A to the selected phrase. Therefore, the terminal 3 can output a phrase that cannot be generated only by a single response generation process, for example, a phrase that cannot be generated only by the first response generation unit 13A or the second response generation unit 22A.
  • an example in which the response generation unit assigns the response level is described below; however, the response selection unit 142A may instead sum the additional points of the additional phrases included in a candidate phrase and assign the response level to the candidate phrase.
  • FIG. 11 is a diagram illustrating an example of an additional phrase table stored in the terminal 3 and the voice processing server 2. As illustrated, in the additional phrase table, a category is associated with each additional phrase. The category indicates what additional information the additional phrase relates to.
  • FIG. 12 is a sequence diagram showing the flow of response generation processing of the terminal 3 and the voice processing server 2.
  • the response generation unit of the terminal 3 and the voice processing server 2 determines a category of the candidate phrase in addition to generating a candidate phrase and assigning a response level to the candidate phrase.
  • the response generation unit includes the process of S406 instead of S306 in the response generation process of FIG. That is, in S406, the response generation unit generates a candidate phrase from the reference phrase determined in S201 and the additional phrase determined in S204. In S406, the response generation unit determines the candidate phrase generated by adding the additional phrase in S204 to the reference phrase selected in S201 as the candidate phrase to be notified to the candidate acquisition unit 141A or the second communication unit 21. The response generation unit determines the total value of the additional points of the additional phrases included in the candidate phrase as the response level of the candidate phrase. Further, the response generation unit determines the category of the additional phrase included in the candidate phrase as the category of the candidate phrase.
  • for example, the response generation unit generates the candidate phrase "It's sunny. The maximum temperature will be XX degrees."
  • the response generation unit sets the category of the candidate phrase "It's sunny. The maximum temperature will be XX degrees." to "maximum temperature", which is the category of the additional phrase "The maximum temperature will be XX degrees."
  • when a candidate phrase includes a plurality of additional phrases, the response generation unit sets all of their categories as the category of the candidate phrase (for example, "maximum temperature, probability of precipitation"). That is, when the first response generation unit 13A generates the candidate phrase (A) "It's sunny, because it is covered by high pressure. The probability of precipitation is XX%.", the response level of the candidate phrase (A) is "2" and its category is "reason for weather, probability of precipitation".
  • when the second response generation unit 22A generates the candidate phrase (B) "It's sunny. The maximum temperature is XX degrees.", the response level of the candidate phrase (B) is "1" and its category is "maximum temperature".
  • the response generation unit notifies the candidate acquisition unit 141A or the second communication unit 21 of the confirmed candidate phrase, the response level and category of the candidate phrase.
  • FIG. 13 is a sequence diagram showing a flow of response selection processing of the terminal 3.
  • the response selection unit 142A selects, as the selected phrase, the candidate phrase with the higher response level, that is, the higher total of additional points (S411).
  • the response selection unit 142A selects the candidate phrase (A) as the selected phrase.
  • the response selection unit 142A notifies the phrase adding unit 341 of the candidate phrase (A) and the candidate phrase (B), together with information indicating which candidate phrase was selected as the selected phrase.
  • the phrase adding unit 341 checks whether there is a candidate phrase that was not selected as the selected phrase and that includes a reference phrase with the same content as the reference phrase (A-0) included in the candidate phrase (A) selected as the selected phrase (S412).
  • the phrase adding unit 341 first extracts the reference phrase (A-0) included in the candidate phrase (A) selected as the selected phrase. Next, the reference phrase (B-0) of the candidate phrase (B) not selected as the selected phrase is extracted. Then, the phrase adding unit 341 determines whether the reference phrase (A-0) and the reference phrase (B-0) match (has the same content).
  • whether the reference phrase (A-0) and the reference phrase (B-0) match need not be determined by whether the two phrases are identical; it may instead be determined by whether both reference phrases contain a specific word. For example, when the reference phrase (A-0) and the reference phrase (B-0) both contain the word "sunny", they may be determined to match.
  • the reference phrase (A-0) "It's sunny."
  • the phrase adding unit 341 acquires, from among the candidate phrases not selected as the selected phrase, the candidate phrase (B) that includes a reference phrase with the same content as the reference phrase (A-0) (S413).
  • for example, the phrase adding unit 341 acquires the candidate phrase (B) "It's sunny. The maximum temperature is XX degrees."
  • if there is no such candidate phrase, the phrase adding unit 341 adds no new additional phrase to the candidate phrase (A) and ends the process.
  • for example, if the reference phrase (B-0) included in the candidate phrase not selected as the selected phrase is "It's rainy." and the reference phrase (A-0) included in the candidate phrase selected as the selected phrase is "It's sunny.", the phrase adding unit 341 determines that the reference phrase (A-0) and the reference phrase (B-0) do not match, and ends the process.
  • the phrase adding unit 341 acquires an additional phrase (B-1) that is included in the candidate phrase (B) and whose category does not match the category of the candidate phrase (A) (S415).
  • for example, the candidate phrase (A) "It's sunny, because it is covered by high pressure. The probability of precipitation is XX%." has the category "reason for weather, probability of precipitation".
  • Terminal 3 adds a new additional phrase to the candidate phrase selected as the selected phrase based on the category of each candidate phrase, and outputs the result.
  • the additional phrase (B-1) selected by the second response generation unit 22A is added to the candidate phrase (A) generated by the first response generation unit 13A. Accordingly, the terminal 3 can output a phrase that cannot be generated only by the first response generation unit 13A or the second response generation unit 22A.
  • the terminal 3 can also add an additional phrase from another candidate phrase after a candidate phrase has already been output. For example, when the response from the server 2 is delayed beyond a certain threshold because of network communication delays or the like, and the later response includes an additional phrase of a different category, the later additional phrase can be added to the earlier response even after that response has been output.
  • the basic processing procedure is the same as the response selection process shown in FIG. That is, the terminal 3 first temporarily stores the candidate phrase (A) to be output in the first storage unit 18 and then outputs the candidate phrase (A). Thereafter, when the terminal 3 acquires a candidate phrase (B) that has not yet been output and that includes a reference phrase similar to the reference phrase of the already output selected phrase, the terminal 3 compares the category of the candidate phrase (B) with the category of the candidate phrase (A).
  • the terminal 3 then acquires and outputs an additional phrase that is included in the candidate phrase (B) and corresponds to a category different from the category of the candidate phrase (A). That is, the terminal 3 can output a phrase in which a new additional phrase is added to the already output candidate phrase.
  • the control blocks (first control units 10 and 30) of the terminals 1, 1A, and 3 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • the terminals 1, 1A, and 3 include a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") in which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like.
  • as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) that can transmit the program.
  • the present invention can also be realized in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission.
  • the terminals 1, 1A, and 3 select and output, from the plurality of candidate phrases generated in parallel, the candidate phrase whose information has the highest importance (response level). That is, it is only necessary that, in the terminals 1, 1A, and 3, the candidate acquisition units 141 and 141A can acquire a plurality of candidate phrases and the response selection units 142 and 142A select the candidate phrase with the highest importance as the selected phrase; the other configurations are not essential.
  • in the above embodiments, the terminals 1, 1A, and 3 include the first response generation units 13 and 13A, and the servers 2 and 2A include the second response generation units 22 and 22A; however, this configuration is not essential.
  • the terminal 1 may not include the first response generation unit 13, and the server 2 may include the first response generation unit 13 and the second response generation unit 22.
  • the server 2 includes the second response generation unit 22.
  • the terminal 1 may be provided with the first response generation
  • in the above embodiments, the candidate acquisition units 141 and 141A acquire two candidate phrases, but they may acquire, for example, three or more candidate phrases.
  • the response selection units 142 and 142A may select a candidate phrase having the highest importance of information as a selected phrase from three or more candidate phrases.
  • the server 2 includes the second voice recognition unit, and the terminal 1 transmits the voice data from the microphone 17 to the server 2.
  • the terminal 1 and the server 2 may execute the speech recognition process in parallel.
  • the server 2 may include the voice recognition unit 12 instead of the terminal 1, and the voice recognition unit 12 of the server 2 may perform a voice recognition process and a response generation process request on the voice data from the microphone 17.
  • the server 2 may include the voice synthesis unit 15 and, based on the selected phrase acquired from the selection result output units 143 and 143A, generate the audio data to be output to the user from the speaker 192.
  • a server has higher processing capacity than a terminal, can use a richer vocabulary, achieves higher speech recognition accuracy, and can handle a larger number of responses.
  • the server has larger acoustic model and language model dictionaries than the terminal, has high speech recognition processing capacity, can handle many interactive response scenarios, and has a large amount of phoneme data with which it outputs clear speech.
  • the portable terminal (response control apparatus) according to each aspect of the present invention may be realized by a computer.
  • in this case, the portable terminal is realized by the computer by causing the computer to operate as each unit (limited to software elements) included in the portable terminal.
  • a mobile terminal control program for realizing the terminal by a computer and a computer-readable recording medium on which the control program is recorded also fall within the scope of the present invention.
  • the present invention can be widely used in response control devices that control responses to voice.
  • 1, 1A, 3: mobile terminal (response control device); 13, 13A: first response generation unit (response generation unit); 22, 22A: second response generation unit (response generation unit); 141, 141A: candidate acquisition unit (candidate phrase acquisition unit); 142, 142A: response selection unit (selection unit); 341: phrase adding unit
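The response generation process described above (FIG. 6, S201 to S206, plus the point accumulation of S301 in FIG. 9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the table layouts, the `Context` dictionary of facts used to evaluate additional conditions, and all names (`AdditionalEntry`, `Candidate`, `generate_candidate`) are hypothetical; exactly one reference phrase is assumed to match the calling phrase in S201; and S203, which the text above does not spell out, is assumed to be the check of the additional condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]  # hypothetical: facts available when the call is handled


@dataclass
class AdditionalEntry:
    """One row of a hypothetical additional phrase table."""
    additional_id: int
    related_reference_id: int
    phrase: str
    condition: Callable[[Context], bool]  # the "additional condition"
    point: int = 1                        # additional point (terminal 1A)
    category: str = ""                    # category (terminal 3)


@dataclass
class Candidate:
    reference_phrase: str
    additional: List[AdditionalEntry] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join([self.reference_phrase] + [a.phrase for a in self.additional])

    @property
    def response_level(self) -> int:
        # Terminal 1A style: total of the additional points (what S301 accumulates).
        return sum(a.point for a in self.additional)

    @property
    def categories(self) -> List[str]:
        return [a.category for a in self.additional]


def generate_candidate(calling_phrase: str,
                       reference_table: Dict[str, Tuple[int, str]],
                       additional_table: List[AdditionalEntry],
                       context: Context,
                       max_additions: Optional[int] = None) -> Candidate:
    # S201: look up the reference ID and phrase for the calling phrase (single match assumed).
    reference_id, reference_phrase = reference_table[calling_phrase]
    candidate = Candidate(reference_phrase)
    # S202/S205: visit every additional ID related to that reference ID, in ascending order.
    related = sorted((e for e in additional_table if e.related_reference_id == reference_id),
                     key=lambda e: e.additional_id)
    for entry in related:
        if max_additions is not None and len(candidate.additional) >= max_additions:
            break  # optional early exit straight to S206 once enough phrases are added
        if entry.condition(context):            # assumed S203: is the condition satisfied?
            candidate.additional.append(entry)  # S204: append, never overwrite
    return candidate  # S206: candidate phrase (and its response level) is fixed
```

With every additional point set to 1, `response_level` is simply the number of additional phrases; the terminal 1 rule (level "1" if any additional phrase is present, otherwise "0") would correspond to `1 if candidate.additional else 0`.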
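The selection and phrase-adding steps of the terminal 3 (S411 to S415) described above can likewise be sketched. Again, this is a hypothetical illustration rather than the disclosed implementation: the `(phrase, category, point)` tuple layout, the function names `references_match` and `select_and_add`, and the tie rule (a tie goes to the candidate phrase (A), one of the rules the text permits) are assumptions. The optional `keyword` argument models the variant in which two reference phrases are treated as matching when both contain a specific word such as "sunny".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical layout of one additional phrase: (phrase, category, additional point).
Additional = Tuple[str, str, int]


@dataclass
class Candidate:
    reference_phrase: str
    additional: List[Additional] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join([self.reference_phrase] + [p for p, _, _ in self.additional])

    @property
    def response_level(self) -> int:
        return sum(point for _, _, point in self.additional)

    @property
    def categories(self) -> Set[str]:
        return {category for _, category, _ in self.additional}


def references_match(a: str, b: str, keyword: Optional[str] = None) -> bool:
    """Exact match by default; with a keyword, match when both phrases contain it."""
    if keyword is not None:
        return keyword in a and keyword in b
    return a == b


def select_and_add(cand_a: Candidate, cand_b: Candidate,
                   keyword: Optional[str] = None) -> Candidate:
    # S411: the higher response level wins; a tie goes to (A) (assumed tie rule).
    if cand_a.response_level >= cand_b.response_level:
        selected, other = cand_a, cand_b
    else:
        selected, other = cand_b, cand_a
    # S412: does the unselected candidate share the selected one's reference phrase?
    if not references_match(selected.reference_phrase, other.reference_phrase, keyword):
        return selected  # no new additional phrase is added
    # S413/S415: take the unselected candidate's additional phrases in new categories.
    extra = [a for a in other.additional if a[1] not in selected.categories]
    return Candidate(selected.reference_phrase, selected.additional + extra)
```

For the weather example above, this yields the candidate phrase (A) with the "maximum temperature" phrase from (B) appended, a combination that neither response generation unit produced on its own.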

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The purpose of the present invention is to output, in a short time, a response to a user's utterance that includes important information. A terminal (1) is provided with a response selection unit (142) that selects, as the response phrase, the candidate phrase containing the most important information from among a plurality of candidate phrases acquired from a first and a second response generation unit (13, 22), which generate candidate phrases in parallel in response to speech.

Description

Response control device and control program
The present invention relates to a response control device and the like that responds to a user's voice.
Conventionally, robots and voice processing systems that automatically carry out conversations and similar processing have become widespread. For example, Patent Document 1 discloses a technique in which a request is forwarded to a specific server in response to a user's request, and, when information that is not in the local storage system is requested, the server searches the information space on the Internet and sends the search results back to the robot.
Patent Document 1: Japanese Patent Laid-Open No. 2003-111981 (published April 15, 2003)
Patent Document 2: Japanese Patent Laid-Open No. 2006-106761 (published April 20, 2006)
Patent Document 3: Japanese Patent Laid-Open No. 2009-265219 (published November 12, 2009)
However, the conventional technology described above has the problem that the waiting time from when the user speaks until a response to the utterance is obtained is likely to be long. That is, in the robot and terminal described above, the search on the Internet and the processing on the server are executed only after the search in the local storage area and the processing on the terminal. Therefore, the time from when the robot or terminal acquires the user's utterance until it outputs a response to that utterance is likely to be long. To shorten this waiting time, the terminal and the server could each execute voice processing in parallel. In that case, however, the problem remains of which result, the terminal's or the server's, should be output to the user. The conventional technology described above neither discloses nor suggests any means for solving this problem.
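As a rough illustration of the parallel arrangement considered above, the following Python sketch runs two stand-in response generators concurrently with the standard-library `concurrent.futures` module, so the total wait is roughly that of the slower generator rather than the sum of both. The generator functions `terminal_generate` and `server_generate`, their simulated delays, and the phrases they return are hypothetical. Note that the sketch still returns both results and leaves open which one should be output, which is exactly the problem described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Candidate = Tuple[str, int]  # (candidate phrase, response level), illustrative only


def terminal_generate(utterance: str) -> Candidate:
    time.sleep(0.2)  # stand-in for on-device voice processing (hypothetical delay)
    return ("It's sunny.", 0)


def server_generate(utterance: str) -> Candidate:
    time.sleep(0.3)  # stand-in for network round trip plus server processing
    return ("It's sunny. The maximum temperature is XX degrees.", 1)


def generate_in_parallel(utterance: str,
                         generators: List[Callable[[str], Candidate]]) -> List[Candidate]:
    """Run every generator concurrently and return their results in generator order."""
    with ThreadPoolExecutor(max_workers=len(generators)) as pool:
        futures = [pool.submit(generate, utterance) for generate in generators]
        return [future.result() for future in futures]
```

Run sequentially, the two stand-ins would take about 0.5 s; run this way, the call returns after about 0.3 s with both candidate phrases available for a selection step.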
The present invention has been made in view of the above problems, and an object thereof is to realize a response control device and the like that makes an appropriate response based on a plurality of candidates obtained by executing a plurality of voice processes in parallel on a user's utterance.
In order to solve the above problem, a response control device according to an aspect of the present invention is a response control device that controls a response to speech, and includes: a candidate phrase acquisition unit that acquires a plurality of candidate phrases generated on the basis of the speech by each of a plurality of response generation units; and a selection unit that selects, from the plurality of candidate phrases acquired by the candidate phrase acquisition unit, the candidate phrase whose information has the highest importance among those candidate phrases, as a response phrase.
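A minimal sketch of the selection unit described above follows. It assumes, hypothetically, that each candidate phrase carries a list of additional points and that importance is their total, as in one of the embodiments; `CandidatePhrase`, `importance_by_points`, `importance_by_presence`, and `select_response_phrase` are illustrative names, not part of the disclosure. Python's built-in `max` returns the first maximal item, so ties go to the candidate that was acquired (listed) first, which is one of the tie-breaking rules the embodiments allow. The function also accepts three or more candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CandidatePhrase:
    text: str
    additional_points: List[int] = field(default_factory=list)  # one per additional phrase
    source: str = ""  # e.g. "terminal" or "server" (illustrative only)


def importance_by_points(candidate: CandidatePhrase) -> int:
    """Importance as the total of the additional points (terminal 1A style)."""
    return sum(candidate.additional_points)


def importance_by_presence(candidate: CandidatePhrase) -> int:
    """Importance as 1 if any additional phrase is present, else 0 (terminal 1 style)."""
    return 1 if candidate.additional_points else 0


def select_response_phrase(candidates: Sequence[CandidatePhrase],
                           importance: Callable[[CandidatePhrase], int] = importance_by_points
                           ) -> CandidatePhrase:
    """Return the most important candidate; ties go to the earliest one in the sequence."""
    if not candidates:
        raise ValueError("no candidate phrases were acquired")
    return max(candidates, key=importance)
```

Passing `importance_by_presence` instead reproduces the simpler presence/absence rule, under which two candidates that each contain some additional phrase tie and the first one is kept.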
A control method according to one aspect of the present invention is a method of controlling a response control device that controls a response to voice, and includes: a candidate phrase acquisition step of acquiring a plurality of candidate phrases generated on the basis of the voice by each of a plurality of response generation units; and a selection step of selecting, as a response phrase, the candidate phrase whose information has the highest importance among the plurality of candidate phrases acquired in the candidate phrase acquisition step.
According to one aspect of the present invention, an appropriate response to the voice can be made on the basis of the plurality of candidate phrases generated by each of the plurality of response generation units.
FIG. 1 is a block diagram showing the main configuration of a voice response system including a response control device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an outline of the voice response system of FIG. 1.
FIG. 3 is a diagram showing an example of the reference phrase tables stored in the response control device and the voice processing server of FIG. 1.
FIG. 4 is a diagram showing an example of the additional phrase tables stored in the response control device and the voice processing server of FIG. 1.
FIG. 5 is a sequence diagram showing an outline of the voice processing of the response control device and the voice processing server of FIG. 1.
FIG. 6 is a sequence diagram showing the flow of the response generation process of the response control device and the voice processing server of FIG. 1.
FIG. 7 is a sequence diagram showing the flow of the response selection process of the response control device of FIG. 1.
FIG. 8 is a diagram showing an example of the additional phrase tables stored in a response control device and a voice processing server according to another embodiment of the present invention.
FIG. 9 is a sequence diagram showing the flow of the response generation process of the response control device and the voice processing server according to another embodiment of the present invention.
FIG. 10 is a block diagram showing the main configuration of a voice response system including a response control device according to yet another embodiment of the present invention.
FIG. 11 is a diagram showing an example of the additional phrase tables stored in the response control device and the voice processing server of FIG. 10.
FIG. 12 is a sequence diagram showing the flow of the response generation process of the response control device and the voice processing server of FIG. 10.
FIG. 13 is a sequence diagram showing the flow of the response selection process of the response control device of FIG. 10.
Embodiment 1
An embodiment of the present invention is described below with reference to FIGS. 1 to 7. Here, an example is described in which the response control device according to one aspect of the present invention is realized as a mobile terminal 1 (hereinafter abbreviated as "terminal 1").
First, an outline of the voice response system 100 including the terminal 1 is described with reference to FIG. 2. FIG. 2 is a diagram showing an outline of the voice response system 100. As illustrated, the voice response system 100 according to the present embodiment includes the terminal 1 and a voice processing server 2 (hereinafter abbreviated as "server 2"), which can communicate with each other. The terminal 1 itself performs a process of generating response candidate phrases (hereinafter abbreviated as "candidate phrases") for the user's voice (the response generation process), and also causes the server 2 to execute a response generation process in parallel with its own. Compared with conventional voice processing, in which the server's processing is executed after the terminal's processing, the terminal 1 can therefore shorten the time the user waits between speaking and receiving a response to the utterance.
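The parallel arrangement can be sketched in Python with `concurrent.futures`. This is a minimal illustration under stated assumptions, not the embodiment's implementation: `generate_on_terminal` and `generate_on_server` are hypothetical stand-ins for the first response generation unit 13 and the second response generation unit 22, and the processing and network delays are simulated with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_on_terminal(call_phrase: str) -> str:
    # Hypothetical stand-in for the first response generation unit 13.
    time.sleep(0.1)  # simulated local processing time
    return "It's sunny. The high will be ○○ degrees."


def generate_on_server(call_phrase: str) -> str:
    # Hypothetical stand-in for the second response generation unit 22;
    # the longer sleep simulates the round trip to the server 2.
    time.sleep(0.2)
    return ("It's sunny. That's because of a high-pressure system. "
            "The high will be ○○ degrees.")


def collect_candidates(call_phrase: str) -> list[str]:
    # Submit both response generation processes at once so that they run
    # concurrently rather than one after the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(generate_on_terminal, call_phrase),
                   pool.submit(generate_on_server, call_phrase)]
        # result() blocks until each generator finishes; order is (A), (B).
        return [f.result() for f in futures]


start = time.perf_counter()
candidates = collect_candidates("What's the weather today?")
elapsed = time.perf_counter() - start
print(candidates)
print(f"elapsed: {elapsed:.2f} s")  # close to the slower call, not the sum
```

With the simulated delays the total wait is close to 0.2 s rather than the 0.3 s a sequential design would need. A real system would presumably also bound the wait for the server with a timeout; that detail is not described in this section.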
In the following, the candidate phrase generated by the terminal 1 is called "candidate phrase (A)", and the candidate phrase generated by the server 2 is called "candidate phrase (B)". The terminal 1 acquires candidate phrase (A) and candidate phrase (B). From these two candidate phrases, the terminal 1 selects the one whose information has the higher importance (response level) as the selected response phrase to be output (hereinafter abbreviated as "selected phrase"), and outputs it. For example, when the user says to the terminal 1 "What's the weather today?", the terminal 1 executes the response generation process for this call itself and also requests the server 2 to execute the response generation process for it.
The terminal 1 and the server 2 acquire first and second external information from external information providing servers 98 and 99, respectively, and use it in their respective response generation processes. Because the terminal 1 and the server 2 differ in information retrieval capability, vocabulary, and so on, the results of their response generation processes may differ.
For example, when the terminal 1 acquires from the external information providing server 98, as first external information, the weather information "Today's weather is sunny" and the maximum temperature information "The maximum temperature is ○○ degrees", it generates candidate phrase (A): "It's sunny. The high will be ○○ degrees." When the server 2 acquires from the external information providing server 99, as second external information, the weather information "Today's weather is sunny", the weather cause information "Sunny because of a high-pressure system", and the maximum temperature information "The maximum temperature is ○○ degrees", it generates candidate phrase (B): "It's sunny. That's because of a high-pressure system. The high will be ○○ degrees." The server 2 then sends candidate phrase (B) to the terminal 1. The terminal 1 compares candidate phrase (A) with candidate phrase (B) and selects the one whose information has the higher importance as the selected phrase to be output.
In FIG. 2, the terminal 1 selects as the selected phrase candidate phrase (B), which contains the weather cause information in addition to the weather information and maximum temperature information contained in candidate phrase (A), and outputs by voice the selected phrase "It's sunny. That's because of a high-pressure system. The high will be ○○ degrees."
The outline of the terminal 1 given above can be summarized as follows. The terminal 1 is a response control device that controls a response to voice, and includes: a candidate acquisition unit 141 (candidate phrase acquisition unit) that acquires a plurality of candidate phrases generated on the basis of the voice by each of a first response generation unit 13 and a second response generation unit 22 (a plurality of response generation units); and a response selection unit 142 (selection unit) that selects, as the selected phrase (response phrase), the candidate phrase with the highest response level (importance of information) among the plurality of candidate phrases acquired by the candidate acquisition unit 141.
The terminal 1 can therefore make an appropriate response on the basis of the plurality of candidate phrases generated for the voice by each of the plurality of response generation units. That is, from the candidate phrases generated in parallel by the first response generation unit 13 and the second response generation unit 22, the terminal 1 selects the candidate phrase whose information has the highest importance as the selected phrase and outputs it. In this way, the terminal 1 runs a plurality of response generation processes in parallel to shorten the time from acquiring the utterance to outputting a response, and selects one response to output from their results.
In the present embodiment, each of the plurality of candidate phrases consists of one or more reference phrases and zero or more additional phrases, and the response selection unit 142 judges a candidate phrase that contains an additional phrase to have a higher response level than a candidate phrase that contains none.
Accordingly, the terminal 1 selects the phrase to be output from the candidate phrases generated in parallel by the first response generation unit 13 and the second response generation unit 22 according to whether they contain additional phrases. The terminal 1 can thus output not only a reference phrase, which is a direct response to the user's call, but also an additional phrase, which is a supplementary response.
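The selection rule can be sketched as follows. The rule that a candidate containing an additional phrase outranks one containing none is taken from the text; ranking further by the number of additional phrases is an assumption added here, chosen because it reproduces the FIG. 2 outcome in which candidate phrase (B), with two additional phrases, is preferred over candidate phrase (A), with one. `Candidate`, `response_level`, and `select_response` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    reference: list[str]                                 # one or more reference phrases
    additional: list[str] = field(default_factory=list)  # zero or more additional phrases
    source: str = ""                                     # e.g. "terminal" or "server"

    def text(self) -> str:
        # Assumes the additional phrases follow the reference phrases.
        return " ".join(self.reference + self.additional)


def response_level(c: Candidate) -> int:
    # Any candidate with an additional phrase scores above every candidate
    # without one (the rule in the text); ranking by count beyond that is
    # an assumption.
    return len(c.additional)


def select_response(candidates: list[Candidate]) -> Candidate:
    # max() keeps the first of several equally ranked candidates.
    return max(candidates, key=response_level)


cand_a = Candidate(["It's sunny."], ["The high will be ○○ degrees."], source="terminal")
cand_b = Candidate(["It's sunny."],
                   ["That's because of a high-pressure system.",
                    "The high will be ○○ degrees."],
                   source="server")
chosen = select_response([cand_a, cand_b])
print(chosen.source)  # server
print(chosen.text())
```

Because `max()` returns the first maximal element, a tie is resolved in favor of whichever candidate is listed first; listing the terminal's candidate first would favor the local result on ties, which is one possible design choice rather than something the text specifies.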
The terminal 1 includes the first response generation unit 13 (response generation unit) and executes the response generation process itself. That is, the terminal 1 can execute the response generation process using information that the server 2 cannot acquire, such as the current position of the user carrying the terminal 1.
Note that the terminal 1 does not have to include the first response generation unit 13 in order to shorten the time from acquiring the call voice to responding by running a plurality of response generation processes in parallel and selecting one response to output from their results. The terminal 1 causes an external device that includes a response generation unit other than the first response generation unit 13 (for example, the server 2, which includes the second response generation unit 22) to execute a response generation process in parallel with its own, and acquires the candidate phrase generated by that external device. Details are described later.
(Glossary)
The "voice processing" executed by the voice response system 100 refers to processing that includes a voice recognition process, a response generation process, and a voice synthesis process. The "voice recognition process" converts the user's call voice data acquired by the microphone 17 into a call phrase, which is the corresponding character data; it may be the same as a known voice recognition process that converts voice data into character data. The "response generation process" generates a candidate phrase, which is character data corresponding to the call phrase. The "voice synthesis process" generates voice data corresponding to a candidate phrase, which is character data; it may be the same as a known voice synthesis process that converts character data into voice data. The voice data generated by the voice synthesis process is output from the speaker 192.
In addition to the voice recognition, response generation, and voice synthesis processes, the terminal 1 executes a response selection process. The "response selection process", described in detail later, selects as the selected phrase the candidate phrase whose information has the highest importance among the candidate phrases produced by a plurality of response generation processes executed in parallel.
The "call phrase" is the character data obtained when the voice recognition unit 12 performs the voice recognition process on a call voice acquired by the microphone 17. The responses that the first response generation unit 13 and the second response generation unit 22 generate for the call phrase are called "candidate phrases". A candidate phrase includes a "reference phrase", which is a direct answer to the call phrase, and may also have appended to it one or more "additional phrases", each containing a supplementary answer or information relating to the call phrase. For a given call phrase there may be multiple reference phrases, multiple additional phrases, or both. A candidate phrase is thus either "a reference phrase only" or "a reference phrase combined with one or more additional phrases".
For example, suppose that for the call phrase "What's the weather today?" the reference phrase "It's sunny.", the additional phrase (A-1) "That's because of a high-pressure system.", and the additional phrase (A-2) "The high will be ○○ degrees." can be selected. Four candidate phrases can then be assumed: the reference phrase alone, "It's sunny."; the reference phrase plus (A-1), "It's sunny. That's because of a high-pressure system."; the reference phrase plus (A-2), "It's sunny. The high will be ○○ degrees."; and the reference phrase plus both (A-1) and (A-2), "It's sunny. That's because of a high-pressure system. The high will be ○○ degrees."
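The four candidates above are exactly the reference phrase combined with each subset of {(A-1), (A-2)}, so k available additional phrases yield 2^k candidates. A minimal sketch with `itertools.combinations`, assuming the additional phrases are appended after the reference phrase in table order (the function name `candidate_phrases` is hypothetical):

```python
from itertools import combinations


def candidate_phrases(reference: str, additional: list[str]) -> list[str]:
    # A candidate is the reference phrase alone (k = 0) or the reference
    # phrase followed by any non-empty subset of the additional phrases.
    out = []
    for k in range(len(additional) + 1):
        for subset in combinations(additional, k):
            out.append(" ".join((reference,) + subset))
    return out


ref = "It's sunny."
adds = ["That's because of a high-pressure system.",  # (A-1)
        "The high will be ○○ degrees."]               # (A-2)
for phrase in candidate_phrases(ref, adds):
    print(phrase)
```

The candidates come out in the same order as in the paragraph: reference only, plus (A-1), plus (A-2), plus both.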
There is no restriction on where an additional phrase is placed relative to the reference phrase within a candidate phrase. An additional phrase may be placed after the reference phrase or before it, and a candidate phrase may also be generated in which the reference phrase sits between two or more additional phrases. Nor is the order of two or more additional phrases restricted: the candidate may be "It's sunny. That's because of a high-pressure system. The high will be ○○ degrees." or "It's sunny. The high will be ○○ degrees. That's because of a high-pressure system."
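Since neither the position of the reference phrase nor the order of the additional phrases is fixed, one set of phrases can be spoken in several orders. The sketch below enumerates them with `itertools.permutations`; the function name `arrangements` is hypothetical, and how an actual response generation unit would choose one order is not specified in the text.

```python
from itertools import permutations


def arrangements(reference: str, additional: list[str]) -> list[str]:
    # Every ordering of the reference phrase and the chosen additional
    # phrases: the reference may come first, last, or in between.
    phrases = [reference] + additional
    return [" ".join(p) for p in permutations(phrases)]


ref = "It's sunny."
adds = ["That's because of a high-pressure system.", "The high will be ○○ degrees."]
orders = arrangements(ref, adds)
print(len(orders))  # 6, i.e. 3!
reference_first = [o for o in orders if o.startswith(ref)]
print(len(reference_first))  # 2: the two orders quoted in the text
```

With one reference phrase and k additional phrases there are (k + 1)! orderings; restricting the reference phrase to the front leaves k! of them.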
(Main configuration of the terminal)
FIG. 1 is a block diagram showing the main configuration of the terminal 1 and the server 2. As illustrated, the terminal 1 includes a first control unit 10, a microphone 17, a first storage unit 18, and an output unit 19.
The microphone 17 converts voice and other sound into an electrical signal and passes it to the voice recognition unit 12.
The output unit 19 includes a display unit 191 and a speaker 192. The display unit 191 displays as an image the selected phrase received as character data from the selection result output unit 143. The speaker 192 outputs as sound the voice data received from the voice synthesis unit 15.
The first storage unit 18 stores the various data used by the terminal 1: (1) a control program executed by the first control unit 10 of the terminal 1, (2) an OS program, (3) application programs for executing various functions, and (4) various data read when those application programs are executed. The data (1) to (4) are stored in a non-volatile storage device such as a ROM (read-only memory), flash memory, EPROM (erasable programmable ROM), EEPROM (registered trademark) (electrically erasable PROM), or HDD (hard disk drive). The first storage unit 18 also stores a first reference phrase table 181 and a first additional phrase table 182.
The first control unit 10 centrally controls the functions of the terminal 1, including the voice recognition, response generation, response selection, and voice synthesis processes, and includes a first communication unit 11, a voice recognition unit 12, a first response generation unit 13, a response control unit 14, a voice synthesis unit 15, and a first external information acquisition unit 16.
The first communication unit 11 communicates with the server 2 and other devices. More specifically, the first communication unit 11: (1) acquires from the voice recognition unit 12 the call phrase, which is the result of the voice recognition process performed by the voice recognition unit 12 on the call voice acquired by the microphone 17, together with a request to execute the process of generating a candidate phrase for that call phrase (a response generation request), and transmits the call phrase and the response generation request to the server 2; (2) receives from the server 2 candidate phrase (B), the result of the response generation process of the second response generation unit 22, and passes it to the candidate acquisition unit 141; and (3) when the first response generation unit 13 needs first external information, that is, information other than that held by the terminal 1, to execute the response generation process, acquires the first external information from the external information providing server 98 or the like and passes it to the first external information acquisition unit 16.
The voice recognition unit 12 executes the voice recognition process. That is, it first converts the call voice data received from the microphone 17 into a call phrase, which is character data, and then passes the call phrase and a response generation request to the first communication unit 11 and the first response generation unit 13. The voice recognition unit 12 may use any known voice recognition technique for converting voice data into character data; since the voice recognition process itself can be realized with conventional technology, its details are omitted.
The first response generation unit 13 executes the response generation process. That is, it generates candidate phrase (A) for the call phrase, in the form of character data, received from the voice recognition unit 12. The first response generation unit 13 may use the first external information received from the first external information acquisition unit 16 to generate candidate phrase (A). Details are described later.
The response control unit 14 includes a candidate acquisition unit 141, a response selection unit 142, and a selection result output unit 143.
The candidate acquisition unit 141 acquires candidate phrase (A) from the first response generation unit 13 and, via the first communication unit 11, candidate phrase (B) generated by the second response generation unit 22, and passes candidate phrases (A) and (B) to the response selection unit 142.
The response selection unit 142 executes the response selection process. Specifically, from candidate phrases (A) and (B) received from the candidate acquisition unit 141, it selects the candidate phrase whose information has the higher importance (response level) as the selected phrase to be output. Details are described later. The response selection unit 142 passes the selected phrase to the selection result output unit 143.
The selection result output unit 143 passes the selected phrase received from the response selection unit 142 to the voice synthesis unit 15 and the display unit 191.
The voice synthesis unit 15 executes the voice synthesis process. That is, it converts the selected phrase, which is character data received from the selection result output unit 143, into voice data and causes the speaker 192 to output it. The voice synthesis unit 15 may use any known voice synthesis technique for converting character data into voice data; since the voice synthesis process itself can be realized with conventional technology, its details are omitted.
The first external information acquisition unit 16 acquires from the external information providing server 98 first external information, such as information not held by the terminal 1, and passes it to the first response generation unit 13. The first external information acquisition unit 16 may acquire the first external information in response to a request from the first response generation unit 13.
(Main configuration of the server)
The server 2 includes a second control unit 20 and a second storage unit 24. The second storage unit 24 stores a second reference phrase table 241 and a second additional phrase table 242, which are described in detail later. The second control unit 20 includes a second communication unit 21, a second response generation unit 22, and a second external information acquisition unit 23.
The second communication unit 21: (1) receives from the terminal 1 the call phrase resulting from the voice recognition process of the voice recognition unit 12, together with the response generation request, and passes them to the second response generation unit 22; (2) acquires from the second response generation unit 22 candidate phrase (B), the result of the response generation process, and transmits it to the terminal 1; and (3) when the second response generation unit 22 needs second external information, that is, information other than that held by the server 2, to execute the response generation process, acquires the second external information from the external information providing server 99 or the like and passes it to the second external information acquisition unit 23.
The second response generation unit 22 executes the response generation process. That is, it generates candidate phrase (B) for the call phrase received from the second communication unit 21. The second response generation unit 22 may use the second external information received from the second external information acquisition unit 23 to generate candidate phrase (B). Details are described later.
The second external information acquisition unit 23 acquires from the external information providing server 99 second external information, such as information not held by the server 2, and passes it to the second response generation unit 22. The second external information acquisition unit 23 may acquire the second external information in response to a request from the second response generation unit 22.
(Information stored in the storage units)
FIG. 3 is a diagram showing examples of the first reference phrase table 181 and the second reference phrase table 241, stored in the terminal 1 and the server 2 respectively. FIG. 4 is a diagram showing examples of the first additional phrase table 182 and the second additional phrase table 242, stored in the terminal 1 and the server 2 respectively. Hereinafter, when there is no need to distinguish between the first reference phrase table 181 and the second reference phrase table 241, they are referred to collectively as the "reference phrase table". Likewise, the first additional phrase table 182 and the second additional phrase table 242 are referred to collectively as the "additional phrase table".
In the reference phrase table of FIG. 3, call phrases are associated with reference phrases. Each call phrase is also associated with a "call ID" that identifies it, and each reference phrase with a "reference ID" that identifies it.
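One way to model the reference phrase table in code is as rows indexed by call phrase. This is a sketch only: the rows below are hypothetical placeholders, since the actual contents of FIG. 3 are not reproduced in the text, and exact string matching is an assumption about how a recognized call phrase is looked up.

```python
from typing import NamedTuple, Optional


class ReferenceEntry(NamedTuple):
    call_id: int            # identifies the call phrase
    call_phrase: str
    reference_id: int       # identifies the reference phrase
    reference_phrase: str


# Hypothetical rows; the real rows of FIG. 3 are not given in the text.
REFERENCE_TABLE = [
    ReferenceEntry(1, "What's the weather today?", 1, "It's sunny."),
    ReferenceEntry(2, "Good morning", 2, "Good morning!"),
]

# Index the table by call phrase for constant-time lookup.
_BY_CALL = {row.call_phrase: row for row in REFERENCE_TABLE}


def look_up_reference(call_phrase: str) -> Optional[ReferenceEntry]:
    # Exact match only; returns None for an unknown call phrase.
    return _BY_CALL.get(call_phrase)


entry = look_up_reference("What's the weather today?")
print(entry.reference_id, entry.reference_phrase)  # 1 It's sunny.
```

The reference ID returned here is the key that links a reference phrase to its rows in the additional phrase table.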
In the additional phrase table of FIG. 4, reference IDs are associated with additional phrases, and each additional phrase is associated with an "additional ID" that identifies it. Each additional phrase also has an "additional condition", which is the condition under which it is appended to the reference phrase. When an additional phrase satisfies its additional condition, the first response generation unit 13 and the second response generation unit 22 append it to the reference phrase. The content of the reference phrases in the reference phrase table and of the additional phrases in the additional phrase table is determined before the first response generation unit 13 and the second response generation unit 22 perform the response generation process.
However, the content of an additional phrase in the additional phrase table need not be fixed in advance. In the additional phrase table of FIG. 4, the additional phrase with additional ID = "3" has the additional condition "weather cause information successfully acquired", and its content is "based on the acquired weather cause information". This means that when information on the cause of the weather (sunny, cloudy, rain, etc.) can be acquired from an external information providing server 98 or 99 (for example, a weather information server), that information is used as the additional phrase. The weather can have many possible causes, but they need not all be stored in the additional phrase table in advance; for example, weather cause information acquired from the weather information server may be used as the additional phrase when a candidate phrase is generated. The same applies to the additional phrases with additional IDs "4" and "5": when information such as the maximum temperature or the probability of precipitation is acquired from a weather information server or the like, the "○○" portion of the additional phrase is intended to be replaced with the acquired value.
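The "○○" slot behaves like a template placeholder. A minimal sketch, assuming that a failed acquisition is represented as `None` and that an additional phrase such as additional ID "3", whose whole content is the acquired text, can be written as a template consisting of the placeholder alone; the function name is hypothetical.

```python
from typing import Optional

PLACEHOLDER = "○○"  # the slot in additional phrases such as "The high will be ○○ degrees."


def fill_additional_phrase(template: str, acquired: Optional[object]) -> Optional[str]:
    """Instantiate an additional phrase whose content depends on acquired information.

    Returns None when nothing was acquired, i.e. when an additional condition such as
    "weather cause information successfully acquired" is not met.
    """
    if acquired is None:
        return None
    return template.replace(PLACEHOLDER, str(acquired))
```

For example, `fill_additional_phrase("The high will be ○○ degrees.", 25)` yields `"The high will be 25 degrees."`, while the same call with `None` yields `None`, so the phrase is simply not appended.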
The terminal 1 and the server 2 each store a reference phrase table and an additional phrase table. The contents of the tables stored in the terminal 1 and in the server 2 may be identical or may differ.
Even if the terminal 1 and the server 2 hold identical reference phrase tables and additional phrase tables, the first response generation unit 13 and the second response generation unit 22 do not necessarily always generate the same candidate phrase for a given calling phrase. That is, even if the first reference phrase table 181 has the same contents as the second reference phrase table 241, and the first additional phrase table 182 has the same contents as the second additional phrase table 242, the following situation can occur, for example.
The server 2 may succeed in acquiring information on the maximum temperature and generate the additional phrase with additional ID = "4", whereas the terminal 1 fails to acquire that information and therefore cannot generate the additional phrase with additional ID = "4".
Furthermore, the contents of the reference phrase tables and additional phrase tables stored in the terminal 1 and the server 2 may differ as follows. For example, the second reference phrase table 241 and the second additional phrase table 242 of the server 2 may define, as additional conditions and the like, conditions that require acquiring and analyzing various information on the Internet. On the other hand, the first reference phrase table 181 and the first additional phrase table 182 of the terminal 1 may define, as additional conditions and the like, conditions that only the terminal 1 can evaluate, such as today's date and the current position of the user carrying the terminal 1 (for example, position information acquired by the GPS (Global Positioning System) receiver of the terminal 1).
(Outline of voice processing)
FIG. 5 is a sequence diagram outlining the processing performed by the terminal 1 and the server 2. The basic flow of the voice processing executed by the terminal 1 is as follows. When the microphone 17 of the terminal 1 picks up the user's calling voice (S101), the microphone 17 converts the calling voice into voice data and passes the voice data to the voice recognition unit 12. The voice recognition unit 12 performs voice recognition processing on the voice data (S102).
Through the voice recognition processing, the voice recognition unit 12 obtains the calling phrase and notifies the first response generation unit 13 and the first communication unit 11 of the calling phrase together with a request for the response generation process (S103).
Upon receiving the calling phrase and the request for the response generation process from the voice recognition unit 12, the first response generation unit 13 performs the response generation process (S104) and passes the generated candidate phrase (A) to the candidate acquisition unit 141. Meanwhile, the first communication unit 11 transmits the calling phrase and the request received from the voice recognition unit 12 to the server 2. The second communication unit 21 of the server 2 passes the calling phrase and the request received from the terminal 1 to the second response generation unit 22, which then performs the response generation process (S104'). The second response generation unit 22 passes the generated candidate phrase (B) to the second communication unit 21, which transmits the candidate phrase (B) to the terminal 1.
The first communication unit 11 of the terminal 1 passes the candidate phrase (B) received from the server 2 to the candidate acquisition unit 141. The candidate acquisition unit 141 thus acquires the results of the response generation processes of the terminal 1 and the server 2: the candidate phrase (A) from the first response generation unit 13 and the candidate phrase (B) from the first communication unit 11 (S105). The candidate acquisition unit 141 passes the candidate phrases (A) and (B) to the response selection unit 142, which executes the response selection process, that is, selects either candidate phrase (A) or candidate phrase (B) as the selected phrase (S106).
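Steps S104, S104' and S105 amount to running two generators on the same calling phrase concurrently and collecting both results. The sketch below uses Python's standard `concurrent.futures` thread pool; the server-side generator is stood in for by a local function, whereas the actual terminal would reach the server 2 through the communication units, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A response generator maps a calling phrase to a candidate phrase.
Generator = Callable[[str], str]


def acquire_candidates(call_phrase: str, generators: List[Generator]) -> List[str]:
    """Run all response generators concurrently on the same calling phrase
    (S104, S104') and collect their candidate phrases in generator order (S105)."""
    with ThreadPoolExecutor(max_workers=max(1, len(generators))) as pool:
        futures = [pool.submit(gen, call_phrase) for gen in generators]
        # result() blocks until each generator has finished.
        return [future.result() for future in futures]
```

Because both generators start at once, the wait before S106 is roughly that of the slower generator rather than the sum of both. A real device would probably also bound the wait on the server with a timeout, which is not modelled here.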
The response selection unit 142 passes the candidate phrase it selected to the selection result output unit 143, which in turn passes the selected phrase to the display unit 191 and the voice synthesis unit 15. The voice synthesis unit 15 performs voice synthesis on the selected phrase received from the selection result output unit 143 and outputs the response to the user as voice (S107). The response generation process and the response selection process are described in detail below.
(Response generation process)
FIG. 6 is a diagram illustrating the flow of the response generation process executed by the first response generation unit 13 and the second response generation unit 22. Hereinafter, when there is no need to distinguish between the first response generation unit 13 and the second response generation unit 22, both are collectively referred to as the "response generation unit".
As shown in FIG. 6, when notified of the calling phrase, the response generation unit first refers to the reference phrase table and selects a reference phrase corresponding to the calling phrase (S201). When several reference phrases correspond to the calling phrase, the response generation unit selects the one whose condition is met.
For example, in the reference phrase table illustrated in FIG. 3, the only reference phrase corresponding to the calling phrase "Good morning." with call ID = "1" is the reference phrase "Good morning." with reference ID = "1-1". Therefore, for the calling phrase with call ID = "1", the response generation unit refers to the reference phrase table and selects the reference phrase with reference ID = "1-1".
On the other hand, in the reference phrase table, the calling phrase "What's the weather today?" with call ID = "2" is associated with three reference phrases, with reference IDs "2-1", "2-2", and "2-3". That is, for the calling phrase with call ID = "2", the response generation unit can choose among these three reference phrases by referring to the reference phrase table.
The response generation unit selects, from these three reference phrases, the one whose condition is met. Specifically, it acquires weather information from an external information providing server 98 or 99, such as a weather information server. If "today's weather" is "sunny", it selects "It's sunny." with reference ID = "2-1"; if "today's weather" acquired from the weather information server is "cloudy", it selects "It's cloudy." with reference ID = "2-2"; and if the weather information cannot be acquired, it selects "I don't know." with reference ID = "2-3".
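The condition-based choice in S201 for call ID "2" can be sketched as a lookup with a fallback. This is a minimal illustration, assuming the weather has already been fetched (`None` meaning the acquisition failed) and assuming that a weather value not covered by the table also falls back to "I don't know."; the text does not say what happens in that case, and the names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the reference phrases of call ID 2 ("What's the weather today?").
WEATHER_REFERENCES: Dict[str, str] = {
    "sunny": "It's sunny.",    # reference ID "2-1"
    "cloudy": "It's cloudy.",  # reference ID "2-2"
}
FALLBACK_REFERENCE = "I don't know."  # used when no weather information was acquired


def select_weather_reference(weather: Optional[str]) -> str:
    """S201 for call ID 2: choose the reference phrase whose condition matches
    the weather obtained from an external information providing server."""
    if weather is None:
        return FALLBACK_REFERENCE
    # Assumption: weather values not covered by the table also fall back.
    return WEATHER_REFERENCES.get(weather, FALLBACK_REFERENCE)
```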
Next, the response generation unit refers to the additional phrase table illustrated in FIG. 4 and selects an additional ID associated with (related to) the reference ID selected in S201 (S202).
For example, in the reference phrase table, the calling phrase with call ID = "1" is associated with the reference phrase with reference ID = "1-1". In the additional phrase table, the additional phrases whose related reference ID is "1-1" have additional IDs "1" and "2".
The response generation unit therefore first checks whether the additional condition of the additional phrase with additional ID = "1" is satisfied (S203). If it confirms that this condition is satisfied (Yes in S203), it appends the additional phrase with additional ID = "1" to the reference phrase with reference ID = "1-1" (S204).
For example, when at least one of the terminal 1 and the server 2 has obtained information on the user's emotion (happy, sad, etc.) from the user's calling voice picked up by the microphone 17, the response generation unit checks whether "the user's emotion is happy", the additional condition of additional ID = "1", is satisfied. If it confirms from the calling voice that the user's emotion is happy, the response generation unit appends "I hope something good happens today, too.", the additional phrase with additional ID = "1", to the reference phrase with reference ID = "1-1". Since the user's emotion can be estimated from the user's voice with existing techniques, that technique is not described here. If the response generation unit cannot confirm that the additional condition of additional ID = "1" is satisfied (No in S203), it proceeds to S205 without appending the additional phrase with additional ID = "1" to the reference phrase with reference ID = "1-1".
Next, the response generation unit refers to the additional phrase table and checks whether any other additional ID corresponds to (is related to) the above reference ID (S205). That is, it checks whether there remains an additional ID whose "related reference ID" is the reference ID selected in S201 and whose additional condition has not yet been checked. If such an additional ID remains (No in S205), the process returns to S202, the response generation unit selects an additional ID whose additional condition has not yet been checked (S202), and the processing from S203 onward is repeated.
For example, in the additional phrase table illustrated in FIG. 4, the additional IDs whose related reference ID is "2-1" are "3", "4", and "5". If reference ID = "2-1" was selected in S201, the additional condition of additional ID = "3" has already been checked, and those of additional IDs "4" and "5" have not, the response generation unit next checks whether the additional condition of additional ID = "4" is satisfied. When no additional ID with an unchecked additional condition remains (Yes in S205), the process proceeds to S206.
As another example, when the reference phrase with reference ID = "3-3" is selected for the calling phrase with call ID = "3", the response generation unit selects the additional phrase to append as follows. The additional phrase whose related reference ID is "3-3" has additional ID = "8", and its additional condition is "today = Doyo no Ushi no Hi (the Day of the Ox in the doyo period)". To check whether this condition is satisfied, the response generation unit first acquires today's date. If today is not the Day of the Ox, the response generation unit does not execute S204. Then, in S205, since no other additional ID has "3-3" as its related reference ID, no additional phrase is selected; that is, the result is "no additional phrase".
Because of the loop at S205, the response generation unit may execute S204 several times. Each time through the loop, it does not overwrite the additional phrase already appended to the reference phrase; instead, it adds a further additional phrase.
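The S202 to S205 loop, including the rule that additional phrases accumulate rather than overwrite one another, can be sketched as follows. This is a minimal sketch, assuming each table row is a plain tuple and each additional condition is a predicate over a context dictionary; the row layout and function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
# (additional ID, related reference ID, additional condition, additional phrase)
AdditionalRow = Tuple[int, str, Callable[[Context], bool], str]


def build_candidate(reference_id: str, reference_phrase: str,
                    table: List[AdditionalRow], ctx: Context) -> Tuple[str, List[int]]:
    """S202-S205: visit each additional ID related to the selected reference ID in
    ascending order and append (never overwrite) every phrase whose condition holds."""
    parts = [reference_phrase]
    appended_ids: List[int] = []
    related = sorted((row for row in table if row[1] == reference_id), key=lambda row: row[0])
    for additional_id, _, condition, phrase in related:  # S202: next unchecked additional ID
        if condition(ctx):                               # S203: condition satisfied?
            parts.append(phrase)                         # S204: append to the candidate
            appended_ids.append(additional_id)
    return " ".join(parts), appended_ids                 # S205 exhausted: go on to S206
```

With rows for additional IDs "3" and "4" whose conditions both hold, the result is the reference phrase followed by both additional phrases in ID order, mirroring the "It's sunny. ... The high will be ○○ degrees." example in the text.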
For example, suppose the response generation unit selects the reference phrase with reference ID = "2-1", confirms that the additional condition of additional ID = "3" is satisfied, and then confirms that the additional condition of additional ID = "4" is satisfied. It then proceeds as follows: to the candidate phrase "It's sunny. That's because a high-pressure system is covering the area.", generated after confirming the condition of additional ID = "3", it appends the additional phrase "The high will be ○○ degrees.", producing the candidate phrase "It's sunny. That's because a high-pressure system is covering the area. The high will be ○○ degrees."
In the above description, the response generation unit repeats the determination in S205 until no additional ID with an unchecked additional condition remains. Alternatively, once the number of additional phrases whose conditions have been checked reaches a predetermined number, it may proceed to S206 without performing the determination in S205. Likewise, once the number of additional phrases appended to the reference phrase reaches a predetermined number, it may proceed to S206 without performing the determination in S205.
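The two early-exit variants can be added to the loop as optional limits. A minimal sketch, assuming the related rows are already in checking order and that "a predetermined number" is passed in as `max_checked` or `max_appended` (hypothetical parameter names; `None` means no limit, which reproduces the unrestricted loop).

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]
Row = Tuple[Callable[[Context], bool], str]   # (additional condition, additional phrase)


def build_candidate_capped(reference_phrase: str, related_rows: List[Row], ctx: Context,
                           max_checked: Optional[int] = None,
                           max_appended: Optional[int] = None) -> str:
    """Like the S202-S205 loop, but leave for S206 early once max_checked conditions
    have been examined or max_appended phrases have been appended."""
    parts = [reference_phrase]
    checked = 0
    for condition, phrase in related_rows:
        if max_checked is not None and checked >= max_checked:
            break                      # enough conditions examined
        if max_appended is not None and len(parts) - 1 >= max_appended:
            break                      # enough additional phrases appended
        checked += 1
        if condition(ctx):
            parts.append(phrase)
    return " ".join(parts)
```

Capping the number of appended phrases keeps the spoken response short, while capping the number of checks bounds the time spent on conditions that may require slow external queries.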
The response generation unit fixes the candidate phrase generated by appending, in S204, additional phrases to the reference phrase selected in S201 as the candidate phrase to be passed to the candidate acquisition unit 141 or the second communication unit 21, and then assigns a response level to that candidate phrase (S206). If the candidate phrase includes an additional phrase, the response level is "1"; if it does not, the response level is "0".
For example, if the reference phrase is "It's sunny." and the additional phrase that is related to it and satisfies its additional condition is "The high will be ○○ degrees.", the response generation unit generates the candidate phrase "It's sunny. The high will be ○○ degrees." and sets its response level to "1". If, on the other hand, the reference phrase is "It's sunny." and no related additional phrase satisfies its additional condition, the response generation unit generates the candidate phrase "It's sunny." and sets its response level to "0".
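Step S206 in this embodiment reduces to a binary flag on whether anything was appended. A minimal sketch, assuming the appended additional phrases are available as a list of strings; the function name is hypothetical.

```python
from typing import List, Tuple


def finalize_candidate(reference_phrase: str, additional_phrases: List[str]) -> Tuple[str, int]:
    """S206 (Embodiment 1): join the phrases into the candidate phrase and assign
    response level 1 if at least one additional phrase was appended, else 0."""
    text = " ".join([reference_phrase, *additional_phrases])
    level = 1 if additional_phrases else 0
    return text, level
```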
In the above description, the response generation unit determines the reference phrase first and the additional phrase afterwards, but the procedure of the response generation process is not limited to this. For example, the processing may be executed in the following order.
First, the response generation unit refers to the reference phrase table and acquires one of the reference IDs corresponding to the calling phrase. It then refers to the additional phrase table and checks the additional conditions of the additional IDs corresponding to (related to) the acquired reference ID. If it determines that an additional condition is satisfied, it fixes the reference phrase of the acquired reference ID as the reference phrase to be selected. If it cannot confirm that an additional condition is satisfied, it refers to the reference phrase table, acquires another reference ID corresponding to the calling phrase, and likewise checks whether the additional conditions of the additional IDs corresponding to that reference ID are satisfied.
For example, for the calling phrase "What should I have for dinner tonight?" with call ID = "3", the response generation unit refers to the reference phrase table and first acquires reference ID = "3-1". It then refers to the additional phrase table and acquires additional ID = "6", which is associated with reference ID = "3-1". On confirming that "temperature < 10 degrees", the additional condition of additional ID = "6", is satisfied, it fixes the reference phrase with reference ID = "3-1" as the reference phrase to be selected and, at the same time, fixes the additional phrase with additional ID = "6" as the additional phrase to be selected.
If, on the other hand, it confirms that the additional condition of additional ID = "6" is not satisfied, it determines whether some combination of reference phrase and additional phrase other than the reference phrase with reference ID = "3-1" and the additional phrase with additional ID = "6" can be selected. That is, it refers to the reference phrase table, acquires reference ID = "3-2", the reference ID that follows "3-1" for call ID = "3", and, as before, refers to the additional phrase table to check whether the additional condition of the additional ID corresponding to reference ID = "3-2" is satisfied.
Reference ID = "3-2" is acquired not only when the response generation unit confirms that the additional condition of additional ID = "6" is not satisfied, but also when it cannot confirm that the condition is satisfied.
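The alternative order can be sketched with three-valued conditions: `True` (confirmed satisfied), `False` (confirmed not satisfied) and `None` (cannot be confirmed), where both `False` and `None` move on to the next reference ID. The text does not say what happens when no reference ID qualifies; falling back to the first reference phrase without an additional phrase is purely an assumption here, as are the function name, the placeholder phrase texts and the additional ID "7" used in the usage example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]
Condition = Callable[[Context], Optional[bool]]   # True / False / None (cannot confirm)


def select_reference_first(references: List[Tuple[str, str]],
                           additions: Dict[str, List[Tuple[int, Condition, str]]],
                           ctx: Context) -> Tuple[str, Optional[int]]:
    """Alternative order: try reference IDs one by one and commit to the first one
    that has an additional phrase whose condition is confirmed to hold."""
    for ref_id, ref_text in references:
        for add_id, condition, add_text in additions.get(ref_id, []):
            if condition(ctx) is True:       # False and None both move on
                return f"{ref_text} {add_text}", add_id
    # Assumed fallback (not specified in the text): first reference, no additional phrase.
    return references[0][1], None
```

Usage with hypothetical rows: given `refs = [("3-1", "Reference 3-1."), ("3-2", "Reference 3-2.")]` and a condition for additional ID "6" that returns `None` when no temperature is known, a context of `{"temp": 5}` yields `("Reference 3-1. Addition 6.", 6)`, while `{"temp": 20}` or `{}` moves on to reference ID "3-2".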
In the above description, when several reference phrases and additional phrases can be selected from the reference phrase table and the additional phrase table, the response generation unit considers them in ascending order of reference ID and additional ID when deciding whether to select them.
For example, when the additional phrases that can be appended to the reference phrase with reference ID = "1-1" have additional IDs "1" and "2", the response generation unit first checks the additional condition of additional ID = "1" and then that of additional ID = "2".
However, considering the reference IDs and additional IDs in ascending order is not essential; they may be considered in descending order or in any other order.
Furthermore, in the above description the response generation unit assigns the response level to each candidate phrase, but the response level need not be assigned by the response generation unit. For example, the response selection unit 142 may assign a response level to each candidate phrase by analyzing the candidate phrases (A) and (B) received from the candidate acquisition unit 141, and then select the candidate phrase with the higher assigned response level as the selected phrase. Specifically, the response selection unit 142 may assign a response level to each candidate phrase (for example, in S211) and then select the candidate phrase with the higher response level as the selected phrase.
(Response selection process)
FIG. 7 is a diagram illustrating the response selection process performed by the response selection unit 142. From the candidate phrase (A) generated by the terminal 1 and the candidate phrase (B) generated by the server 2, the response selection unit 142 selects the candidate phrase with the higher response level as the selected phrase to be output to the user (S211). In other words, the response selection unit 142 selects, from the candidate phrases (A) and (B), the candidate phrase that includes an additional phrase as the selected phrase.
Thus, by choosing the phrase to output from the candidate phrase (A) generated by the terminal 1 and the candidate phrase (B) generated by the server 2 according to whether each contains an additional phrase, the terminal can output not only a direct response to the user's call but also an additional response.
If the response level of the candidate phrase (A) equals that of the candidate phrase (B), either candidate phrase may be selected. It may be decided in advance that "when the candidate phrases (A) and (B) have the same response level, the candidate phrase (A) is selected", or conversely that "the candidate phrase (B) is selected".
Alternatively, it may be decided that "when the candidate phrases (A) and (B) have the same response level, the candidate phrase acquired first is selected", or conversely that "the candidate phrase acquired last is selected".
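Selection by highest response level (S211), together with the four tie-breaking rules just listed, can be sketched as a single function with a policy switch. The `Candidate` record, its `arrival` timestamp and the policy names are hypothetical; the text only states which rules are permissible, not how they are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    level: int       # response level (importance of information)
    source: str      # "A" (terminal 1) or "B" (server 2)
    arrival: float   # time at which the candidate phrase was acquired


def select_phrase(candidates: List[Candidate], tie_policy: str = "prefer_A") -> Candidate:
    """S211: pick the candidate with the highest response level; break ties with
    one of the predetermined rules ("prefer_A", "prefer_B", "earliest", "latest")."""
    if not candidates:
        raise ValueError("no candidate phrases")
    best = max(c.level for c in candidates)
    tied = [c for c in candidates if c.level == best]
    if len(tied) == 1:
        return tied[0]
    if tie_policy in ("prefer_A", "prefer_B"):
        wanted = tie_policy[-1]
        return next((c for c in tied if c.source == wanted), tied[0])
    if tie_policy == "earliest":
        return min(tied, key=lambda c: c.arrival)
    if tie_policy == "latest":
        return max(tied, key=lambda c: c.arrival)
    raise ValueError(f"unknown tie policy: {tie_policy}")
```

Fixing the policy in advance makes the choice deterministic. "earliest" can, in principle, let the terminal respond as soon as a tying candidate arrives, although this sketch only picks among candidates that have already been collected.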
The processing flow of the terminal 1, a response control device that controls responses to voice, can be summarized as follows. It includes S105 (candidate phrase acquisition step), in which a plurality of candidate phrases generated from the voice by the first response generation unit 13 and the second response generation unit 22 (a plurality of response generation units) are acquired, and S106 or S211 (selection step), in which the response selection unit 142 selects, from the candidate phrases acquired in S105, the candidate phrase with the highest response level (importance of information) as the selected phrase (response phrase).
The voice response system 100, which on acquiring voice outputs a response to it as voice or as a character image, uses the voice processing of both the terminal 1 and the server 2 so that the output response phrase best meets the user's expectations. Specifically, by having the terminal 1 and the server 2 execute the response generation process in parallel, the terminal 1 can shorten the waiting time from acquiring the calling voice to responding, compared with conventional voice processing in which the server's processing is executed after the terminal's processing. In addition, the terminal 1 can select, from the plurality of candidate phrases, the candidate phrase with the highest importance of information as the selected phrase and output it.
[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 1, 8, and 9. Members having the same functions as those described in the above embodiment are given the same reference numerals, and their description is omitted.
The mobile terminal 1A according to the present embodiment (hereinafter abbreviated as terminal 1A) is outlined as follows. The terminal 1A stores, in the first storage unit 18, a first additional phrase table 182A in which an additional point value is associated with each additional phrase. The response selection unit 142A (selection unit) of the terminal 1A uses the sum of the additional points set for the additional phrases included in a candidate phrase as the response level (importance of information) of that candidate phrase. Whereas the terminal 1 assigns a response level to a candidate phrase according to whether it contains an additional phrase, the terminal 1A assigns the response level according to the additional points set for the additional phrases it contains. In other respects, the basic configuration of the terminal 1A is the same as that of the terminal 1.
The terminal 1A selects the phrase to be output from the plurality of candidate phrases on the basis of the total additional points of the additional phrases in each candidate phrase. It can therefore output a candidate phrase whose information is of high importance.
The server 2A stores, in the second storage unit 24, a second additional phrase table 242A in which an additional point is associated with each additional phrase. In other respects, the basic configuration of the server 2A is the same as that of the server 2.
FIG. 1 is a block diagram showing the main configuration of the terminal 1 and the server 2. It also shows the main configuration of the terminal 1A, which has the same configuration as the terminal 1, and of the server 2A, which has the same configuration as the server 2. Further details are described below.
In the following, when there is no need to distinguish between the first response generation unit 13A and the second response generation unit 22A, both are collectively referred to as the “response generation unit”. Similarly, the first additional phrase table 182A and the second additional phrase table 242A are collectively referred to as the “additional phrase table”.
An example in which the response generation unit assigns the response level is described below. Alternatively, the response selection unit 142A may assign the response level to a candidate phrase by summing the additional points of the additional phrases it contains. The only requirement is this: the response selection unit 142A treats the total additional points of each candidate phrase's additional phrases as that candidate phrase's response level, and selects the candidate phrase with the highest response level as the selected phrase. Where the response level is assigned does not matter.
The terminal 1A selects the phrase to be output according to the additional points of the additional phrases contained in each candidate phrase, that is, according to the importance of the information those additional phrases carry. The terminal 1A can therefore output, from a plurality of candidate phrases, the candidate phrase whose information has the highest importance.
FIG. 8 shows an example of the first additional phrase table 182A stored in the terminal 1A and the second additional phrase table 242A stored in the server 2A. In both tables, an additional point is set for each additional phrase. The “additional point” is set for each additional ID and indicates the importance of the information carried by the corresponding additional phrase.
In the present embodiment, the response level of each candidate phrase is the total of the additional points set for the additional phrases it contains. Accordingly, a candidate phrase consisting only of the reference phrase, with no additional phrase, has a response level of “0”. Each time the response generation unit adds an additional phrase to the reference phrase, it adds that additional phrase's additional point to the response level of the candidate phrase containing the reference phrase.
The additional points may be the same for all additional phrases or may differ from one additional phrase to another. When all additional phrases have the same additional point, the response generation unit or the response selection unit 142A sets the response level of a candidate phrase according to the number of additional phrases it contains. When the additional points differ for each additional ID, the response generation unit or the response selection unit 142A weights each additional phrase in the candidate phrase by its additional point to determine the response level.
For example, in the illustrated additional phrase table, the additional phrase with additional ID “8” has an additional point of “2”. This is larger than the additional point of “1” for the additional phrase with additional ID “7”. The additional condition of ID “7” is “temperature > 30 degrees”, while that of ID “8” is “today = Doyo no Ushi no Hi” (the midsummer Day of the Ox). The condition for ID “8” concerns the particular day that is today, which is more restrictive than the temperature condition of ID “7”. ID “8” is therefore given a higher additional point than ID “7”. In this way, additional points may be set according to how difficult the additional conditions are to satisfy.
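The response level can be computed as a sum of additional points. In this minimal sketch, the dictionary is a hypothetical excerpt of the additional phrase table, using only the point values stated above for IDs 4, 5, 7 and 8.

```python
from typing import Dict, List

# Hypothetical excerpt of an additional phrase table: additional ID -> additional point.
# The values follow the points described for IDs 4, 5, 7 and 8 in FIG. 8.
ADDITIONAL_POINTS: Dict[int, int] = {4: 1, 5: 1, 7: 1, 8: 2}


def response_level(additional_ids: List[int], points: Dict[int, int] = ADDITIONAL_POINTS) -> int:
    """Return the total additional points of the additional phrases in a candidate.

    A candidate consisting of the reference phrase alone scores 0. When every
    phrase has the same point, the result is proportional to the phrase count.
    """
    return sum(points[add_id] for add_id in additional_ids)


print(response_level([]))      # reference phrase only -> 0
print(response_level([4, 5]))  # two phrases worth 1 each -> 2
print(response_level([7, 8]))  # the harder-to-meet ID 8 counts double -> 3
```

Equal points turn the sum into a phrase count, while unequal points act as weights. Both cases are described above.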
FIG. 9 is a sequence diagram showing the flow of the response generation processing of the terminal 1A and the server 2A. This processing differs from that of FIG. 6 in that the process of S301 is added between S204 and S205. In S301, the response generation unit adds, to the response level, the additional point of the additional phrase (additional ID) added to the reference phrase in S204.
The response generation processing of FIG. 9 also includes the process of S306 in place of S206 of FIG. 6. In S306, the response generation unit takes the reference phrase selected in S201 and adds the additional phrase(s) from S204 to generate a candidate phrase. It fixes this as the candidate phrase to be reported to the candidate acquisition unit 141A or the second communication unit 21. The response generation unit also fixes the total of the additional points of the additional phrases contained in that candidate phrase as its response level. It then notifies the candidate acquisition unit 141A or the second communication unit 21 of the fixed candidate phrase and its response level.
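The generation loop can be sketched as follows. The step numbers in the comments map the loop onto S201, S204, S301 and S306. The table rows are hypothetical: only the phrases for IDs 4 and 5 and the “temperature > 30” condition of ID 7 come from the text. The conditions for IDs 4 and 5 and the `<phrase for ID 7>` placeholder are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AdditionalPhrase:
    add_id: int
    text: str
    point: int
    condition: Callable[[Dict[str, float]], bool]  # additional condition


def generate_candidate(reference: str, table: List[AdditionalPhrase],
                       context: Dict[str, float]) -> Tuple[str, int]:
    phrase = reference  # S201: the reference phrase has already been chosen
    level = 0           # no additional phrase yet, so the level starts at 0
    for entry in table:
        if entry.condition(context):       # S204: the additional condition holds
            phrase += " " + entry.text     # append the additional phrase
            level += entry.point           # S301: add its additional point
    return phrase, level                   # S306: fix the candidate and its level


# Hypothetical table rows; the conditions for IDs 4 and 5 and the text for ID 7
# are invented for illustration.
TABLE = [
    AdditionalPhrase(4, "The maximum temperature will be XX degrees.", 1,
                     lambda ctx: "max_temp" in ctx),
    AdditionalPhrase(5, "The chance of precipitation is XX%.", 1,
                     lambda ctx: "rain_prob" in ctx),
    AdditionalPhrase(7, "<phrase for ID 7>", 1,
                     lambda ctx: ctx.get("temp", 0) > 30),
]

print(generate_candidate("It's sunny.", TABLE, {"max_temp": 28, "rain_prob": 10, "temp": 25}))
```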
The candidate acquisition unit 141A acquires two candidate phrases, each with its response level, from the first response generation unit 13A and the first communication unit 11:

- the candidate phrase (A), generated by the first response generation unit 13A;
- the candidate phrase (B), generated by the second response generation unit 22A.

The candidate acquisition unit 141A then passes them to the response selection unit 142A.
For each of the candidate phrases (A) and (B), the response selection unit 142A takes the total of the additional points set for the additional phrases it contains as its response level (importance). It then selects the candidate phrase with the higher response level as the selected phrase.
In the additional phrase table illustrated in FIG. 8, the additional phrase “The maximum temperature will be XX degrees.” (additional ID “4”) has an additional point of “1”. The additional phrase “The chance of precipitation is XX%.” (additional ID “5”) also has an additional point of “1”. Suppose the call phrase has call ID “2”. If the first response generation unit 13A generates the candidate phrase (A) “It's sunny. The maximum temperature will be XX degrees.”, the response level of candidate phrase (A) is “1”. If the second response generation unit 22A generates the candidate phrase (B) “It's sunny. The maximum temperature will be XX degrees. The chance of precipitation is XX%.”, the response level of candidate phrase (B) is “2”. Because the response level of candidate phrase (B), 2, is higher than that of candidate phrase (A), 1, the response selection unit 142A selects candidate phrase (B) as the selected phrase to be output.
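The selection step for this worked example can be sketched as a `max()` over response levels. The `Candidate` tuple is an assumed representation, not one defined in the text. The same function also covers three or more candidates.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    label: str
    text: str
    response_level: int


def select_phrase(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the candidate with the highest response level.

    Works for two or more candidates; max() returns the first one on a tie.
    Raises ValueError if the sequence is empty.
    """
    return max(candidates, key=lambda c: c.response_level)


a = Candidate("A", "It's sunny. The maximum temperature will be XX degrees.", 1)
b = Candidate("B", "It's sunny. The maximum temperature will be XX degrees. "
                   "The chance of precipitation is XX%.", 2)
print(select_phrase([a, b]).label)  # B
```

The text does not say how ties are broken. Keeping the first candidate is only this sketch's choice.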
[Embodiment 3]
Another embodiment of the present invention is described below with reference to FIGS. 10 to 13. Members having the same functions as those described in the preceding embodiments are given the same reference signs, and their description is omitted.
FIG. 10 is a block diagram showing the main configuration of a voice response system 300. The system includes a mobile terminal 3 (hereinafter abbreviated as terminal 3), which is the response control device according to the present embodiment. An outline of the terminal 3 is as follows.
The terminal 3 stores, in the first storage unit 18, a first additional phrase table 183 in which a category is set for each additional phrase. The terminal 3 also includes a phrase adding unit 341, which acts on a candidate phrase that the response selection unit 142A (selection unit) did not select. The phrase adding unit 341 acts when both of the following hold for that candidate phrase:

- it contains a reference phrase with the same content as the reference phrase of the selected phrase (response phrase);
- it contains an additional phrase whose category differs from the categories of the additional phrases in the selected phrase.

In that case, the phrase adding unit 341 adds that additional phrase to the selected phrase. When there is no need to distinguish between the first additional phrase table 183 and the second additional phrase table 243, both are collectively referred to as the “additional phrase table”.
The terminal 3 can add, to the selected phrase, an additional phrase contained in a candidate phrase that the response selection unit 142A did not select. The terminal 3 can therefore output a phrase that no single response generation process could produce, for example one that neither the first response generation unit 13A nor the second response generation unit 22A could generate alone. As before, the example described below has the response generation unit assign the response level. The response selection unit 142A may instead assign it by summing the additional points of the additional phrases in the candidate phrase.
FIG. 11 shows an example of the additional phrase table stored in the terminal 3 and the voice processing server 2. As illustrated, a category is associated with each additional phrase in the additional phrase table. The category indicates what kind of additional information the additional phrase concerns.
For example, in the additional phrase table of FIG. 11, the additional phrase “I hope something good happens today, too.” (additional ID “1”) has the category “emotion”. This means that the additional phrase concerns the additional information “emotion”.
Next, the flow of processing executed by the terminal 3 is described with reference to FIGS. 12 and 13. FIG. 12 is a sequence diagram showing the flow of the response generation processing of the terminal 3 and the voice processing server 2. The response generation units of the terminal 3 and the voice processing server 2 generate a candidate phrase and assign it a response level. They also determine the category of the candidate phrase.
The response generation processing of FIG. 12 includes the process of S406 in place of S306 of FIG. 9. In S406, the response generation unit generates a candidate phrase by adding the additional phrase(s) determined in S204 to the reference phrase determined in S201. It fixes this as the candidate phrase to be reported to the candidate acquisition unit 141A or the second communication unit 21. The response generation unit fixes the total of the additional points of the candidate phrase's additional phrases as its response level. It also fixes the categories of those additional phrases as the categories of the candidate phrase.
For example, suppose “It's sunny.” is selected as the reference phrase in S201 and “The maximum temperature will be XX degrees.” is selected as the additional phrase in S204. The response generation unit then generates the candidate phrase “It's sunny. The maximum temperature will be XX degrees.” In S406, it fixes the category of this candidate phrase as “maximum temperature”, which is the category of the additional phrase “The maximum temperature will be XX degrees.”
Similarly, for the candidate phrase “It's sunny. The maximum temperature will be XX degrees. The chance of precipitation is XX%.”, the response generation unit fixes the categories as “maximum temperature” and “precipitation probability”. Two more examples follow.

- The first response generation unit 13A generates the candidate phrase (A) “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%.” Candidate phrase (A) has a response level of “2” and the categories “weather reason, precipitation probability”.
- The second response generation unit 22A generates the candidate phrase (B) “It's sunny. The maximum temperature will be XX degrees.” Candidate phrase (B) has a response level of “1” and the category “maximum temperature”.

The response generation unit notifies the candidate acquisition unit 141A or the second communication unit 21 of the fixed candidate phrase together with its response level and categories.
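S406 can be sketched as one function that fixes a candidate's text, response level and categories together. The table's point and category values are assumptions: a point of 1 for every phrase is chosen so that candidate (A) reaches the level of 2 stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical table: additional phrase -> (additional point, category).
TABLE: Dict[str, Tuple[int, str]] = {
    "That's because high pressure is covering the area.": (1, "weather reason"),
    "The maximum temperature will be XX degrees.": (1, "maximum temperature"),
    "The chance of precipitation is XX%.": (1, "precipitation probability"),
}


@dataclass
class Candidate:
    reference: str
    additions: List[str]
    response_level: int
    categories: List[str]

    @property
    def text(self) -> str:
        return " ".join([self.reference, *self.additions])


def finalize(reference: str, additions: List[str]) -> Candidate:
    """S406 sketch: fix the candidate phrase, its response level and its categories."""
    level = sum(TABLE[a][0] for a in additions)
    categories = [TABLE[a][1] for a in additions]
    return Candidate(reference, list(additions), level, categories)


cand_a = finalize("It's sunny.", ["That's because high pressure is covering the area.",
                                  "The chance of precipitation is XX%."])
cand_b = finalize("It's sunny.", ["The maximum temperature will be XX degrees."])
print(cand_a.response_level, cand_a.categories)
print(cand_b.response_level, cand_b.categories)
```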
FIG. 13 is a sequence diagram showing the flow of the response selection processing of the terminal 3. The response selection unit 142A selects the candidate phrase with the highest response level, that is, the highest total of additional points, as the selected phrase (S411).
For example, the response selection unit 142A may acquire these two candidate phrases:

- candidate phrase (A), “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%.”, with response level “2”;
- candidate phrase (B), “It's sunny. The maximum temperature will be XX degrees.”, with response level “1”.

The response selection unit 142A selects candidate phrase (A) as the selected phrase. It notifies the phrase adding unit 341 of candidate phrases (A) and (B), together with information indicating which one was selected. The phrase adding unit 341 then checks whether any unselected candidate phrase contains a reference phrase with the same content as the reference phrase (A-0) of the selected candidate phrase (A) (S412).
Specifically, the phrase adding unit 341 first extracts the reference phrase (A-0) from the selected candidate phrase (A). It then extracts the reference phrase (B-0) from the unselected candidate phrase (B). Finally, it determines whether the reference phrases (A-0) and (B-0) match (have the same content).
The match between the reference phrases (A-0) and (B-0) need not be judged word for word. It may instead be judged by whether both reference phrases contain a specific word. For example, when (A-0) and (B-0) both contain the word “sunny”, they may be judged to match.
For example, the phrase adding unit 341 compares two reference phrases:

- the reference phrase (A-0) “It's sunny.” of candidate phrase (A), “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%.”;
- the reference phrase (B-0) “It's sunny.” of candidate phrase (B), “It's sunny. The maximum temperature will be XX degrees.”

It determines whether they have the same content. If a candidate phrase other than (A) contains a reference phrase with the same content as (A-0) (Yes in S412), the phrase adding unit 341 acquires that unselected candidate phrase (B) (S413).
For example, on determining that the reference phrase (A-0) “It's sunny.” and the reference phrase (B-0) “It's sunny.” have the same content, the phrase adding unit 341 acquires the candidate phrase (B) “It's sunny. The maximum temperature will be XX degrees.”
If no candidate phrase other than (A) contains a reference phrase with the same content as (A-0) (No in S412), the phrase adding unit 341 adds no new additional phrase to candidate phrase (A) and ends the processing.
For example, suppose the reference phrase (B-0) of the unselected candidate phrase is “It's raining.” and the reference phrase (A-0) of the selected candidate phrase is “It's sunny.” The phrase adding unit 341 then determines that (A-0) and (B-0) do not match and ends the processing.
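The matching check of S412 and S413 can be sketched as below. Under this sketch's assumptions, a candidate is modeled as a plain `(reference phrase, additional phrases)` tuple. `references_match` compares word for word by default, and with an optional keyword it uses the looser “both contain a specific word” rule described above. Both function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed model of a candidate: (reference phrase, list of additional phrases).
Candidate = Tuple[str, List[str]]


def references_match(ref_a: str, ref_b: str, keyword: Optional[str] = None) -> bool:
    """Exact comparison by default; with a keyword, both phrases containing it counts as a match."""
    if keyword is None:
        return ref_a == ref_b
    return keyword in ref_a and keyword in ref_b


def matching_unselected(selected: Candidate, unselected: List[Candidate],
                        keyword: Optional[str] = None) -> List[Candidate]:
    """S412/S413 sketch: unselected candidates whose reference phrase matches (A-0)."""
    ref_a0 = selected[0]
    return [c for c in unselected if references_match(ref_a0, c[0], keyword)]


cand_a = ("It's sunny.", ["That's because high pressure is covering the area.",
                          "The chance of precipitation is XX%."])
cand_b = ("It's sunny.", ["The maximum temperature will be XX degrees."])
cand_c = ("It's raining.", ["The maximum temperature will be XX degrees."])

print(matching_unselected(cand_a, [cand_b, cand_c]) == [cand_b])  # True
```

An empty result corresponds to “No in S412”, where processing ends without adding anything.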
The phrase adding unit 341 determines whether candidate phrase (B) has a category that does not match any category of candidate phrase (A) (S414). Take candidate phrase (A), “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%.”, and candidate phrase (B), “It's sunny. The maximum temperature will be XX degrees.” The phrase adding unit 341 determines that the categories of (A), “weather reason, precipitation probability”, do not match the category of (B), “maximum temperature”.
If candidate phrase (B) has a category that does not match any category of candidate phrase (A) (Yes in S414), the phrase adding unit 341 acquires the additional phrase (B-1) of candidate phrase (B) that corresponds to that non-matching category (S415).
For example, the phrase adding unit 341 compares candidate phrase (A), with categories “weather reason, precipitation probability”, against candidate phrase (B), with category “maximum temperature”. It first confirms that (B), unlike (A), has the category “maximum temperature”. It then extracts from (B) the additional phrase corresponding to that category, “The maximum temperature will be XX degrees.” The phrase adding unit 341 adds this additional phrase (B-1) to candidate phrase (A) (S416).
For example, the phrase adding unit 341 adds the additional phrase “The maximum temperature will be XX degrees.” to candidate phrase (A), “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%.” This generates the phrase “It's sunny. That's because high pressure is covering the area. The chance of precipitation is XX%. The maximum temperature will be XX degrees.” The phrase adding unit 341 then notifies the selection result output unit 143A of this phrase. If candidate phrase (B) has no category outside those of (A) (No in S414), the phrase adding unit 341 adds no new additional phrase to (A) and ends the processing.
In this way, the terminal 3 uses the categories of the candidate phrases to add a new additional phrase to the selected candidate phrase, and outputs the result. For example, the additional phrase (B-1) selected by the second response generation unit 22A is added to the candidate phrase (A) generated by the first response generation unit 13A. The terminal 3 can therefore output a phrase that neither the first response generation unit 13A nor the second response generation unit 22A could generate alone.
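Steps S414 to S416 can be sketched as a category-difference merge. This is a minimal sketch under two assumptions:

- each additional phrase is paired with its category as a `(category, phrase)` tuple;
- the reference phrases were already found to match in S412.

The helper name is hypothetical.

```python
from typing import List, Tuple

# Assumed model: each additional phrase is paired with its category.
Addition = Tuple[str, str]  # (category, additional phrase)


def add_missing_categories(reference: str, selected: List[Addition],
                           other: List[Addition]) -> str:
    """S414-S416 sketch: append the other candidate's phrases whose categories the
    selected candidate lacks. Assumes the reference phrases already matched (S412)."""
    have = {category for category, _ in selected}
    extra = [phrase for category, phrase in other if category not in have]  # S414/S415
    parts = [reference] + [phrase for _, phrase in selected] + extra         # S416
    return " ".join(parts)


selected_a = [("weather reason", "That's because high pressure is covering the area."),
              ("precipitation probability", "The chance of precipitation is XX%.")]
other_b = [("maximum temperature", "The maximum temperature will be XX degrees.")]
print(add_missing_categories("It's sunny.", selected_a, other_b))
```

When `other` brings no new category, the function returns the selected phrase unchanged. This matches the “No in S414” branch.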
After outputting one candidate phrase, the terminal 3 can also output an additional phrase taken from another candidate phrase. For example, the response from the server 2 may be delayed beyond a certain threshold because of network delay or the like. If that later response contains an additional phrase of a different category, the phrase can be appended to the earlier response even though the earlier response has already been output.
The basic procedure is the same as the response selection processing shown in FIG. 13.

1. The terminal 3 stores the candidate phrase (A) to be output in the first storage unit 18, and then outputs it.
2. The terminal 3 later acquires a candidate phrase (B) that has not yet been output and that contains a reference phrase equivalent to that of the already output selected phrase. It compares the categories of (B) with those of (A).
3. If the categories of (B) differ from those of (A), the terminal 3 acquires and outputs the additional phrase of (B) whose category is not present in (A).

In other words, the terminal 3 can output a phrase that adds a new additional phrase to an already output candidate phrase.
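The late-arrival variant can be sketched as a small stateful handler. The class and method names are hypothetical, and `self.stored` merely stands in for keeping candidate (A) in the first storage unit 18. For simplicity, reference phrases are compared exactly.

```python
from typing import List, Optional, Tuple

Addition = Tuple[str, str]              # (category, additional phrase)
Candidate = Tuple[str, List[Addition]]  # (reference phrase, additions)


class LateResponseHandler:
    """Sketch of appending content from a candidate that arrives after output."""

    def __init__(self) -> None:
        # Stands in for keeping candidate (A) in the first storage unit 18.
        self.stored: Optional[Candidate] = None

    def output_first(self, candidate: Candidate) -> str:
        self.stored = candidate  # store before outputting
        reference, additions = candidate
        return " ".join([reference] + [phrase for _, phrase in additions])

    def on_late_candidate(self, late: Candidate) -> Optional[str]:
        """Return only the new-category phrases of (B), or None if nothing to add."""
        if self.stored is None or late[0] != self.stored[0]:
            return None  # nothing output yet, or the reference phrases differ
        have = {category for category, _ in self.stored[1]}
        new = [phrase for category, phrase in late[1] if category not in have]
        return " ".join(new) if new else None


handler = LateResponseHandler()
print(handler.output_first(
    ("It's sunny.", [("precipitation probability", "The chance of precipitation is XX%.")])))
print(handler.on_late_candidate(
    ("It's sunny.", [("maximum temperature", "The maximum temperature will be XX degrees.")])))
```

Only the follow-up text is returned, because the earlier phrase has already been spoken or displayed.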
[Embodiment 4]
The control blocks (first control units 10 and 30) of the terminals 1, 1A, and 3 may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like. They may also be implemented in software using a CPU (Central Processing Unit).
In the latter case, the terminals 1, 1A, and 3 each include, among other components:

- a CPU that executes the instructions of a program, i.e., the software implementing each function;
- a ROM (Read Only Memory) or storage device in which the program and various data are recorded so as to be readable by a computer (or CPU); these are referred to as the “recording medium”;
- a RAM (Random Access Memory) into which the program is loaded.

The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a “non-transitory tangible medium” can be used, such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The program may be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave. The present invention can also be realized as a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
[Modification]
The terminals 1, 1A, and 3 have a plurality of voice processes, in particular response generation processes, executed in parallel. This shortens the time from when the user speaks until a response is output, compared with executing (or requesting) one voice process and only then executing another.
The terminals 1, 1A, and 3 also select and output, from the plurality of candidate phrases generated in parallel, the candidate phrase whose information has the highest importance (response level). Only two capabilities are required of them:

- the candidate acquisition units 141 and 141A can acquire a plurality of candidate phrases;
- the response selection units 142 and 142A can select, as the selected phrase, the candidate phrase whose information has the highest importance.

The other components are not essential. The examples above have the terminals 1, 1A, and 3 include the first response generation units 13 and 13A, and the servers 2 and 2A include the second response generation units 22 and 22A. This configuration is not essential. For example, the terminal 1 may lack the first response generation unit 13, with the server 2 including both the first response generation unit 13 and the second response generation unit 22. Conversely, the server 2 may lack the second response generation unit 22, with the terminal 1 including both the first response generation unit 13 and the second response generation unit 22.
Furthermore, the candidate acquisition units 141 and 141A need not acquire exactly two candidate phrases. They may acquire three or more, for example. Likewise, the response selection units 142 and 142A may select, as the selected phrase, the candidate phrase whose information has the highest importance from among three or more candidate phrases.
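The selection step described in these two paragraphs reduces to taking a maximum over however many candidates arrive, whether two or more. The following is a minimal sketch assumed for illustration only. Each candidate is taken to already carry a numeric importance (response level), and how that number is computed is left open. The `Candidate` type, its `source` field and the tie-breaking rule are hypothetical, because the text does not specify them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    text: str
    importance: int  # "response level"; computed elsewhere (assumption)
    source: str      # hypothetical label for the generator that produced it


def select_response(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate whose information has the highest importance.

    Works for two candidates or for three or more. On a tie, max() keeps the
    earliest candidate in the sequence; this tie-breaking is an assumption.
    """
    if not candidates:
        raise ValueError("no candidate phrases to choose from")
    return max(candidates, key=lambda c: c.importance)


if __name__ == "__main__":
    cands = [
        Candidate("Good morning.", 0, "terminal"),
        Candidate("Good morning. It will rain today.", 1, "server"),
        Candidate("Morning!", 0, "third generator"),
    ]
    print(select_response(cands).text)
```

Keeping the selector independent of where each candidate came from matches the point above that it does not matter which device hosts each response generation unit.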
It is also not essential for the voice recognition unit 12 of the terminal 1 to execute the voice recognition process. The server 2 may include a second voice recognition unit, and the terminal 1 may transmit the voice data from the microphone 17 to the server 2 so that the terminal 1 and the server 2 each execute voice recognition in parallel. Alternatively, the server 2 may include the voice recognition unit 12 instead of the terminal 1. In that case, the voice recognition unit 12 of the server 2 performs voice recognition on the voice data from the microphone 17 and issues the requests for response generation.
Similarly, it is not essential for the speech synthesis unit 15 of the terminal 1 to execute the speech synthesis process. The server 2 may include the speech synthesis unit 15 and generate the audio data to be output from the speaker 192 based on the selected phrase obtained from the selection result output units 143 and 143A.
In general, a server has higher processing capacity than a terminal and can use a richer vocabulary. It therefore achieves higher speech recognition accuracy and can handle a larger number of responses. A server usually has larger acoustic model and language model dictionaries than a terminal, which gives it greater speech recognition capacity. It can also support many dialogue response scenarios, and it holds a large amount of phoneme data, so it can output clear synthesized speech.
The mobile terminal (response control device) according to each aspect of the present invention may be implemented by a computer. In that case, the following also fall within the scope of the present invention: a control program that implements the mobile terminal on a computer by causing the computer to operate as each unit (software elements only) of the mobile terminal, and a computer-readable recording medium on which that program is recorded.
The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
The present invention can be widely used in response control devices that control responses to voice.
1, 1A, 3 Mobile terminal (response control device)
13, 13A First response generation unit (response generation unit)
22, 22A Second response generation unit (response generation unit)
141, 141A Candidate acquisition unit (candidate phrase acquisition unit)
142, 142A Response selection unit (selection unit)
341 Phrase adding unit

Claims (5)

  1.  A response control device for controlling a response to voice, comprising:
     a candidate phrase acquisition unit that acquires a plurality of candidate phrases, each generated on the basis of the voice by a respective one of a plurality of response generation units; and
     a selection unit that selects, as a response phrase, the candidate phrase whose information has the highest importance from among the plurality of candidate phrases acquired by the candidate phrase acquisition unit.
  2.  The response control device according to claim 1, wherein
     each of the plurality of candidate phrases consists of one or more reference phrases and zero or more additional phrases, and
     the selection unit determines that a candidate phrase including an additional phrase has a higher importance than a candidate phrase including no additional phrase.
  3.  The response control device according to claim 2, wherein
     additional points are set for each additional phrase, and
     the selection unit takes the total of the additional points set for the additional phrases included in a candidate phrase as the importance of that candidate phrase.
  4.  The response control device according to claim 2 or 3, wherein
     a category is set for each additional phrase, and
     the device further comprises a phrase adding unit that adds an additional phrase to the response phrase when a candidate phrase meets three conditions: it was not selected by the selection unit; it includes a reference phrase with the same content as the reference phrase included in the response phrase selected by the selection unit; and it includes that additional phrase, whose category differs from the category set for the additional phrase included in the response phrase.
  5.  A control program for causing a computer to function as the response control device according to any one of claims 1 to 4, the control program causing the computer to function as each of the above units.
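Claims 2 to 4 together describe a scoring and merging procedure. Importance is the sum of the additional points of a candidate's additional phrases. The highest-scoring candidate is selected. Additional phrases from unselected candidates that share the same reference phrase are then appended if their category is not yet present. The following is a minimal Python sketch under stated assumptions, not the patent's implementation. All type and function names are hypothetical, and points are assumed positive so that a candidate with additional phrases outranks one without them (claim 2). The sketch also assumes that a category added once is not added again from a later candidate, which the claim does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AdditionalPhrase:
    text: str
    points: int    # additional points set for the phrase (claim 3); assumed > 0
    category: str  # category set for the phrase (claim 4)


@dataclass
class CandidatePhrase:
    references: Tuple[str, ...]  # one or more reference phrases
    additions: List[AdditionalPhrase] = field(default_factory=list)

    def importance(self) -> int:
        # Claim 3: importance is the total of the additional points.
        return sum(a.points for a in self.additions)

    def render(self) -> str:
        return " ".join(list(self.references) + [a.text for a in self.additions])


def select(candidates: Sequence[CandidatePhrase]) -> CandidatePhrase:
    """Selection unit: pick the candidate with the highest importance."""
    return max(candidates, key=CandidatePhrase.importance)


def add_missing_categories(response: CandidatePhrase,
                           candidates: Sequence[CandidatePhrase]) -> CandidatePhrase:
    """Phrase adding unit (claim 4 sketch): copy additional phrases whose
    category is not yet in the response, taken only from unselected candidates
    with the same reference phrases. Returns a new object; input is unchanged."""
    merged = CandidatePhrase(response.references, list(response.additions))
    seen = {a.category for a in merged.additions}
    for cand in candidates:
        if cand is response or cand.references != response.references:
            continue
        for add in cand.additions:
            if add.category not in seen:
                merged.additions.append(add)
                seen.add(add.category)
    return merged


if __name__ == "__main__":
    terminal = CandidatePhrase(("Good morning.",))
    server = CandidatePhrase(("Good morning.",),
                             [AdditionalPhrase("It will rain today.", 2, "weather")])
    third = CandidatePhrase(("Good morning.",),
                            [AdditionalPhrase("You have a meeting at 10.", 1, "schedule")])
    pool = [terminal, server, third]
    print(add_missing_categories(select(pool), pool).render())
```

In this example the server candidate wins with 2 points, and the "schedule" phrase from the third candidate is appended because its category is missing from the selected response.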
PCT/JP2014/079411 2013-12-27 2014-11-06 Response control device and control program WO2015098306A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013-273284 2013-12-27
JP2013273284A JP2015127758A (en) 2013-12-27 2013-12-27 Response control device, control program

Publications (1)

Publication Number Publication Date
WO2015098306A1 true WO2015098306A1 (en) 2015-07-02

Family

ID=53478184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/079411 WO2015098306A1 (en) 2013-12-27 2014-11-06 Response control device and control program

Country Status (2)

Country Link
JP (1) JP2015127758A (en)
WO (1) WO2015098306A1 (en)

Cited By (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017203808A (en) * 2016-05-09 2017-11-16 富士通株式会社 Interaction processing program, interaction processing method, and information processing apparatus
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11810578B2 (en) 2020-05-11 2023-11-07 Apple Inc. Device arbitration for digital assistant-based intercom systems
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11984124B2 (en) 2020-11-13 2024-05-14 Apple Inc. Speculative task flow execution
US12001933B2 (en) 2015-05-15 2024-06-04 Apple Inc. Virtual assistant in a communication session
US12010262B2 (en) 2013-08-06 2024-06-11 Apple Inc. Auto-activating smart responses based on activities from remote devices
US12014118B2 (en) 2017-05-15 2024-06-18 Apple Inc. Multi-modal interfaces having selection disambiguation and text modification capability
US12051413B2 (en) 2015-09-30 2024-07-30 Apple Inc. Intelligent device identification
US12067985B2 (en) 2018-06-01 2024-08-20 Apple Inc. Virtual assistant operations in multi-device environments
US12073147B2 (en) 2013-06-09 2024-08-27 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US12087308B2 (en) 2010-01-18 2024-09-10 Apple Inc. Intelligent automated assistant
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
US12223282B2 (en) 2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment
US12254887B2 (en) 2017-05-16 2025-03-18 Apple Inc. Far-field extension of digital assistant services for providing a notification of an event to a user
US12260234B2 (en) 2017-01-09 2025-03-25 Apple Inc. Application integration with a digital assistant
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808007A (en) * 2017-11-16 2018-03-16 百度在线网络技术(北京)有限公司 Information processing method and device
JP7146933B2 (en) * 2018-10-05 2022-10-04 株式会社Nttドコモ Information provision device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003255990A (en) * 2002-03-06 2003-09-10 Sony Corp Interactive processor and method, and robot apparatus
JP2007219149A (en) * 2006-02-16 2007-08-30 Toyota Central Res & Dev Lab Inc Response generating apparatus, method, and program
JP2009198686A (en) * 2008-02-20 2009-09-03 Toyota Central R&D Labs Inc Response generator and program
JP2010224153A (en) * 2009-03-23 2010-10-07 Toyota Central R&D Labs Inc Spoken dialogue apparatus and program

Cited By (116)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11979836B2 (en) 2007-04-03 2024-05-07 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US12361943B2 (en) 2008-10-02 2025-07-15 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US12431128B2 (en) 2010-01-18 2025-09-30 Apple Inc. Task flow identification based on user intent
US12165635B2 (en) 2010-01-18 2024-12-10 Apple Inc. Intelligent automated assistant
US12087308B2 (en) 2010-01-18 2024-09-10 Apple Inc. Intelligent automated assistant
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US12009007B2 (en) 2013-02-07 2024-06-11 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US12277954B2 (en) 2013-02-07 2025-04-15 Apple Inc. Voice trigger for a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US12073147B2 (en) 2013-06-09 2024-08-27 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US12010262B2 (en) 2013-08-06 2024-06-11 Apple Inc. Auto-activating smart responses based on activities from remote devices
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US12067990B2 (en) 2014-05-30 2024-08-20 Apple Inc. Intelligent assistant for home automation
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US12118999B2 (en) 2014-05-30 2024-10-15 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US12200297B2 (en) 2014-06-30 2025-01-14 Apple Inc. Intelligent automated assistant for TV user interactions
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US12236952B2 (en) 2015-03-08 2025-02-25 Apple Inc. Virtual assistant activation
US12001933B2 (en) 2015-05-15 2024-06-04 Apple Inc. Virtual assistant in a communication session
US12333404B2 (en) 2015-05-15 2025-06-17 Apple Inc. Virtual assistant in a communication session
US12154016B2 (en) 2015-05-15 2024-11-26 Apple Inc. Virtual assistant in a communication session
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US12204932B2 (en) 2015-09-08 2025-01-21 Apple Inc. Distributed personal assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US12386491B2 (en) 2015-09-08 2025-08-12 Apple Inc. Intelligent automated assistant in a media environment
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US12051413B2 (en) 2015-09-30 2024-07-30 Apple Inc. Intelligent device identification
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
JP2017203808A (en) * 2016-05-09 2017-11-16 富士通株式会社 Interaction processing program, interaction processing method, and information processing apparatus
US12223282B2 (en) 2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12175977B2 (en) 2016-06-10 2024-12-24 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US12293763B2 (en) 2016-06-11 2025-05-06 Apple Inc. Application integration with a digital assistant
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US12260234B2 (en) 2017-01-09 2025-03-25 Apple Inc. Application integration with a digital assistant
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US12014118B2 (en) 2017-05-15 2024-06-18 Apple Inc. Multi-modal interfaces having selection disambiguation and text modification capability
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US12254887B2 (en) 2017-05-16 2025-03-18 Apple Inc. Far-field extension of digital assistant services for providing a notification of an event to a user
US12026197B2 (en) 2017-05-16 2024-07-02 Apple Inc. Intelligent automated assistant for media exploration
US12211502B2 (en) 2018-03-26 2025-01-28 Apple Inc. Natural assistant interaction
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US12067985B2 (en) 2018-06-01 2024-08-20 Apple Inc. Virtual assistant operations in multi-device environments
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US12080287B2 (en) 2018-06-01 2024-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US12061752B2 (en) 2018-06-01 2024-08-13 Apple Inc. Attention aware virtual assistant dismissal
US12386434B2 (en) 2018-06-01 2025-08-12 Apple Inc. Attention aware virtual assistant dismissal
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US12367879B2 (en) 2018-09-28 2025-07-22 Apple Inc. Multi-modal inputs for voice commands
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US12136419B2 (en) 2019-03-18 2024-11-05 Apple Inc. Multimodality in digital assistant systems
US12154571B2 (en) 2019-05-06 2024-11-26 Apple Inc. Spoken notifications
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US12216894B2 (en) 2019-05-06 2025-02-04 Apple Inc. User configurable task triggers
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction
US12197712B2 (en) 2020-05-11 2025-01-14 Apple Inc. Providing relevant data items based on context
US11810578B2 (en) 2020-05-11 2023-11-07 Apple Inc. Device arbitration for digital assistant-based intercom systems
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US12219314B2 (en) 2020-07-21 2025-02-04 Apple Inc. User identification using headphones
US11984124B2 (en) 2020-11-13 2024-05-14 Apple Inc. Speculative task flow execution

Also Published As

Publication number Publication date
JP2015127758A (en) 2015-07-09

Similar Documents

Publication Publication Date Title
WO2015098306A1 (en) Response control device and control program
CN107660303B (en) Language Model Modification for Local Speech Recognition Systems Using Remote Sources
US9905228B2 (en) System and method of performing automatic speech recognition using local private data
US10229674B2 (en) Cross-language speech recognition and translation
EP3389044A1 (en) Management layer for multiple intelligent personal assistant services
US9721563B2 (en) Name recognition system
CN103035240B (en) Method and system for speech recognition repair using contextual information
CN102884569B (en) Embedded Web Speech Recognizer Integration
JP5706384B2 (en) Speech recognition apparatus, speech recognition system, speech recognition method, and speech recognition program
US8959021B2 (en) Single interface for local and remote speech synthesis
CN110164416B (en) Voice recognition method and device, equipment and storage medium thereof
JP2020505643A (en) Voice recognition method, electronic device, and computer storage medium
US11532301B1 (en) Natural language processing
US10152298B1 (en) Confidence estimation based on frequency
CN107871502A (en) Voice dialogue system and voice dialogue method
US10170122B2 (en) Speech recognition method, electronic device and speech recognition system
CN112242144A (en) Speech recognition decoding method, apparatus, device and computer-readable storage medium based on streaming attention model
US12165640B2 (en) Response method, terminal, and storage medium for speech response
US11626107B1 (en) Natural language processing
US9530103B2 (en) Combining of results from multiple decoders
CN110659361B (en) Conversation method, device, equipment and medium
JP6559417B2 (en) Information processing apparatus, information processing method, dialogue system, and control program
US12299021B1 (en) Bi-directional voice enabled system for CPE devices
US20190147872A1 (en) Information processing device
KR20230075386A (en) Method and apparatus for speech signal processing

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14873486; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 14873486; Country of ref document: EP; Kind code of ref document: A1