WO2018100705A1 - Speech recognition device and speech recognition method - Google Patents
Speech recognition device and speech recognition method
- Publication number
- WO2018100705A1 (PCT/JP2016/085689, JP2016085689W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vocabulary
- recognition
- display
- vocabularies
- unit
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
Definitions
- the present invention relates to a speech recognition apparatus and speech recognition method for recognizing speech.
- In a known technique, a recognition vocabulary (or a feature amount of a recognition vocabulary) obtained from speech is stored in association with a function, and when an utterance matching the stored recognition vocabulary is made, the function associated with that vocabulary is executed.
- However, when a device that uses the recognition result of the speech recognition device performs a search with that recognition result, the search result may be ambiguous.
- For example, (i) in a facility search, it is ambiguous whether the facility name “BP” refers to the “BP” in the facility category “fuel station” or the “BP” in the facility category “diesel”.
- Similarly, (ii) in a place name search, it is ambiguous whether the city name “Munchen” refers to a city in the state “Bavaria” or a city in the state “Hutthum”.
- To resolve this ambiguity, for the above (i) the search results are presented as “BP (fuel station)” and “BP (diesel)”, and for the above (ii) the search results are presented as, for example, “Munchen (Bavaria)” and “Munchen (Hutthum)”. That is, information for distinguishing the search results presented to the user can be added. For example, in the technique of Patent Document 2, a recognition vocabulary including such distinguishing information can be used.
- Patent Document 1: JP 2003-323192 A
- Patent Document 2: Japanese Patent No. 4554272
- However, the techniques of Patent Document 1 and Patent Document 2 tend toward adding more and more recognition vocabularies.
- In general, an increase in the number of recognition vocabularies causes a reduction in the recognition accuracy of the speech recognition apparatus.
- the present invention has been made in view of the above-described problems, and an object thereof is to provide a technique capable of improving the recognition accuracy of a speech recognition apparatus.
- A speech recognition apparatus according to the present invention includes: a speech recognition unit that recognizes input speech; and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary, which is a predetermined vocabulary, is obtained by the recognition of the speech recognition unit, acquires a plurality of candidate vocabularies that each include the main vocabulary and are associated in advance with the main vocabulary, acquires a priority for each candidate vocabulary, and selects one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
- According to the present invention, when a recognition result including the main vocabulary is obtained by recognition, a plurality of candidate vocabularies are acquired, a priority is acquired for each candidate vocabulary, and one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
- FIG. 1 is a block diagram showing a configuration of the speech recognition apparatus according to Embodiment 1.
- FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit according to Embodiment 2.
- FIG. 3 is a diagram showing an example of information in the display vocabulary database according to Embodiment 2.
- FIG. 4 is a diagram showing an example of information in the priority database according to Embodiment 2.
- FIG. 5 is a diagram showing an example of information in the determination information database according to Embodiment 2.
- FIG. 6 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
- FIG. 7 is a diagram showing the operation result of the first example of the speech recognition apparatus according to Embodiment 2.
- FIG. 8 is a diagram showing the operation result of the second example of the speech recognition apparatus according to Embodiment 2.
- FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit according to Embodiment 3.
- FIG. 10 is a diagram showing an example of information in the display vocabulary database according to Embodiment 3.
- FIG. 11 is a diagram showing an example of information in the vehicle information database according to Embodiment 3.
- FIG. 12 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 3.
- FIG. 13 is a block diagram showing a configuration of the recognition vocabulary selection unit according to Embodiment 4.
- FIG. 14 is a diagram showing an example of information in the display vocabulary database according to Embodiment 4.
- FIG. 15 is a diagram showing an example of information in the hierarchical information database according to Embodiment 4.
- FIG. 16 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 4.
- FIG. 17 is a block diagram showing a configuration of the speech recognition apparatus according to Embodiment 5.
- FIG. 18 is a diagram showing an example of information in the SW information database according to Embodiment 5.
- FIG. 19 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 5.
- FIG. 20 is a block diagram showing a configuration of the speech recognition apparatus according to Embodiment 6.
- FIG. 21 is a diagram showing an example of information in the HW information database according to Embodiment 6.
- FIG. 22 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 6.
- FIG. 23 is a block diagram showing a hardware configuration of the navigation apparatus according to another modification.
- FIG. 24 is a block diagram showing a hardware configuration of the navigation apparatus according to another modification.
- FIG. 25 is a block diagram showing a configuration of the server according to another modification.
- FIG. 26 is a block diagram showing a configuration of the communication terminal according to another modification.
- FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus 1 according to Embodiment 1 of the present invention.
- the speech recognition apparatus 1 in FIG. 1 includes a speech recognition unit 11 and a recognition vocabulary selection unit 12.
- The voice recognition unit 11 recognizes the input voice. For example, the speech recognition unit 11 sequentially converts the input speech into an analog speech signal and then a digital speech signal, and acquires, as a recognition result, a character string or phrase corresponding to the digital speech signal. Note that, using the technique described in Japanese Patent Laid-Open No. 9-50291, the speech recognition unit 11 may select, as the recognition result, the vocabulary most likely to have been uttered by the user acoustically and linguistically among the vocabularies recognized by speech recognition. The voice recognition unit 11 may appropriately use dictionary data stored in the recognition dictionary database 11a when performing this recognition. Dictionary data is data including the character strings acquired as recognition results.
- When the recognition vocabulary selection unit 12 obtains, by the recognition of the speech recognition unit 11, a recognition result including the main vocabulary, which is a predetermined vocabulary, the recognition vocabulary selection unit 12 acquires a plurality of candidate vocabularies associated in advance with the main vocabulary and acquires a priority for each candidate vocabulary. Each of the plurality of candidate vocabularies is a vocabulary including the associated main vocabulary.
- the recognition vocabulary selection unit 12 selects one or more candidate vocabulary from a plurality of candidate vocabulary as one or more recognition vocabulary based on the acquired priority.
- <Summary of Embodiment 1> As described above, when a recognition result including the main vocabulary is obtained, a plurality of candidate vocabularies are acquired, and a priority is acquired for each candidate vocabulary. Based on the acquired priorities, one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies. With such a configuration, the plurality of candidate vocabularies can be narrowed down, based on the priorities, to the vocabulary intended by the user. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved, and the confusion that arises when many vocabularies are presented to the user can be suppressed.
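- As a minimal illustration of this flow, the following Python sketch acquires the candidate vocabularies pre-associated with a main vocabulary found in the recognition result and keeps those whose priority clears a threshold. The data, the `priority_of` callback, and the threshold rule are assumptions made for illustration, not details taken from the patent.

```python
# Candidate vocabularies pre-associated with each main vocabulary (illustrative data only).
CANDIDATE_VOCABULARIES = {
    "BP": ["BP", "BP (fuel station)", "BP (diesel)"],
}

def select_recognition_vocabularies(recognition_result, priority_of, threshold):
    """Select one or more candidate vocabularies as recognition vocabularies by priority."""
    for main_vocabulary, candidates in CANDIDATE_VOCABULARIES.items():
        if main_vocabulary in recognition_result:
            # Acquire a priority for each candidate vocabulary, then keep the ones
            # whose priority reaches the threshold.
            return [c for c in candidates if priority_of(recognition_result, c) >= threshold]
    return []

# Example: treat an exact match as the highest priority.
priority = lambda result, candidate: 2 if candidate == result else 0
print(select_recognition_vocabularies("BP", priority, threshold=2))  # ['BP']
```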
- FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 2 of the present invention.
- the same or similar constituent elements as those in the first embodiment are denoted by the same reference numerals, and different constituent elements will be mainly described.
- The recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 of FIG. 2 includes a display vocabulary database 12a, a result comparison unit 12b, a priority database 12c, a priority calculation unit 12d, a determination information database 12e, and a recognition vocabulary update unit 12f.
- FIG. 3 is a diagram showing an example of information stored in the display vocabulary database 12a.
- As shown in FIG. 3, the display vocabulary database 12a stores information in which a main vocabulary such as “BP” is associated with a plurality of display vocabularies such as “BP”, “BP (fuel station)”, and “BP (diesel)”.
- The main vocabulary corresponds to, for example, the same place name given to a plurality of different places, the same name given to a plurality of facilities, the same abbreviation given to a plurality of different formal names, and names similar to these.
- the display vocabulary corresponds to the candidate vocabulary described in the first embodiment.
- The plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with an attached vocabulary that is joined to the main vocabulary and makes the main vocabulary more specific.
- In this embodiment, parentheses are added as appropriate, and postfix information, that is, an attached vocabulary following the main vocabulary, is used as the attached vocabulary.
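- For illustration, the display vocabulary database 12a could be modeled as a simple mapping; the structure and field names below are assumptions, not taken from the patent.

```python
# Illustrative model of display vocabulary database 12a: each main vocabulary maps to
# display vocabularies, which are either the main vocabulary itself or the main
# vocabulary followed by postfix information in parentheses.
DISPLAY_VOCABULARY_DB = {
    "BP": [
        {"display": "BP", "postfix": None},
        {"display": "BP (fuel station)", "postfix": "fuel station"},
        {"display": "BP (diesel)", "postfix": "diesel"},
    ],
}
```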
- the recognition result is input from the speech recognition unit 11 to the result comparison unit 12b in FIG.
- the result comparison unit 12b acquires a plurality of display vocabulary associated with the main vocabulary from the display vocabulary database 12a.
- In a first example, the recognition result is the main vocabulary “BP” itself. In this case, the result comparison unit 12b acquires the display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP”.
- In a second example, the recognition result is “BP station”. In this case, the result comparison unit 12b acquires the display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP” included in “BP station”.
- Up to this point, the first example and the second example are the same.
- the result comparison unit 12b acquires a degree of coincidence that is the degree to which each display vocabulary matches the recognition result based on the recognition result of the speech recognition unit 11 and a plurality of display vocabularies.
- the degree of coincidence is divided into three stages of the first degree, the second degree, and the third degree.
- the first degree means that the displayed vocabulary completely matches the recognition result.
- the second degree means that, in a display vocabulary that combines the main vocabulary and postfix information, the main vocabulary matches part of the recognition result and part of the postfix information matches the rest of the recognition result.
- the third degree means that, in a display vocabulary that combines the main vocabulary and postfix information, the main vocabulary matches part of the recognition result but the postfix information does not match the rest of the recognition result.
- In the first example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP” and the third degree for the display vocabularies “BP (fuel station)” and “BP (diesel)”.
- In the second example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP”, the second degree for the display vocabulary “BP (fuel station)”, and the third degree for the display vocabulary “BP (diesel)”.
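- A rough sketch of how the three degrees could be computed is shown below; this is one possible interpretation consistent with the two examples above, not the implementation prescribed by the patent, and the string handling is an assumption.

```python
def matching_degree(recognition_result: str, main_vocabulary: str, display_vocabulary: str) -> int:
    """Return 1, 2, or 3 for the first, second, or third degree of coincidence."""
    # First degree: the display vocabulary matches the recognition result.
    if display_vocabulary in recognition_result:
        return 1
    # Remainder of the recognition result after the main vocabulary, e.g. "station" in "BP station".
    rest = recognition_result.replace(main_vocabulary, "", 1).strip()
    # Postfix information of the display vocabulary, e.g. "fuel station" in "BP (fuel station)".
    postfix = display_vocabulary.replace(main_vocabulary, "", 1).strip(" ()")
    # Second degree: part of the postfix information matches the rest of the recognition result.
    if rest and rest in postfix:
        return 2
    # Third degree: the postfix information does not match the rest of the recognition result.
    return 3

# First example ("BP"): degree 1 for "BP", degree 3 for the other two display vocabularies.
# Second example ("BP station"): degree 1 for "BP", degree 2 for "BP (fuel station)", degree 3 for "BP (diesel)".
print(matching_degree("BP station", "BP", "BP (fuel station)"))  # 2
```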
- FIG. 4 is a diagram illustrating an example of information stored in the priority database 12c.
- the priority database 12c associates the degree of matching with the priority. Specifically, high priority, medium priority, and low priority are associated with the first degree, the second degree, and the third degree, respectively.
- The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies are input to the priority calculation unit 12d in FIG. 2 from the result comparison unit 12b.
- the priority calculating unit 12d acquires the priority of each display vocabulary from the priority database 12c based on the input matching degree of each display vocabulary.
- That is, the recognition vocabulary selection unit 12 uses the recognition result and each display vocabulary to acquire, as the priority of each display vocabulary, the degree of coincidence that is the degree to which the display vocabulary matches the recognition result.
- FIG. 5 is a diagram illustrating an example of information stored in the determination information database 12e. As shown in FIG. 5, in the determination information database 12e, a priority is associated with a determination rule as to whether to determine as a recognized vocabulary, that is, whether to select as a recognized vocabulary.
- The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies are input to the recognized vocabulary update unit 12f from the priority calculation unit 12d.
- the recognized vocabulary update unit 12f selects one or more display vocabulary from the plurality of display vocabularies as one or more recognized vocabulary according to the determination rule of the determination information database 12e.
- the selected recognition vocabulary is displayed on, for example, a display device (not shown), or is output as voice by a voice output device (not shown).
- When one or more recognition vocabularies are selected, the recognized vocabulary update unit 12f can exclude the display vocabularies other than the one or more recognition vocabularies from any of the subsequent selections.
- When the recognition vocabulary update unit 12f according to the second embodiment selects one or more recognition vocabularies, it keeps the one or more recognition vocabularies stored in the display vocabulary database 12a and deletes the other display vocabularies from the display vocabulary database 12a. In this case, the recognized vocabulary update unit 12f excludes the display vocabularies other than the selected one or more recognition vocabularies from the next selection.
- However, the recognized vocabulary update unit 12f is not limited to this.
- For example, the recognized vocabulary update unit 12f may not immediately delete from the display vocabulary database 12a a display vocabulary that has failed to be selected as a recognition vocabulary only once. Instead, the recognized vocabulary update unit 12f may delete from the display vocabulary database 12a a display vocabulary that has not been selected as a recognition vocabulary a predetermined number of consecutive times. In this case, the recognized vocabulary update unit 12f excludes the display vocabularies other than the selected one or more recognition vocabularies from a selection after the next selection.
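- A hypothetical sketch of this deferred-deletion policy is shown below: each display vocabulary keeps a counter of consecutive selections in which it was not chosen and is removed once the counter reaches a predetermined number. The class name, counter, and default threshold are assumptions for illustration.

```python
class DisplayVocabularyDB:
    """Toy model of display vocabulary database 12a with the deferred-deletion policy."""

    def __init__(self, display_vocabularies, max_misses=3):
        # display vocabulary -> number of consecutive selections in which it was not chosen
        self.misses = {v: 0 for v in display_vocabularies}
        self.max_misses = max_misses

    def update(self, selected_recognition_vocabularies):
        for vocab in list(self.misses):
            if vocab in selected_recognition_vocabularies:
                self.misses[vocab] = 0          # selected: reset the counter
            else:
                self.misses[vocab] += 1         # not selected this time
                if self.misses[vocab] >= self.max_misses:
                    del self.misses[vocab]      # exclude from selections after this point

db = DisplayVocabularyDB(["BP", "BP (fuel station)", "BP (diesel)"], max_misses=2)
db.update(["BP"])
db.update(["BP"])
print(list(db.misses))  # ['BP'] once the other vocabularies have missed twice in a row
```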
- FIG. 6 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment.
- In step S1, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
- In step S2, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies and the matching degrees of the plurality of display vocabularies. Then, the result comparison unit 12b outputs the recognition result of the voice recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies to the priority calculation unit 12d.
- In step S3, the priority calculation unit 12d, while referring to the priority database 12c, acquires the priority of each display vocabulary based on the matching degree of each display vocabulary from the result comparison unit 12b. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies to the recognition vocabulary update unit 12f.
- In step S4, the recognized vocabulary update unit 12f, while referring to the determination information database 12e, selects one or more recognition vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d, and outputs the selected recognition vocabularies to a display device (not shown).
- Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected one or more recognition vocabularies from the display vocabulary database 12a. Thereafter, the operation of FIG. 6 ends.
- FIGS. 7 and 8 are diagrams showing the operation results of the first example and the second example described above.
- As described above, the plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with postfix information that is joined to the main vocabulary and makes the main vocabulary more specific.
- For this reason, as shown in FIGS. 7 and 8, the main vocabulary “BP” is selected in both the first example and the second example.
- That is, the main vocabulary can be selected regardless of the content of the recognition result.
- <Summary of Embodiment 2> When one or more recognition vocabularies are selected, the speech recognition apparatus 1 according to the second embodiment can exclude the display vocabularies other than the one or more recognition vocabularies from any selection after the next selection. With such a configuration, the processing for selecting a recognition vocabulary from the plurality of display vocabularies can be reduced in any selection after the next selection. Therefore, the processing load of the speech recognition apparatus 1 can be reduced.
- In addition, the speech recognition apparatus 1 acquires the matching degree of each display vocabulary as the priority of each display vocabulary. With such a configuration, the display vocabularies can be narrowed down to those corresponding to the vocabulary the user intended by the utterance. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be increased, and user confusion can be suppressed.
- the degree of coincidence and the priority are divided into three stages.
- the present invention is not limited to this, and the degree of coincidence and priority may be divided into two stages or may be divided into four or more stages.
- FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 3 of the present invention.
- In the third embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and different constituent elements are mainly described.
- The voice recognition device 1 according to the third embodiment is used in a vehicle. The recognition vocabulary selection unit 12 of FIG. 9 includes a vehicle information database 12g and a display vocabulary update unit 12h in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
- the recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on the vehicle information that is information of the vehicle and the priority of the plurality of display vocabularies. This will be described in detail below.
- FIG. 10 is a diagram showing an example of information stored in the display vocabulary database 12a.
- As shown in FIG. 10, in the display vocabulary database 12a according to the third embodiment, the information shown in FIG. 3 described in the second embodiment is associated with a domain.
- the domain is a kind of vehicle information, and for the domain, for example, information related to vehicle specifications is used.
- When a recognition result including the main vocabulary is obtained, the result comparison unit 12b in FIG. 9 acquires, from the display vocabulary database 12a, a plurality of display vocabularies associated in advance with the main vocabulary and the domains of the plurality of display vocabularies. In addition, the result comparison unit 12b also acquires the matching degree of each display vocabulary as in the second embodiment.
- FIG. 11 is a diagram showing an example of information stored in the vehicle information database 12g.
- a domain and any one of valid and invalid regarding the display vocabulary are associated with each other.
- Note that the information shown in FIG. 11 may be set in advance by the user or the like, or may be automatically changed by the voice recognition device 1 or the like based on the travel history of the vehicle. For example, when the travel history records that the vehicle has stopped at gas oil (diesel) filling stations more often than at gasoline filling stations, the voice recognition device 1 may change the “valid” of “fuel station” in FIG. 11 to “invalid” and change the “invalid” of “diesel” in FIG. 11 to “valid”.
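- A hypothetical sketch of this automatic update is shown below; the counters and function name are assumptions for illustration.

```python
def update_vehicle_info(vehicle_info, diesel_stop_count, gasoline_stop_count):
    """Swap which fuel domain is valid when the travel history favors gas oil (diesel) stations."""
    if diesel_stop_count > gasoline_stop_count:
        vehicle_info["fuel station"] = "invalid"
        vehicle_info["diesel"] = "valid"
    return vehicle_info

# FIG. 11-style information: domain -> "valid"/"invalid" (values illustrative).
print(update_vehicle_info({"fuel station": "valid", "diesel": "invalid"}, 12, 3))
# {'fuel station': 'invalid', 'diesel': 'valid'}
```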
- The recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the domains of the plurality of display vocabularies are input to the display vocabulary update unit 12h in FIG. 9 from the result comparison unit 12b.
- The display vocabulary update unit 12h updates the display vocabularies to be output to the priority calculation unit 12d based on the input domains and the information in the vehicle information database 12g.
- In the above example, the display vocabulary update unit 12h outputs, to the priority calculation unit 12d, the display vocabularies “BP” and “BP (fuel station)”, whose domain “fuel station” is associated with “valid” in the information of FIG. 11, their matching degrees, and the recognition result of the voice recognition unit 11.
- On the other hand, the display vocabulary update unit 12h does not output, to the priority calculation unit 12d, the display vocabulary “BP (diesel)”, whose domain “diesel” is associated with “invalid” in the information of FIG. 11, or its matching degree.
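- A sketch of the filtering performed by the display vocabulary update unit 12h is shown below, assuming FIG. 11-style information as a dictionary and a `domain_of` lookup; both are illustrative assumptions.

```python
VEHICLE_INFO_DB = {"fuel station": "valid", "diesel": "invalid"}  # FIG. 11-style information

def filter_by_domain(display_vocabularies, domain_of):
    """Pass on only display vocabularies whose domain is valid (or that have no domain)."""
    return [dv for dv in display_vocabularies
            if VEHICLE_INFO_DB.get(domain_of(dv), "valid") == "valid"]

domains = {"BP": None, "BP (fuel station)": "fuel station", "BP (diesel)": "diesel"}
print(filter_by_domain(list(domains), lambda dv: domains[dv]))  # ['BP', 'BP (fuel station)']
```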
- the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
- FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment.
- In step S11, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
- In step S12, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the domains of the plurality of display vocabularies. Then, the result comparison unit 12b outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degree and domain of each of the plurality of display vocabularies to the display vocabulary update unit 12h.
- In step S13, the display vocabulary update unit 12h outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Also, based on the domains from the result comparison unit 12b, the display vocabulary update unit 12h outputs to the priority calculation unit 12d the display vocabularies whose domain is associated with “valid” in the vehicle information database 12g and the matching degrees of those display vocabularies.
- Note that there may be one or more display vocabularies whose domain is associated with “valid”.
- In step S14, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on the matching degree of each display vocabulary from the display vocabulary update unit 12h. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
- In step S15, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f, while referring to the determination information database 12e, selects a recognition vocabulary from the display vocabularies based on the priorities from the priority calculation unit 12d, and outputs the selected recognition vocabulary to a display device (not shown).
- Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected recognition vocabulary from the display vocabulary database 12a. Thereafter, the operation of FIG. 12 ends.
- <Summary of Embodiment 3> According to the speech recognition apparatus 1 according to the third embodiment as described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the vehicle information and the priorities of the plurality of display vocabularies. With such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
- the recognized vocabulary selection unit 12 does not change the priority based on the vehicle information.
- the present invention is not limited to this, and the recognized vocabulary selection unit 12 may change the priority based on the vehicle information.
- For example, the recognition vocabulary selection unit 12 may change the priority of the display vocabulary whose domain is “diesel” to “low” in step S13 and maintain that priority in step S14. In this case, the same effect as described above can be obtained.
- FIG. 13 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 4 of the present invention.
- In the fourth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and different constituent elements are mainly described.
- The recognition vocabulary selection unit 12 of FIG. 13 includes a hierarchical information database 12i and a hierarchy reference update unit 12j in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hierarchy defined in advance for the plurality of display vocabularies and the priorities of the plurality of display vocabularies. This will be described in detail below.
- FIG. 14 is a diagram showing an example of information stored in the display vocabulary database 12a.
- As shown in FIG. 14, in the display vocabulary database 12a according to the fourth embodiment, the information of FIG. 3 described in the second embodiment is associated with a hierarchy of each display vocabulary.
- The larger the number assigned to a hierarchy, the lower that hierarchy is, and a display vocabulary in an upper hierarchy is a vocabulary whose concept encompasses the display vocabularies in the lower hierarchies.
- When a recognition result including the main vocabulary is obtained, the result comparison unit 12b in FIG. 13 acquires, from the display vocabulary database 12a, a plurality of display vocabularies associated in advance with the main vocabulary and the hierarchies of the plurality of display vocabularies. In addition, the result comparison unit 12b also acquires the matching degree of each display vocabulary as in the second embodiment.
- FIG. 15 is a diagram showing an example of information stored in the hierarchical information database 12i. As shown in FIG. 15, in the hierarchy information database 12i, the hierarchy and any one of valid and invalid regarding the display vocabulary are associated with each other. Note that the information shown in FIG. 15 may be set in advance by a user or the like, or may be automatically changed by the voice recognition device 1 or the like.
- The recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies are input to the hierarchy reference update unit 12j in FIG. 13 from the result comparison unit 12b.
- The hierarchy reference update unit 12j updates the display vocabularies to be output to the priority calculation unit 12d based on the input hierarchies and the information in the hierarchy information database 12i.
- In the above example, the hierarchy reference update unit 12j outputs, to the priority calculation unit 12d, the display vocabulary “BP”, whose hierarchy “1” is associated with “valid” in the information of FIG. 15, and its matching degree.
- On the other hand, the hierarchy reference update unit 12j does not output, to the priority calculation unit 12d, the display vocabularies “BP (fuel station)” and “BP (diesel)”, whose hierarchy “2” is associated with “invalid” in the information of FIG. 15, or their matching degrees.
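- The hierarchy-based filtering follows the same pattern as the domain-based filtering of Embodiment 3, with FIG. 15-style information keyed by hierarchy; the sketch below and its values are illustrative assumptions.

```python
HIERARCHY_INFO_DB = {1: "valid", 2: "invalid"}  # FIG. 15-style information

def filter_by_hierarchy(display_vocabularies, hierarchy_of):
    """Pass on only display vocabularies whose hierarchy is marked valid."""
    return [dv for dv in display_vocabularies
            if HIERARCHY_INFO_DB.get(hierarchy_of(dv)) == "valid"]

hierarchies = {"BP": 1, "BP (fuel station)": 2, "BP (diesel)": 2}
print(filter_by_hierarchy(list(hierarchies), lambda dv: hierarchies[dv]))  # ['BP']
```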
- the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
- FIG. 16 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment.
- In step S21, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
- In step S22, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies. Then, the result comparison unit 12b outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degree and hierarchy of each of the plurality of display vocabularies to the hierarchy reference update unit 12j.
- In step S23, the hierarchy reference update unit 12j outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Further, based on the hierarchies from the result comparison unit 12b, the hierarchy reference update unit 12j outputs to the priority calculation unit 12d the display vocabularies whose hierarchy is associated with “valid” in the hierarchy information database 12i and the matching degrees of those display vocabularies. Note that there may be one or more display vocabularies whose hierarchy is associated with “valid”.
- In step S24, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on the matching degree of each display vocabulary from the hierarchy reference update unit 12j. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
- In step S25, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f, while referring to the determination information database 12e, selects a recognition vocabulary from the display vocabularies based on the priorities from the priority calculation unit 12d, and outputs the selected recognition vocabulary to a display device (not shown).
- Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected recognition vocabulary from the display vocabulary database 12a. Thereafter, the operation of FIG. 16 ends.
- <Summary of Embodiment 4> According to the speech recognition apparatus 1 according to the fourth embodiment as described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hierarchy defined in advance for the plurality of display vocabularies and the priorities of the plurality of display vocabularies. With such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
- the recognized vocabulary selection unit 12 does not change the priority based on the hierarchy.
- the present invention is not limited to this, and the recognized vocabulary selection unit 12 may change the priority based on the hierarchy.
- For example, the recognition vocabulary selection unit 12 may change the priority of the display vocabularies having the hierarchy “2” to “low” in step S23 and maintain that priority in step S24. In this case, the same effect as described above can be obtained.
- FIG. 17 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 5 of the present invention.
- In the fifth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and different constituent elements are mainly described.
- The recognition vocabulary selection unit 12 in FIG. 17 includes a SW (software) information database 12k and a SW restriction reference update unit 12m in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
- The recognition vocabulary selection unit 12 configured in this way is configured to select one or more recognition vocabularies from the plurality of display vocabularies based on software requirements in the system using the speech recognition apparatus 1 and the priorities of the plurality of display vocabularies. This will be described in detail below.
- FIG. 18 is a diagram showing an example of information stored in the SW information database 12k.
- the SW information database 12k stores the number of recognized vocabulary that can be displayed by the system as a software requirement in the system using the speech recognition apparatus 1. Note that the information shown in FIG. 18 may be set in advance by a user or the like, or may be automatically changed by the voice recognition device 1 or the like based on the requirements of the software.
- The recognition vocabulary and the priority of the recognition vocabulary are input to the SW restriction reference update unit 12m from the recognition vocabulary update unit 12f. The priority of a recognition vocabulary is the priority obtained for the display vocabulary that has become the recognition vocabulary.
- When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the displayable number stored in the SW information database 12k, the SW restriction reference updating unit 12m outputs them as they are.
- On the other hand, when the number of recognition vocabularies exceeds the displayable number, the SW restriction reference updating unit 12m lowers the priority of each recognition vocabulary by one level. As a result, the SW restriction reference updating unit 12m can set the priority of some recognition vocabularies to “low”. After the priority change, the SW restriction reference update unit 12m performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is “medium” or higher. By changing the priorities in this way as appropriate, the SW restriction reference updating unit 12m selects a number of recognition vocabularies equal to or less than the displayable number.
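- A sketch of this limiting step is shown below; the concrete rule of keeping "medium" or higher after lowering mirrors the determination rule of FIG. 5 and is otherwise an assumption.

```python
PRIORITY_RANK = {"high": 2, "medium": 1, "low": 0}

def limit_to(recognized, allowed):
    """recognized: list of (vocabulary, priority_name) pairs. Returns at most `allowed` vocabularies."""
    vocabs = [(v, PRIORITY_RANK[p]) for v, p in recognized]
    while len(vocabs) > allowed:
        # Lower every priority by one level, then re-apply the selection rule
        # (keep "medium" or higher), as the SW restriction reference update unit 12m does.
        vocabs = [(v, r - 1) for v, r in vocabs if r - 1 >= PRIORITY_RANK["medium"]]
        if not vocabs:
            break
    return [v for v, _ in vocabs]
```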
- FIG. 19 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. From Steps S31 to S33, operations similar to Steps S1 to S3 in FIG. 6 are performed.
- In step S34, the recognized vocabulary update unit 12f, while referring to the determination information database 12e, selects a recognition vocabulary from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d. Then, the recognized vocabulary update unit 12f outputs the selected recognition vocabulary and its priority to the SW restriction reference update unit 12m. The recognized vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabulary from the display vocabulary database 12a.
- In step S35, the SW restriction reference updating unit 12m, while referring to the SW information database 12k, selects a number of recognition vocabularies equal to or less than the displayable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
- At this time, the SW restriction reference update unit 12m may delete the display vocabularies that have not been output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognized vocabulary update unit 12f. Thereafter, the operation of FIG. 19 ends.
- <Summary of Embodiment 5> According to the speech recognition apparatus 1 according to the fifth embodiment as described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the software requirements in the system using the speech recognition apparatus 1 and the priorities of the plurality of display vocabularies. With such a configuration, the speech recognition apparatus 1 that can automatically satisfy the software requirements can be realized.
- FIG. 20 is a block diagram showing a configuration of speech recognition apparatus 1 according to Embodiment 6 of the present invention.
- the same or similar constituent elements as those in the second embodiment are denoted by the same reference numerals, and different constituent elements will be mainly described.
- The recognition vocabulary selection unit 12 of FIG. 20 includes an HW (hardware) information database 12n and an HW restriction reference update unit 12o in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
- The recognition vocabulary selection unit 12 configured as described above is configured to select one or more recognition vocabularies from the plurality of display vocabularies based on hardware requirements in the system using the speech recognition apparatus 1 and the priorities of the plurality of display vocabularies. This will be described in detail below.
- FIG. 21 is a diagram showing an example of information stored in the HW information database 12n.
- As shown in FIG. 21, the number of recognition vocabularies that can be stored in a memory (not shown) of the system is stored in the HW information database 12n as a hardware requirement in the system using the speech recognition apparatus 1.
- the information shown in FIG. 21 may be set in advance by a user or the like, or may be automatically changed by the voice recognition apparatus 1 or the like based on the hardware requirements.
- the recognition vocabulary and the priority of the recognition vocabulary are input to the HW restriction reference update unit 12o from the recognition vocabulary update unit 12f.
- When the number of recognition vocabularies input from the recognized vocabulary update unit 12f is equal to or less than the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o outputs them as they are.
- On the other hand, when the number of recognition vocabularies exceeds the storable number, the HW restriction reference updating unit 12o lowers the priority of each recognition vocabulary by one level. As a result, the HW restriction reference updating unit 12o can set the priority of some recognition vocabularies to “low”. After the priority change, the HW restriction reference update unit 12o performs the same operation as the recognized vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is “medium” or higher. By changing the priorities in this way as appropriate, the HW restriction reference update unit 12o selects a number of recognition vocabularies equal to or less than the storable number.
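- The limiting step sketched in Embodiment 5 can be reused here with the storable number from the HW information database 12n in place of the displayable number; the values below are illustrative and the snippet assumes `limit_to` from that earlier sketch.

```python
# Reusing limit_to from the Embodiment 5 sketch, with the storable number as the cap.
recognized = [("BP", "high"), ("BP (fuel station)", "medium")]
print(limit_to(recognized, allowed=1))  # ['BP'] under the assumed rule
```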
- FIG. 22 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. From Steps S41 to S43, the same operation as Steps S1 to S3 in FIG. 6 is performed.
- In step S44, the recognized vocabulary update unit 12f, while referring to the determination information database 12e, selects a recognition vocabulary from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d. Then, the recognized vocabulary update unit 12f outputs the selected recognition vocabulary and its priority to the HW restriction reference update unit 12o. The recognized vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabulary from the display vocabulary database 12a.
- In step S45, the HW restriction reference updating unit 12o, while referring to the HW information database 12n, selects a number of recognition vocabularies equal to or less than the storable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
- At this time, the HW restriction reference update unit 12o may delete the display vocabularies that have not been output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognized vocabulary update unit 12f. Thereafter, the operation of FIG. 22 ends.
- <Summary of Embodiment 6> According to the speech recognition apparatus 1 according to the sixth embodiment as described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hardware requirements in the system using the speech recognition apparatus 1 and the priorities of the plurality of display vocabularies. With such a configuration, the speech recognition apparatus 1 that can automatically satisfy the hardware requirements can be realized.
- the speech recognition unit 11 and the recognition vocabulary selection unit 12 in the speech recognition apparatus 1 described above are hereinafter referred to as “speech recognition unit 11 etc.”.
- The voice recognition unit 11 and the like are realized by a processing circuit 81 shown in FIG. 23. That is, the processing circuit 81 includes: a speech recognition unit 11 that recognizes the input speech; and a recognition vocabulary selection unit 12 that, when a recognition result including the main vocabulary, which is a predetermined vocabulary, is obtained by the recognition of the speech recognition unit 11, acquires a plurality of candidate vocabularies that each include the main vocabulary and are associated in advance with the main vocabulary, acquires a priority for each candidate vocabulary, and selects one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
- Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in the memory may be applied.
- the processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor) and the like.
- When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
- Each function of each unit such as the speech recognition unit 11 may be realized by a circuit in which processing circuits are distributed, or the function of each unit may be realized by a single processing circuit.
- When the processing circuit 81 is a processor, the functions of the voice recognition unit 11 and the like are realized by a combination with software or the like.
- the software or the like corresponds to, for example, software, firmware, or software and firmware.
- Software or the like is described as a program and stored in a memory.
- The processor 82 applied to the processing circuit 81 reads out and executes the program stored in the memory 83, thereby realizing the functions of the respective units. That is, the speech recognition apparatus 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in execution of a step of recognizing the input speech and a step of, when a recognition result including the main vocabulary which is a predetermined vocabulary is obtained by the recognition, acquiring a plurality of candidate vocabularies that each include the main vocabulary and are associated in advance with the main vocabulary, acquiring a priority for each candidate vocabulary, and selecting one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
- Here, the memory 83 corresponds to, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disk), a drive device thereof, or any other storage medium.
- In the above, the configuration in which each function of the voice recognition unit 11 and the like is realized by either hardware or software or the like has been described.
- However, the present invention is not limited to this, and a configuration may be adopted in which a part of the voice recognition unit 11 and the like is realized by dedicated hardware and another part is realized by software or the like.
- For example, the function of the speech recognition unit 11 can be realized by a processing circuit as dedicated hardware, and the other functions can be realized by the processing circuit 81 as the processor 82 reading and executing the program stored in the memory 83.
- the processing circuit 81 can realize the above functions by hardware, software, or the like, or a combination thereof.
- The voice recognition device described above can also be applied to a speech recognition system constructed as a system by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including a mobile terminal such as a mobile phone, a smartphone, or a tablet, the functions of applications installed on these, and a server.
- In this case, each function or each constituent element of the speech recognition apparatus described above may be distributed among the devices that construct the system, or may be concentrated in any one of the devices.
- FIG. 25 is a block diagram showing a configuration of the server 51 according to this modification.
- the server 51 of FIG. 25 includes a communication unit 51a, a voice recognition unit 51b, and a recognition vocabulary selection unit 51c, and can perform wireless communication with the navigation device 53 of the vehicle 52.
- the communication unit 51a receives the voice data acquired by the navigation device 53 by performing wireless communication with the navigation device 53.
- The speech recognition unit 51b and the recognition vocabulary selection unit 51c are realized by a processor (not shown) of the server 51 executing a program stored in a storage device (not shown) of the server 51, and have the same functions as the speech recognition unit 11 and the recognition vocabulary selection unit 12 of FIG. 1. That is, the voice recognition unit 51b recognizes the voice data received by the communication unit 51a.
- The recognition vocabulary selection unit 51c acquires a plurality of display vocabularies and the priorities of the plurality of display vocabularies based on the recognition result of the speech recognition unit 51b, and selects a recognition vocabulary based on the priorities of the plurality of display vocabularies. Then, the communication unit 51a transmits the recognition vocabulary selected by the recognition vocabulary selection unit 51c to the navigation device 53.
- With the server 51 configured in this way, even if the navigation device 53 has only a display function and a communication function with the server 51, for example, the same effect as that of the voice recognition device 1 described in Embodiment 1 can be obtained.
- FIG. 26 is a block diagram showing the configuration of the communication terminal 56 according to this modification.
- The communication terminal 56 of FIG. 26 includes a communication unit 56a, a speech recognition unit 56b, and a recognition vocabulary selection unit 56c that are similar to the communication unit 51a, the speech recognition unit 51b, and the recognition vocabulary selection unit 51c, and is capable of wireless communication with a navigation device 58 of a vehicle 57.
- For the communication terminal 56, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet carried by the driver of the vehicle 57 is applied.
- With the communication terminal 56 configured in this way, even if the navigation device 58 has only a display function and a communication function with the communication terminal 56, for example, the same effect as that of the voice recognition device 1 described in Embodiment 1 can be obtained.
- In the present invention, the embodiments and modifications can be freely combined, and each embodiment and each modification can be appropriately modified or omitted within the scope of the invention.
- 1 speech recognition apparatus, 11 speech recognition unit, 12 recognition vocabulary selection unit.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Navigation (AREA)
Abstract
An object of the present invention is to provide a technology capable of improving the recognition accuracy of a speech recognition device. The speech recognition device comprises: a speech recognition unit that recognizes input speech; and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary, which is a predetermined vocabulary, is acquired through recognition by the speech recognition unit, acquires a plurality of candidate vocabularies, each of which includes the main vocabulary and is pre-associated with the main vocabulary, acquires a priority for each of the candidate vocabularies, and selects, as one or more recognition vocabularies, one or more candidate vocabularies from the plurality of candidate vocabularies on the basis of the acquired priorities.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2016/085689 WO2018100705A1 (fr) | 2016-12-01 | 2016-12-01 | Dispositif de reconnaissance vocale et procédé de reconnaissance vocale |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2016/085689 WO2018100705A1 (fr) | 2016-12-01 | 2016-12-01 | Dispositif de reconnaissance vocale et procédé de reconnaissance vocale |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018100705A1 true WO2018100705A1 (fr) | 2018-06-07 |
Family
ID=62242804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/085689 WO2018100705A1 (fr) | 2016-12-01 | 2016-12-01 | Dispositif de reconnaissance vocale et procédé de reconnaissance vocale |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018100705A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006071791A (ja) * | 2004-08-31 | 2006-03-16 | Fuji Heavy Ind Ltd | 車両の音声認識装置 |
JP2007101892A (ja) * | 2005-10-04 | 2007-04-19 | Denso Corp | 音声認識装置 |
JP2008134503A (ja) * | 2006-11-29 | 2008-06-12 | Nissan Motor Co Ltd | 音声認識装置、および音声認識方法 |
JP2008134502A (ja) * | 2006-11-29 | 2008-06-12 | Nissan Motor Co Ltd | 音声認識装置、および音声認識方法 |
JP2014142465A (ja) * | 2013-01-23 | 2014-08-07 | Canon Inc | 音響モデル生成装置及び方法、並びに音声認識装置及び方法 |
- 2016-12-01: WO PCT/JP2016/085689 patent/WO2018100705A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006071791A (ja) * | 2004-08-31 | 2006-03-16 | Fuji Heavy Ind Ltd | 車両の音声認識装置 |
JP2007101892A (ja) * | 2005-10-04 | 2007-04-19 | Denso Corp | 音声認識装置 |
JP2008134503A (ja) * | 2006-11-29 | 2008-06-12 | Nissan Motor Co Ltd | 音声認識装置、および音声認識方法 |
JP2008134502A (ja) * | 2006-11-29 | 2008-06-12 | Nissan Motor Co Ltd | 音声認識装置、および音声認識方法 |
JP2014142465A (ja) * | 2013-01-23 | 2014-08-07 | Canon Inc | 音響モデル生成装置及び方法、並びに音声認識装置及び方法 |
Non-Patent Citations (1)
Title |
---|
YONGGEE JANG ET AL.: "Speech interface on combination of search candidates from the common word parts", IEICE TECHNICAL REPORT, vol. 109, no. 355, 14 December 2009 (2009-12-14), pages 219 - 224 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8412455B2 (en) | Voice-controlled navigation device and method | |
US9805722B2 (en) | Interactive speech recognition system | |
US20120290303A1 (en) | Speech recognition system and method based on word-level candidate generation | |
US10514268B2 (en) | Search system | |
US20150120288A1 (en) | System and method of performing automatic speech recognition using local private data | |
JP2012230670A (ja) | 戻ることによって誤認識を修正するシステム、方法及びコンピュータプログラム | |
US20120239399A1 (en) | Voice recognition device | |
CN106233246A (zh) | 用户界面系统、用户界面控制装置、用户界面控制方法和用户界面控制程序 | |
JP5705312B2 (ja) | 情報機器 | |
US20190115015A1 (en) | Vehicular voice recognition system and method for controlling the same | |
JP6896335B2 (ja) | 音声認識装置および音声認識方法 | |
CN103635961B (zh) | 发音信息生成装置、车载信息装置以及单词串信息处理方法 | |
KR20120052591A (ko) | 연속어 음성인식 시스템에서 오류수정 장치 및 방법 | |
KR102069700B1 (ko) | 특화영역 교체형 음성인식 시스템, 모바일 장치 및 그 방법 | |
WO2018100705A1 (fr) | Dispositif de reconnaissance vocale et procédé de reconnaissance vocale | |
WO2018073907A1 (fr) | Système de reconnaissance vocale et procédé de reconnaissance vocale | |
JP5396426B2 (ja) | 音声認識装置、音声認識方法及び音声認識プログラム | |
US9704479B2 (en) | Speech recognition device | |
JP2003162293A (ja) | 音声認識装置及び方法 | |
JP4926689B2 (ja) | 施設検索装置 | |
US11107474B2 (en) | Character input device, character input method, and character input program | |
US10915565B2 (en) | Retrieval result providing device and retrieval result providing method | |
EP3292376B1 (fr) | Approche de commutation automatique de données selon une solution de navigation embarquée utilisant une entrée vocale de destination (vde) | |
JP7038919B2 (ja) | 多言語音声認識装置および多言語音声認識方法 | |
WO2016136208A1 (fr) | Dispositif d'interaction vocale, système d'interaction vocale, procédé de commande de dispositif d'interaction vocale |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16922780 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16922780 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |