
WO2018100705A1 - Voice recognition device and voice recognition method - Google Patents

Voice recognition device and voice recognition method

Info

Publication number
WO2018100705A1
Authority
WO
WIPO (PCT)
Prior art keywords
vocabulary
recognition
display
vocabularies
unit
Prior art date
Application number
PCT/JP2016/085689
Other languages
French (fr)
Japanese (ja)
Inventor
Akio HORII (堀井 昭男)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2016/085689 priority Critical patent/WO2018100705A1/en
Publication of WO2018100705A1 publication Critical patent/WO2018100705A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Definitions

  • the present invention relates to a speech recognition apparatus and speech recognition method for recognizing speech.
  • a recognition vocabulary (or a feature amount of a recognition vocabulary) obtained from speech and a function are stored in association with each other, and when the user makes the same utterance as the recognition vocabulary corresponding to the stored speech, the function associated with that speech is executed.
  • when a device that uses the recognition result of the speech recognition device performs a search with that recognition result, the recognition result may be ambiguous.
  • in a facility search, for example, it can be ambiguous whether the facility name “BP” refers to “BP” in the facility category “fuel station” or to “BP” in the facility category “diesel”.
  • in a place name search, it is likewise ambiguous whether the city name “Munchen” refers to a city in the state “Bavaria” or to a city in the state “Hutthum”.
  • for case (i) above, the search results can be presented as “BP (fuel station)” and “BP (diesel)”, and for case (ii), for example, as “Munchen (Bavaria)” and “Munchen (Hutthum)”. That is, information that distinguishes the search results presented to the user can be given. For example, the technique of Patent Document 2 can use a recognition vocabulary that includes such distinguishing information.
  • Patent Document 1: JP 2003-323192 A. Patent Document 2: Japanese Patent No. 4554272.
  • the techniques of Patent Document 1 and Patent Document 2 both lean toward adding recognition vocabularies.
  • however, an increase in the number of recognition vocabularies lowers the recognition accuracy of the speech recognition apparatus.
  • the present invention has been made in view of the above-described problems, and an object thereof is to provide a technique capable of improving the recognition accuracy of a speech recognition apparatus.
  • the speech recognition apparatus includes a speech recognition unit that recognizes input speech, and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary, which is a predetermined vocabulary, is obtained by the recognition of the speech recognition unit, acquires a plurality of candidate vocabularies previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and, based on the acquired priorities, selects one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • when a recognition result including the main vocabulary is obtained by recognition, a plurality of candidate vocabularies are acquired, a priority is acquired for each candidate vocabulary, and one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
  • FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1.
  • FIG. 2 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 2.
  • FIG. 3 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 2.
  • FIG. 4 is a diagram showing an example of the information in the priority database according to Embodiment 2.
  • FIG. 5 is a diagram showing an example of the information in the determination information database according to Embodiment 2.
  • FIG. 6 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
  • FIGS. 7 and 8 are diagrams showing the operation results of the first example and the second example of the speech recognition apparatus according to Embodiment 2.
  • FIG. 9 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 3.
  • FIG. 10 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 3.
  • FIG. 11 is a diagram showing an example of the information in the vehicle information database according to Embodiment 3.
  • FIG. 12 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 3.
  • FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 4.
  • FIG. 14 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 4.
  • FIG. 15 is a diagram showing an example of the information in the hierarchical information database according to Embodiment 4.
  • FIG. 16 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 4.
  • Further figures show: a block diagram of the configuration of the recognition vocabulary selection unit according to Embodiment 5; an example of the information in the SW information database according to Embodiment 5; a flowchart of the operation of the speech recognition apparatus according to Embodiment 5; a block diagram of the configuration of the recognition vocabulary selection unit according to Embodiment 6; an example of the information in the HW information database according to Embodiment 6; a flowchart of the operation of the speech recognition apparatus according to Embodiment 6; block diagrams of the hardware configurations of navigation apparatuses according to other modifications; a block diagram of the configuration of a server according to another modification; and a block diagram of the configuration of a communication terminal according to another modification.
  • FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus 1 according to Embodiment 1 of the present invention.
  • the speech recognition apparatus 1 in FIG. 1 includes a speech recognition unit 11 and a recognition vocabulary selection unit 12.
  • the voice recognition unit 11 recognizes the input voice. For example, the speech recognition unit 11 converts the input speech into an analog speech signal and then into a digital speech signal, and obtains, based on the digital speech signal, the character string or phrase corresponding to it as the recognition result. Note that, using the technique described in Japanese Patent Laid-Open No. 9-50291, the speech recognition unit 11 may select as the recognition result the vocabulary most likely, acoustically and linguistically, to have been uttered by the user. The voice recognition unit 11 may appropriately use dictionary data stored in the recognition dictionary database 11a when performing this recognition. Dictionary data is data including the character strings acquired as recognition results.
  • when the recognition vocabulary selection unit 12 obtains a recognition result including the main vocabulary, which is a predetermined vocabulary, by the recognition of the speech recognition unit 11, it acquires a plurality of candidate vocabularies previously associated with the main vocabulary, and acquires a priority for each candidate vocabulary. Each of the plurality of candidate vocabularies is a vocabulary that includes the associated main vocabulary.
  • the recognition vocabulary selection unit 12 selects one or more candidate vocabulary from a plurality of candidate vocabulary as one or more recognition vocabulary based on the acquired priority.
  • <Summary of Embodiment 1> When a recognition result including the main vocabulary is obtained, a plurality of candidate vocabularies are acquired, and a priority is acquired for each candidate vocabulary. Based on the acquired priorities, one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies. With such a configuration, the plurality of candidate vocabularies can be narrowed down, based on the priorities, to the vocabulary intended by the user. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved, and the user confusion that arises when many vocabularies are presented to the user can be suppressed.
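The selection flow summarized above can be sketched as follows. The data structures and the function name are illustrative assumptions made for this sketch, not definitions from the description; a real implementation would read the candidate vocabularies and priorities from its databases.

```python
# Candidate vocabularies assumed to be associated in advance with each
# main vocabulary (hypothetical data for illustration).
CANDIDATES = {
    "BP": ["BP", "BP (fuel station)", "BP (diesel)"],
}

# A priority assumed to be obtainable for each candidate vocabulary.
PRIORITIES = {
    "BP": "high",
    "BP (fuel station)": "medium",
    "BP (diesel)": "low",
}

def select_recognition_vocabularies(recognition_result, accepted=("high", "medium")):
    """If the recognition result contains a main vocabulary, narrow its
    candidate vocabularies down by priority; otherwise keep the result."""
    for main_vocab, candidates in CANDIDATES.items():
        if main_vocab in recognition_result:
            return [c for c in candidates if PRIORITIES[c] in accepted]
    return [recognition_result]

print(select_recognition_vocabularies("BP"))   # ['BP', 'BP (fuel station)']
```

The low-priority candidate "BP (diesel)" is dropped, so the user is presented only with the vocabularies most likely to be intended.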
  • FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 2 of the present invention.
  • the same or similar constituent elements as those in the first embodiment are denoted by the same reference numerals, and different constituent elements will be mainly described.
  • the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 of FIG. 2 includes a display vocabulary database 12a, a result comparison unit 12b, a priority database 12c, a priority calculation unit 12d, a determination information database 12e, and a recognized vocabulary update unit 12f.
  • FIG. 3 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • the display vocabulary database 12a stores information in which a main vocabulary such as “BP” is associated with a plurality of display vocabularies such as “BP”, “BP (fuel station)”, and “BP (diesel)”.
  • examples of the main vocabulary include the same place name given to a plurality of different places, the same name given to a plurality of facilities, the same abbreviation shared by a plurality of different formal names, and names similar to these.
  • the display vocabulary corresponds to the candidate vocabulary described in the first embodiment.
  • the plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with an attached vocabulary that details the main vocabulary.
  • in the following, parentheses are added as appropriate, and postfix information, an attached vocabulary that follows the main vocabulary, is used.
  • the recognition result is input from the speech recognition unit 11 to the result comparison unit 12b in FIG.
  • the result comparison unit 12b acquires a plurality of display vocabulary associated with the main vocabulary from the display vocabulary database 12a.
  • in the first example, the recognition result is the main vocabulary “BP” itself.
  • the result comparison unit 12b acquires display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP”.
  • in the second example, the result comparison unit 12b acquires the display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP” included in the recognition result “BP station”.
  • up to this point, the first example and the second example are the same.
  • the result comparison unit 12b acquires a degree of coincidence that is the degree to which each display vocabulary matches the recognition result based on the recognition result of the speech recognition unit 11 and a plurality of display vocabularies.
  • the degree of coincidence is divided into three stages of the first degree, the second degree, and the third degree.
  • the first degree means that the displayed vocabulary completely matches the recognition result.
  • the second degree means that, in a display vocabulary combining the main vocabulary and postfix information, the main vocabulary matches part of the recognition result and part of the postfix information matches the rest of the recognition result.
  • the third degree means that, in a display vocabulary combining the main vocabulary and postfix information, the main vocabulary matches part of the recognition result but the postfix information does not match the rest of the recognition result.
  • in the first example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP”, and the third degree for the display vocabularies “BP (fuel station)” and “BP (diesel)”.
  • in the second example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP”, the second degree for the display vocabulary “BP (fuel station)”, and the third degree for the display vocabulary “BP (diesel)”.
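The three-stage degree of coincidence can be sketched as below. The split on `" ("` and the containment test are assumptions about how the main vocabulary and postfix information are compared; the description only fixes the three outcomes, which this sketch reproduces for both examples.

```python
def matching_degree(display_vocab, recognition_result):
    """Return 1 (first degree), 2 (second degree), or 3 (third degree),
    or None when the main vocabulary does not appear in the result."""
    # Split "BP (fuel station)" into main vocabulary and postfix information.
    if " (" in display_vocab:
        main, postfix = display_vocab.split(" (", 1)
        postfix = postfix.rstrip(")")
    else:
        main, postfix = display_vocab, ""
    if not recognition_result.startswith(main):
        return None          # display vocabulary unrelated to the result
    rest = recognition_result[len(main):].strip()
    if not postfix:
        return 1             # first degree: the display vocabulary matches fully
    if rest and rest in postfix:
        return 2             # second degree: postfix matches the rest of the result
    return 3                 # third degree: postfix does not match the rest
```

With the first example (“BP”) this yields degrees 1, 3, 3 for the three display vocabularies, and with the second example (“BP station”) it yields 1, 2, 3, matching the text above.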
  • FIG. 4 is a diagram illustrating an example of information stored in the priority database 12c.
  • the priority database 12c associates the degree of matching with the priority. Specifically, high priority, medium priority, and low priority are associated with the first degree, the second degree, and the third degree, respectively.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies are input from the result comparison unit 12b to the priority calculation unit 12d in FIG. 2.
  • the priority calculating unit 12d acquires the priority of each display vocabulary from the priority database 12c based on the input matching degree of each display vocabulary.
  • in other words, the recognition vocabulary selection unit 12 uses the recognition result and each display vocabulary to acquire, as the priority of each display vocabulary, the degree of coincidence, which is the degree to which that display vocabulary matches the recognition result.
  • FIG. 5 is a diagram illustrating an example of information stored in the determination information database 12e. As shown in FIG. 5, in the determination information database 12e, a priority is associated with a determination rule as to whether to determine as a recognized vocabulary, that is, whether to select as a recognized vocabulary.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies are input from the priority calculation unit 12d to the recognized vocabulary update unit 12f.
  • the recognized vocabulary update unit 12f selects one or more display vocabularies from the plurality of display vocabularies as one or more recognized vocabularies according to the determination rules of the determination information database 12e.
  • the selected recognition vocabulary is displayed on, for example, a display device (not shown), or is output as voice by a voice output device (not shown).
  • the recognized vocabulary update unit 12f is configured so that the display vocabularies other than the selected one or more recognized vocabularies can be excluded in any subsequent selection.
  • when the recognition vocabulary update unit 12f according to the second embodiment selects one or more recognition vocabularies, it keeps those recognition vocabularies stored in the display vocabulary database 12a and deletes the other display vocabularies from the display vocabulary database 12a. In this case, the recognized vocabulary update unit 12f excludes the unselected display vocabularies from the next selection.
  • however, the recognized vocabulary update unit 12f is not limited to this.
  • for example, the recognized vocabulary update unit 12f need not immediately delete a display vocabulary that has gone unselected once. Instead, it may delete from the display vocabulary database 12a a display vocabulary that has gone unselected a predetermined number of consecutive times. In this case, the recognized vocabulary update unit 12f excludes the unselected display vocabularies in a selection after the next one.
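The deferred-deletion variant above can be sketched as follows. The class name, the `limit` parameter, and the miss-counter layout are assumptions for illustration; the description only requires that a display vocabulary survive until it has gone unselected a preset number of consecutive times.

```python
class RecognizedVocabularyUpdater:
    """Sketch of deferred deletion: a display vocabulary is removed only
    after it has gone unselected `limit` consecutive times."""

    def __init__(self, display_vocabularies, limit=2):
        self.display_vocabularies = set(display_vocabularies)
        self.limit = limit                       # consecutive misses allowed
        self.misses = {v: 0 for v in display_vocabularies}

    def update(self, selected):
        """Record one selection round and prune stale display vocabularies."""
        for vocab in list(self.display_vocabularies):
            if vocab in selected:
                self.misses[vocab] = 0           # reset on selection
            else:
                self.misses[vocab] += 1
                if self.misses[vocab] >= self.limit:
                    self.display_vocabularies.discard(vocab)
```

With `limit=1` this degenerates to the immediate deletion of the base behavior; larger limits keep rarely used vocabularies available a little longer.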
  • FIG. 6 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment.
  • in step S1, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • in step S2, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies and their matching degrees. The result comparison unit 12b then outputs the recognition result of the voice recognition unit 11, the display vocabularies, and their matching degrees to the priority calculation unit 12d.
  • in step S3, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on its matching degree from the result comparison unit 12b. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
  • in step S4, the recognized vocabulary update unit 12f refers to the determination information database 12e and selects one or more recognized vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d. The selected recognized vocabularies are output to, for example, a display device (not shown). Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected one or more recognized vocabularies from the display vocabulary database 12a. The operation of FIG. 6 then ends.
  • FIGS. 7 and 8 are diagrams showing the operation results of the first example and the second example described above.
  • as described above, the plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with postfix information that details it.
  • for this reason, the main vocabulary “BP” is selected in both the first example and the second example.
  • that is, the main vocabulary can be selected regardless of the content of the recognition result.
  • when one or more recognition vocabularies are selected, the speech recognition apparatus 1 according to the second embodiment can exclude the other display vocabularies in any selection after the next one. Such a configuration reduces the processing needed to select a recognized vocabulary from the display vocabularies in subsequent selections, and therefore reduces the processing load of the speech recognition apparatus 1.
  • the speech recognition apparatus 1 acquires the matching degree of each display vocabulary as its priority. With such a configuration, the display vocabularies can be narrowed down to those corresponding to the vocabulary the user intended by the utterance. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be increased, and user confusion can be suppressed.
  • the degree of coincidence and the priority are divided into three stages.
  • the present invention is not limited to this, and the degree of coincidence and priority may be divided into two stages or may be divided into four or more stages.
  • FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 3 of the present invention.
  • in the third embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description focuses on the differing elements.
  • the voice recognition device 1 of FIG. 9 according to the third embodiment is used in a vehicle. The recognition vocabulary selection unit 12 of FIG. 9 includes a vehicle information database 12g and a display vocabulary update unit 12h in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • the recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on the vehicle information, which is information about the vehicle, and the priorities of the plurality of display vocabularies. This is described in detail below.
  • FIG. 10 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • in the display vocabulary database 12a of FIG. 10, the information shown in FIG. 3 described in the second embodiment is associated with a domain.
  • the domain is a kind of vehicle information, and for the domain, for example, information related to vehicle specifications is used.
  • when the recognition result includes the main vocabulary, the result comparison unit 12b in FIG. 9 acquires from the display vocabulary database 12a a plurality of display vocabularies previously associated with the main vocabulary and their domains. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
  • FIG. 11 is a diagram showing an example of information stored in the vehicle information database 12g.
  • in the vehicle information database 12g, each domain is associated with either valid or invalid with respect to the display vocabulary.
  • the information shown in FIG. 11 may be set in advance by the user or the like, or may be changed automatically by the voice recognition device 1 or the like based on the travel history of the vehicle. For example, when the travel history records that the vehicle has stopped at gas oil (diesel) filling stations more often than at gasoline filling stations, the voice recognition device 1 may change the “valid” of “fuel station” in FIG. 11 to “invalid”, and the “invalid” of “diesel” in FIG. 11 to “valid”.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the domains of the plurality of display vocabularies are input from the result comparison unit 12b to the display vocabulary update unit 12h in FIG. 9.
  • the display vocabulary update unit 12h updates the display vocabulary to be output to the priority calculation unit 12d based on the input domain and the information in the vehicle information database 12g.
  • the display vocabulary update unit 12h outputs to the priority calculation unit 12d the display vocabularies “BP” and “BP (fuel station)”, whose domain “fuel station” is associated with “valid” in the information of FIG. 11, together with their matching degrees and the recognition result of the voice recognition unit 11.
  • on the other hand, the display vocabulary update unit 12h does not output to the priority calculation unit 12d the display vocabulary “BP (diesel)”, whose domain “diesel” is associated with “invalid” in the information of FIG. 11, or its matching degree.
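The domain-based filtering of the display vocabulary update unit 12h can be sketched as below. The dictionary and list layouts are hypothetical stand-ins for the contents of FIGS. 10 and 11, chosen so that the result matches the example in the text.

```python
# Hypothetical contents modelled on FIGS. 10 and 11 (assumed layout).
DISPLAY_VOCABULARY_DB = [
    {"vocab": "BP",                "domain": "fuel station"},
    {"vocab": "BP (fuel station)", "domain": "fuel station"},
    {"vocab": "BP (diesel)",       "domain": "diesel"},
]

VEHICLE_INFO_DB = {"fuel station": "valid", "diesel": "invalid"}

def filter_by_vehicle_info(entries, vehicle_info):
    """Pass on only display vocabularies whose domain is marked 'valid'."""
    return [e["vocab"] for e in entries
            if vehicle_info.get(e["domain"]) == "valid"]

print(filter_by_vehicle_info(DISPLAY_VOCABULARY_DB, VEHICLE_INFO_DB))
# ['BP', 'BP (fuel station)']
```

“BP (diesel)” is filtered out before priority calculation, exactly as in the example above.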
  • the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
  • FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment.
  • in step S11, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • in step S12, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies, their matching degrees, and their domains. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their matching degrees and domains to the display vocabulary update unit 12h.
  • in step S13, the display vocabulary update unit 12h outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Based on the domains from the result comparison unit 12b, the display vocabulary update unit 12h also outputs to the priority calculation unit 12d the display vocabularies whose domains are associated with “valid” in the vehicle information database 12g, together with their matching degrees.
  • one or more display vocabularies may be associated with a “valid” domain.
  • in step S14, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on its matching degree from the display vocabulary update unit 12h. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
  • in step S15, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f refers to the determination information database 12e and selects a recognized vocabulary from the display vocabularies based on the priorities from the priority calculation unit 12d. The selected recognized vocabulary is output to a display device (not shown). Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected recognized vocabulary from the display vocabulary database 12a, and the operation of FIG. 12 ends.
  • one or more recognition vocabularies are selected from the plurality of display vocabularies based on the vehicle information and the priorities of the plurality of display vocabularies. According to such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
  • the recognized vocabulary selection unit 12 does not change the priority based on the vehicle information.
  • the present invention is not limited to this, and the recognized vocabulary selection unit 12 may change the priority based on the vehicle information.
  • for example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose domain is “diesel” to “low” in step S13, and use that priority as-is in step S14. In this case, the same effect as described above can be obtained.
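The priority-changing variant can be sketched as below: instead of excluding display vocabularies with an invalid domain, their priority is lowered and the later stages are left unchanged. The function name and argument shapes are illustrative assumptions.

```python
def demote_by_vehicle_info(priorities, domains, vehicle_info):
    """Lower the priority of display vocabularies whose domain is marked
    'invalid' in the vehicle information, instead of excluding them."""
    demoted = dict(priorities)            # leave the input untouched
    for vocab, domain in domains.items():
        if vehicle_info.get(domain) == "invalid":
            demoted[vocab] = "low"
    return demoted
```

Because the demoted vocabularies still reach the determination rules with a “low” priority, they are dropped there, giving the same overall effect as the exclusion approach.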
  • FIG. 13 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 4 of the present invention.
  • in the fourth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description focuses on the differing elements.
  • the recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hierarchy defined in advance for the plurality of display vocabularies and on the priorities of the plurality of display vocabularies. This is described in detail below.
  • FIG. 14 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • in the display vocabulary database 12a of FIG. 14, the information of FIG. 3 described in the second embodiment is associated with the hierarchy of each display vocabulary.
  • the larger the number assigned to a hierarchy, the lower that hierarchy is, and a display vocabulary in a higher hierarchy is a vocabulary whose concept subsumes the display vocabularies below it.
  • when the recognition result includes the main vocabulary, the result comparison unit 12b in FIG. 13 acquires from the display vocabulary database 12a a plurality of display vocabularies previously associated with the main vocabulary and their hierarchies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
  • FIG. 15 is a diagram showing an example of information stored in the hierarchical information database 12i. As shown in FIG. 15, in the hierarchy information database 12i, the hierarchy and any one of valid and invalid regarding the display vocabulary are associated with each other. Note that the information shown in FIG. 15 may be set in advance by a user or the like, or may be automatically changed by the voice recognition device 1 or the like.
  • The hierarchy reference update unit 12j in FIG. 13 receives, from the result comparison unit 12b, the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies.
  • the hierarchy reference update unit 12j updates the display vocabulary to be output to the priority calculation unit 12d based on the input hierarchy and the information in the hierarchy information database 12i.
  • For example, since hierarchy "1" is associated with "valid" in the information of FIG. 15, the hierarchy reference update unit 12j outputs the display vocabulary "BP" of hierarchy "1" and its matching degree to the priority calculation unit 12d.
  • On the other hand, since hierarchy "2" is associated with "invalid" in the information of FIG. 15, the hierarchy reference update unit 12j does not output the display vocabularies "BP (fuel station)" and "BP (diesel)" of hierarchy "2" or their matching degrees to the priority calculation unit 12d.
  • the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
  • FIG. 16 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment.
  • In step S21, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • In step S22, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies. Then, the result comparison unit 12b outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees and hierarchies of the plurality of display vocabularies to the hierarchy reference update unit 12j.
  • In step S23, the hierarchy reference update unit 12j outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Further, based on the hierarchies from the result comparison unit 12b, the hierarchy reference update unit 12j outputs to the priority calculation unit 12d the display vocabularies whose hierarchies are associated with "valid" in the hierarchy information database 12i, together with their matching degrees. Note that there may be one or more display vocabularies whose hierarchy is associated with "valid".
  • In step S24, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on the matching degree of each display vocabulary from the hierarchy reference update unit 12j. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
  • In step S25, as in step S4 of FIG. 6, the recognition vocabulary update unit 12f selects one or more recognition vocabularies from the display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e.
  • The selected recognition vocabularies are output to a display device (not shown).
  • The recognition vocabulary update unit 12f deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. Thereafter, the operation of FIG. 16 ends.
  • <Summary of Embodiment 4> According to the speech recognition apparatus 1 of the fourth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hierarchy defined in advance for the plurality of display vocabularies and on the priorities of the plurality of display vocabularies. With such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
  • In the above description, the recognition vocabulary selection unit 12 does not change the priorities based on the hierarchy. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the hierarchy.
  • For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary of hierarchy "2" to "low" in step S23 and leave the priorities of the other display vocabularies as they are in step S24. In this case, the same effects as described above can be obtained.
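As a rough sketch of the hierarchy-based filtering described above (cf. FIG. 14 to FIG. 16), the following Python fragment models the databases as plain dictionaries. The vocabulary entries, matching degrees, and priority thresholds are hypothetical values chosen only for illustration; the publication does not specify them.

```python
# Hypothetical sketch of the hierarchy-based selection of Embodiment 4.
# The databases are modeled as plain dicts; the real contents are not given
# in the publication.

# hierarchy information database 12i (cf. FIG. 15): hierarchy -> valid/invalid
hierarchy_info = {1: "valid", 2: "invalid"}

# display vocabulary database 12a (cf. FIG. 14): vocabulary -> (matching degree, hierarchy)
display_vocab = {
    "BP": (0.9, 1),
    "BP (fuel station)": (0.8, 2),
    "BP (diesel)": (0.7, 2),
}

def priority_of(match_degree):
    # priority calculation (cf. priority database 12c); thresholds are assumptions
    if match_degree >= 0.85:
        return "high"
    if match_degree >= 0.6:
        return "medium"
    return "low"

def select_recognition_vocab(display_vocab, hierarchy_info):
    # step S23: keep only display vocabularies whose hierarchy is marked "valid"
    valid = {w: m for w, (m, h) in display_vocab.items()
             if hierarchy_info.get(h) == "valid"}
    # steps S24-S25: compute priorities and keep vocabularies judged selectable
    return [w for w, m in valid.items() if priority_of(m) in ("high", "medium")]

print(select_recognition_vocab(display_vocab, hierarchy_info))  # -> ['BP']
```

With the assumed data, the two hierarchy-"2" entries are filtered out before the priority step, so only "BP" survives, mirroring the example built around FIG. 15.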
  • FIG. 17 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 5 of the present invention.
  • In the fifth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description below focuses on the differing constituent elements.
  • The recognition vocabulary selection unit 12 in FIG. 17 includes a SW (software) information database 12k and a SW restriction reference update unit 12m in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on software requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. This will be described in detail below.
  • FIG. 18 is a diagram showing an example of information stored in the SW information database 12k.
  • As shown in FIG. 18, the SW information database 12k stores, as a software requirement of the system using the speech recognition apparatus 1, the number of recognition vocabularies that the system can display. Note that the information shown in FIG. 18 may be set in advance by a user or the like, or may be changed automatically by the voice recognition device 1 or the like based on the software requirements.
  • The recognition vocabularies selected by the recognition vocabulary update unit 12f and their priorities are input to the SW restriction reference update unit 12m; the priority of a recognition vocabulary is the priority obtained for the display vocabulary that became that recognition vocabulary. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m outputs them as they are.
  • Otherwise, the SW restriction reference update unit 12m lowers the priority of each recognition vocabulary by one level. As a result, the priority of some recognition vocabularies can be set to "low". After changing the priorities, the SW restriction reference update unit 12m performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By changing the priorities appropriately in this way, the SW restriction reference update unit 12m selects a number of recognition vocabularies equal to or less than the displayable number.
  • FIG. 19 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. In steps S31 to S33, operations similar to those of steps S1 to S3 in FIG. 6 are performed.
  • In step S34, the recognition vocabulary update unit 12f selects recognition vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e. Then, the recognition vocabulary update unit 12f outputs the selected recognition vocabularies and their priorities to the SW restriction reference update unit 12m. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
  • In step S35, the SW restriction reference update unit 12m, referring to the SW information database 12k, selects a number of recognition vocabularies equal to or less than the displayable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
  • At this time, by performing a deletion similar to that performed by the recognition vocabulary update unit 12f, the SW restriction reference update unit 12m may delete the display vocabularies that were not output from the display vocabulary database 12a. Thereafter, the operation of FIG. 19 ends.
  • <Summary of Embodiment 5> According to the speech recognition apparatus 1 of the fifth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the software requirements of the system using the speech recognition device 1 and on the priorities of the plurality of display vocabularies. With such a configuration, a speech recognition apparatus 1 that can automatically satisfy the software requirements can be realized.
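Assuming a three-level priority scale ("high"/"medium"/"low"), the priority-lowering loop performed by the SW restriction reference update unit 12m can be sketched roughly as follows in Python. The function names, the data layout, and the saturation behavior at "low" are illustrative assumptions, not details taken from the publication.

```python
# Hypothetical sketch of the SW-restriction selection of Embodiment 5.
LEVELS = ["low", "medium", "high"]  # assumed three-level priority scale

def lower(priority):
    # lower a priority by one level, saturating at "low"
    idx = LEVELS.index(priority)
    return LEVELS[max(idx - 1, 0)]

def restrict_to_limit(recognized, displayable_number):
    """recognized: dict mapping vocabulary -> priority.

    If more vocabularies were selected than the displayable number
    (cf. SW information database 12k), lower every priority by one level
    and keep only those not at "low", repeating until the limit is met.
    """
    current = dict(recognized)
    while len(current) > displayable_number:
        current = {w: lower(p) for w, p in current.items()}
        current = {w: p for w, p in current.items() if p != "low"}
        if not current:  # safety stop for this sketch
            break
    return list(current)

vocab = {"BP": "high", "BP (fuel station)": "medium", "BP (diesel)": "medium"}
print(restrict_to_limit(vocab, 1))  # -> ['BP']
```

In the example, one round of lowering turns the two "medium" entries into "low", leaving only "BP", which fits the assumed displayable number of 1. The same loop shape would apply to a storable-number limit such as the one in the sixth embodiment.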
  • FIG. 20 is a block diagram showing a configuration of speech recognition apparatus 1 according to Embodiment 6 of the present invention.
  • In the sixth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description below focuses on the differing constituent elements.
  • the recognition vocabulary selection unit 12 of FIG. 20 includes an HW (hardware) information database 12n and an HW restriction reference update unit 12o in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on hardware requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. This will be described in detail below.
  • FIG. 21 is a diagram showing an example of information stored in the HW information database 12n.
  • As shown in FIG. 21, the HW information database 12n stores, as a hardware requirement of the system using the speech recognition apparatus 1, the number of display vocabularies that a memory (not shown) of the system can store.
  • the information shown in FIG. 21 may be set in advance by a user or the like, or may be automatically changed by the voice recognition apparatus 1 or the like based on the hardware requirements.
  • The recognition vocabularies and their priorities are input to the HW restriction reference update unit 12o from the recognition vocabulary update unit 12f.
  • When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o outputs them as they are.
  • Otherwise, the HW restriction reference update unit 12o lowers the priority of each recognition vocabulary by one level. As a result, the priority of some recognition vocabularies can be set to "low". After changing the priorities, the HW restriction reference update unit 12o performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By changing the priorities appropriately in this way, the HW restriction reference update unit 12o selects a number of recognition vocabularies equal to or less than the storable number.
  • FIG. 22 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. In steps S41 to S43, operations similar to those of steps S1 to S3 in FIG. 6 are performed.
  • In step S44, the recognition vocabulary update unit 12f selects recognition vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e. Then, the recognition vocabulary update unit 12f outputs the selected recognition vocabularies and their priorities to the HW restriction reference update unit 12o. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
  • In step S45, the HW restriction reference update unit 12o, referring to the HW information database 12n, selects a number of recognition vocabularies equal to or less than the storable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
  • At this time, by performing a deletion similar to that performed by the recognition vocabulary update unit 12f, the HW restriction reference update unit 12o may delete the display vocabularies that were not output from the display vocabulary database 12a. Thereafter, the operation of FIG. 22 ends.
  • <Summary of Embodiment 6> According to the speech recognition apparatus 1 of the sixth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hardware requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. With such a configuration, a speech recognition apparatus 1 that can automatically satisfy the hardware requirements can be realized.
  • The speech recognition unit 11 and the recognition vocabulary selection unit 12 in the speech recognition apparatus 1 described above are hereinafter collectively referred to as the "speech recognition unit 11 and the like".
  • The voice recognition unit 11 and the like are realized by a processing circuit 81 shown in FIG. 23. That is, the processing circuit 81 includes the speech recognition unit 11 that recognizes input speech, and the recognition vocabulary selection unit 12 that, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition of the speech recognition unit 11, acquires a plurality of candidate vocabularies each including the main vocabulary and previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and selects, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in the memory may be applied.
  • The processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
  • Each function of the units such as the speech recognition unit 11 may be realized by distributed processing circuits, or the functions of the units may be realized collectively by a single processing circuit.
  • When the processing circuit 81 is a processor, the functions of the voice recognition unit 11 and the like are realized in combination with software or the like.
  • the software or the like corresponds to, for example, software, firmware, or software and firmware.
  • Software or the like is described as a program and stored in a memory.
  • The processor 82 applied to the processing circuit 81 reads out and executes the program stored in the memory 83, thereby realizing the functions of the respective units. That is, the speech recognition apparatus 1 includes a memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of a step of recognizing the input speech, and a step of, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition, acquiring a plurality of candidate vocabularies each including the main vocabulary and previously associated with the main vocabulary, acquiring a priority for each candidate vocabulary, and selecting, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • The memory 83 corresponds to, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory), or to an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disk) and its drive device, or any other storage medium.
  • In the above description, each function of the voice recognition unit 11 and the like is realized by either hardware or software or the like.
  • However, the present invention is not limited to this, and a configuration may be adopted in which a part of the voice recognition unit 11 and the like is realized by dedicated hardware and another part is realized by software or the like.
  • For example, the function of the speech recognition unit 11 can be realized by a processing circuit as dedicated hardware, while the other functions can be realized by the processing circuit 81 as the processor 82 reading out and executing the program stored in the memory 83.
  • As described above, the processing circuit 81 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
  • The voice recognition device described above can also be applied to a speech recognition system constructed by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including mobile terminals such as mobile phones, smartphones, and tablets, the functions of applications installed on these, and a server.
  • In this case, each function or each component of the speech recognition apparatus described above may be distributed among the devices constructing the system, or may be concentrated in any one of the devices.
  • FIG. 25 is a block diagram showing a configuration of the server 51 according to this modification.
  • the server 51 of FIG. 25 includes a communication unit 51a, a voice recognition unit 51b, and a recognition vocabulary selection unit 51c, and can perform wireless communication with the navigation device 53 of the vehicle 52.
  • the communication unit 51a receives the voice data acquired by the navigation device 53 by performing wireless communication with the navigation device 53.
  • The speech recognition unit 51b and the recognition vocabulary selection unit 51c are realized by a processor (not shown) of the server 51 executing a program stored in a storage device (not shown) of the server 51, and thereby have the same functions as the speech recognition unit 11 and the recognition vocabulary selection unit 12 of FIG. 1. That is, the voice recognition unit 51b recognizes the voice data received by the communication unit 51a.
  • The recognition vocabulary selection unit 51c acquires a plurality of display vocabularies and their priorities based on the recognition result of the speech recognition unit 51b, and selects the recognition vocabularies based on the priorities of the plurality of display vocabularies. Then, the communication unit 51a transmits the recognition vocabularies selected by the recognition vocabulary selection unit 51c to the navigation device 53.
  • According to the server 51 configured in this way, even if the navigation device 53 has, for example, only a display function and a function of communicating with the server 51, the same effects as those of the voice recognition device 1 described in the first embodiment can be obtained.
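The server-side flow of this modification (receive voice data, recognize it, select recognition vocabularies, send them back) can be sketched as below. `recognize` and `select_recognition_vocab` are placeholder stand-ins for the speech recognition unit 51b and the recognition vocabulary selection unit 51c, and every value in them is invented for illustration.

```python
# Hypothetical sketch of the server-side flow of FIG. 25.

def recognize(voice_data):
    # placeholder for the speech recognition unit 51b:
    # a real implementation would decode the audio and return a result string
    return "BP"

def select_recognition_vocab(recognition_result):
    # placeholder for the recognition vocabulary selection unit 51c:
    # look up display vocabularies for the result and choose by priority
    candidates = {"BP": "high", "BP (fuel station)": "medium"}
    return [w for w, p in candidates.items() if p == "high"]

def handle_request(voice_data):
    # communication unit 51a: receive voice data from the navigation device,
    # run recognition and selection, and return the vocabularies to transmit
    result = recognize(voice_data)
    return select_recognition_vocab(result)

print(handle_request(b"...audio bytes..."))  # -> ['BP']
```

The point of the division of labor is that only `handle_request`'s input and output cross the wireless link, so the navigation device 53 needs nothing beyond display and communication functions.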
  • FIG. 26 is a block diagram showing the configuration of the communication terminal 56 according to this modification.
  • The communication terminal 56 of FIG. 26 includes a communication unit 56a, a speech recognition unit 56b, and a recognition vocabulary selection unit 56c similar to the communication unit 51a, the speech recognition unit 51b, and the recognition vocabulary selection unit 51c, and can perform wireless communication with the navigation device 58 of the vehicle 57.
  • As the communication terminal 56, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet carried by the driver of the vehicle 57 is applied.
  • According to the communication terminal 56 configured in this way, even if the navigation device 58 has, for example, only a display function and a function of communicating with the communication terminal 56, the same effects as those of the voice recognition device 1 described in the first embodiment can be obtained.
  • In the present invention, the embodiments and the modifications can be freely combined, and each embodiment and each modification can be appropriately modified or omitted, within the scope of the invention.
  • 1 speech recognition device, 11 speech recognition unit, 12 recognition vocabulary selection unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Navigation (AREA)

Abstract

The purpose of the present invention is to provide technology capable of improving the recognition accuracy of a voice recognition device. This voice recognition device includes: a voice recognition unit that recognizes an input voice; and a recognized-word selection unit that, when a recognition result including a main word that is a predetermined word is acquired due to recognition by the voice recognition unit, acquires a plurality of candidate words, each including the main word and being pre-associated with the main word, acquires a priority for each of the candidate words, and selects, as one or more recognized words, one or more candidate words from among the plurality of candidate words on the basis of the acquired priorities.

Description

Speech recognition apparatus and speech recognition method
 The present invention relates to a speech recognition apparatus and a speech recognition method for recognizing speech.
 Various technologies have been proposed for voice recognition devices. For example, in the technique of Patent Document 1, a recognition vocabulary obtained from speech (or a feature amount of the recognition vocabulary) and a function are stored in association with each other, and when the same utterance as the recognition vocabulary corresponding to the stored speech is made, the function associated with that speech is executed.
 On the other hand, when a device that uses the recognition result of a speech recognition device performs a search using that recognition result, the recognition result may be ambiguous. For example, (i) in a facility search, it is ambiguous whether the facility name "BP" refers to "BP" in the facility category "fuel station" or "BP" in the facility category "diesel". Also, (ii) in a place name search, it is ambiguous whether the city name "Munchen" refers to a city in the state "Bavaria" or a city in the state "Hutthurm". As one countermeasure, for (i), the search results can be presented as "BP (fuel station)" and "BP (diesel)", and for (ii), as "Munchen (Bavaria)" and "Munchen (Hutthurm)". That is, distinguishing information is added to the search results presented to the user. For example, the technique of Patent Document 2 makes it possible to include this distinguishing information in the recognition vocabulary.
 Patent Document 1: JP 2003-323192 A
 Patent Document 2: Japanese Patent No. 4554272
 However, the techniques of Patent Document 1 and Patent Document 2 are biased toward adding recognition vocabularies. In general, an increase in the number of recognition vocabularies causes a reduction in the recognition accuracy of a speech recognition apparatus. For this reason, there is room for improving the recognition accuracy of the speech recognition apparatus in that the recognition result is not narrowed down to the result intended by the user.
 Therefore, the present invention has been made in view of the above problems, and an object thereof is to provide a technique capable of improving the recognition accuracy of a speech recognition apparatus.
 A speech recognition apparatus according to the present invention includes: a speech recognition unit that recognizes input speech; and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition of the speech recognition unit, acquires a plurality of candidate vocabularies, each including the main vocabulary and previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and selects, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
 According to the present invention, when a recognition result including a main vocabulary is obtained by recognition, a plurality of candidate vocabularies are acquired, a priority is acquired for each candidate vocabulary, and one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. Thereby, the recognition accuracy of the speech recognition apparatus can be increased.
 The objects, features, aspects, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.
 FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1.
 FIG. 2 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 2.
 FIG. 3 is a diagram showing an example of information in the display vocabulary database according to Embodiment 2.
 FIG. 4 is a diagram showing an example of information in the priority database according to Embodiment 2.
 FIG. 5 is a diagram showing an example of information in the determination information database according to Embodiment 2.
 FIG. 6 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
 FIG. 7 is a diagram showing an operation result of a first example of the speech recognition apparatus according to Embodiment 2.
 FIG. 8 is a diagram showing an operation result of a second example of the speech recognition apparatus according to Embodiment 2.
 FIG. 9 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 3.
 FIG. 10 is a diagram showing an example of information in the display vocabulary database according to Embodiment 3.
 FIG. 11 is a diagram showing an example of information in the vehicle information database according to Embodiment 3.
 FIG. 12 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 3.
 FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 4.
 FIG. 14 is a diagram showing an example of information in the display vocabulary database according to Embodiment 4.
 FIG. 15 is a diagram showing an example of information in the hierarchy information database according to Embodiment 4.
 FIG. 16 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 4.
 FIG. 17 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 5.
 FIG. 18 is a diagram showing an example of information in the SW information database according to Embodiment 5.
 FIG. 19 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 5.
 FIG. 20 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 6.
 FIG. 21 is a diagram showing an example of information in the HW information database according to Embodiment 6.
 FIG. 22 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 6.
 FIG. 23 is a block diagram showing the hardware configuration of a navigation device according to another modification.
 FIG. 24 is a block diagram showing the hardware configuration of a navigation device according to another modification.
 FIG. 25 is a block diagram showing the configuration of a server according to another modification.
 FIG. 26 is a block diagram showing the configuration of a communication terminal according to another modification.
 <Embodiment 1>
 FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus 1 according to Embodiment 1 of the present invention. The speech recognition apparatus 1 in FIG. 1 includes a speech recognition unit 11 and a recognition vocabulary selection unit 12.
The speech recognition unit 11 recognizes input speech. For example, the speech recognition unit 11 converts the input speech into an analog speech signal and then into a digital speech signal, and, based on the digital speech signal, obtains as a recognition result a character string, phrase, or the like corresponding to that signal. Using the technique described in Japanese Patent Laid-Open No. 9-50291, the speech recognition unit 11 may instead select as the recognition result the vocabulary recognized by speech recognition, that is, the vocabulary that is acoustically and linguistically most likely to match the user's utterance. When performing this recognition, the speech recognition unit 11 may use dictionary data stored in a recognition dictionary database 11a as appropriate. The dictionary data includes the character strings and the like that can be obtained as recognition results.
When a recognition result including a main vocabulary, which is a vocabulary predetermined in the recognition vocabulary selection unit 12, is obtained by the recognition of the speech recognition unit 11, the recognition vocabulary selection unit 12 acquires a plurality of candidate vocabularies associated in advance with that main vocabulary, and acquires a priority for each candidate vocabulary. Each of the plurality of candidate vocabularies is a vocabulary including the associated main vocabulary.
Then, based on the acquired priorities, the recognition vocabulary selection unit 12 selects one or more of the candidate vocabularies as one or more recognition vocabularies.
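The candidate acquisition and priority-based selection described above can be sketched as follows. This is an illustrative Python sketch, not part of the patent: the dictionary mapping each main vocabulary to candidate/priority pairs and the rule that only high- and medium-priority candidates are kept are assumptions made for illustration.

```python
# Hypothetical sketch of the Embodiment 1 flow: given a recognition result
# containing a main vocabulary, fetch the candidate vocabularies associated
# in advance with it, then keep only candidates with accepted priorities.

# Assumed candidate store: main vocabulary -> [(candidate vocabulary, priority)]
CANDIDATES = {
    "BP": [("BP", "high"), ("BP (fuel station)", "medium"), ("BP (diesel)", "low")],
}

def select_recognition_vocabularies(recognition_result, accepted=("high", "medium")):
    """Return candidates whose priority is in `accepted` for any main
    vocabulary contained in the recognition result."""
    selected = []
    for main_vocab, pairs in CANDIDATES.items():
        if main_vocab in recognition_result:
            selected += [c for c, p in pairs if p in accepted]
    return selected

print(select_recognition_vocabularies("BP station"))
# → ['BP', 'BP (fuel station)']
```

The low-priority candidate is dropped, narrowing the candidates toward the vocabulary the user is likely to have intended.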
<Summary of Embodiment 1>
According to the speech recognition apparatus 1 of Embodiment 1 described above, when a recognition result including a main vocabulary is obtained, a plurality of candidate vocabularies are acquired together with a priority for each candidate vocabulary, and one or more of the candidate vocabularies are selected as one or more recognition vocabularies based on the acquired priorities. With this configuration, the plurality of candidate vocabularies can be narrowed down, based on the priorities, to the vocabulary the user intended. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved, and the user confusion that arose when many vocabularies were presented to the user can be suppressed.
<Embodiment 2>
FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 2 of the present invention. Among the components described in Embodiment 2, those that are the same as or similar to those of Embodiment 1 are given the same reference numerals, and the description below focuses on the differing components.
The recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 of FIG. 2 includes a display vocabulary database 12a, a result comparison unit 12b, a priority database 12c, a priority calculation unit 12d, a determination information database 12e, and a recognition vocabulary update unit 12f.
FIG. 3 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 3, the display vocabulary database 12a stores information in which a main vocabulary such as "BP" is associated with a plurality of display vocabularies such as "BP", "BP (fuel station)", and "BP (diesel)".
Examples of the main vocabulary include the same place name given to a plurality of different places, the same name given to a plurality of facilities, the same abbreviation shared by a plurality of different formal names, and names similar to these.
The display vocabularies correspond to the candidate vocabularies described in Embodiment 1. In Embodiment 2, the plurality of display vocabularies include the main vocabulary itself and vocabularies in which the main vocabulary is combined with an attached vocabulary that qualifies the main vocabulary in more detail. In the example of FIG. 3, postfix information, that is, an attached vocabulary following the main vocabulary, is used, enclosed in parentheses as appropriate.
The recognition result from the speech recognition unit 11 is input to the result comparison unit 12b of FIG. 2. When the recognition result of the speech recognition unit 11 includes a main vocabulary, the result comparison unit 12b acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with that main vocabulary.
Here, two examples will be described for the case where the information of FIG. 3 is stored in the display vocabulary database 12a.
As a first example, consider the case where the recognition result is the main vocabulary "BP" itself. In this case, the result comparison unit 12b acquires the display vocabularies "BP", "BP (fuel station)", and "BP (diesel)" associated with the main vocabulary "BP".
As a second example, consider the case where the recognition result is "BP station". In this case, the result comparison unit 12b acquires the display vocabularies "BP", "BP (fuel station)", and "BP (diesel)" associated with the main vocabulary "BP" included in "BP station". The result of the result comparison unit 12b is thus the same in the first and second examples.
The result comparison unit 12b according to Embodiment 2 also acquires, based on the recognition result of the speech recognition unit 11 and the plurality of display vocabularies, a matching degree indicating how closely each display vocabulary matches the recognition result. In the following description, the matching degree is divided into three levels: a first degree, a second degree, and a third degree. The first degree means that the display vocabulary completely matches the recognition result. The second degree means that, in a display vocabulary combining the main vocabulary with postfix information, the main vocabulary matches part of the recognition result and part of the postfix information matches the remainder of the recognition result. The third degree means that, in a display vocabulary combining the main vocabulary with postfix information, the main vocabulary matches part of the recognition result but the postfix information does not match the remainder of the recognition result even partially.
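One plausible reading of the three matching degrees can be sketched in Python as follows. This is illustrative only and not part of the patent: the word-level tokenization, and the rule that a postfix-free display vocabulary receives the first degree whenever its main vocabulary appears in the result, are assumptions chosen so that the sketch reproduces the first and second examples described in this embodiment.

```python
# Hypothetical three-level matching-degree classifier for a display
# vocabulary consisting of a main vocabulary and an optional postfix.

def match_degree(recognition_result, main_vocab, postfix=None):
    """Return 1, 2, or 3 (first/second/third degree), or None if the
    main vocabulary does not appear in the recognition result."""
    if main_vocab not in recognition_result:
        return None
    if postfix is None:
        return 1  # first degree: display vocabulary is the main vocabulary itself
    # words of the recognition result left over once the main vocabulary is removed
    remainder = set(recognition_result.replace(main_vocab, "").split())
    # second degree if any postfix word matches the remainder, else third degree
    return 2 if set(postfix.split()) & remainder else 3

# First example: recognition result is the main vocabulary "BP" itself
assert match_degree("BP", "BP") == 1
assert match_degree("BP", "BP", "fuel station") == 3
assert match_degree("BP", "BP", "diesel") == 3

# Second example: recognition result is "BP station"
assert match_degree("BP station", "BP") == 1
assert match_degree("BP station", "BP", "fuel station") == 2
assert match_degree("BP station", "BP", "diesel") == 3
```

With "BP station", the postfix word "station" overlaps the remainder of the result, so "BP (fuel station)" is promoted to the second degree while "BP (diesel)" stays at the third.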
Here, the first and second examples above will be described for the case where the information of FIG. 3 is stored in the display vocabulary database 12a.
In the first example, in which the recognition result is the main vocabulary "BP" itself, the result comparison unit 12b acquires the first degree for the display vocabulary "BP", and the third degree for the display vocabularies "BP (fuel station)" and "BP (diesel)".
In the second example, in which the recognition result is "BP station", the result comparison unit 12b acquires the first degree for the display vocabulary "BP", the second degree for the display vocabulary "BP (fuel station)", and the third degree for the display vocabulary "BP (diesel)".
FIG. 4 is a diagram showing an example of the information stored in the priority database 12c. As shown in FIG. 4, the priority database 12c associates matching degrees with priorities. Specifically, high, medium, and low priorities are associated with the first, second, and third degrees, respectively.
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies are input from the result comparison unit 12b to the priority calculation unit 12d of FIG. 2. The priority calculation unit 12d acquires the priority of each display vocabulary from the priority database 12c based on the input matching degree of each display vocabulary. In this way, the recognition vocabulary selection unit 12 according to Embodiment 2 acquires, as the priority of each display vocabulary, the matching degree indicating how closely that display vocabulary matches the recognition result, based on the recognition result and each display vocabulary.
FIG. 5 is a diagram showing an example of the information stored in the determination information database 12e. As shown in FIG. 5, the determination information database 12e associates each priority with a determination rule as to whether a display vocabulary with that priority is determined to be, that is, selected as, a recognition vocabulary.
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies are input from the priority calculation unit 12d to the recognition vocabulary update unit 12f of FIG. 2. Based on the input priorities, the recognition vocabulary update unit 12f selects one or more of the display vocabularies as one or more recognition vocabularies in accordance with the determination rules in the determination information database 12e. The selected recognition vocabularies are, for example, displayed on a display device (not shown) or output as speech by a speech output device (not shown).
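Chaining the matching degrees, the priority database, and the determination rules, the selection performed by the recognition vocabulary update unit 12f might look like the sketch below. It is illustrative only: the concrete rule that high and medium priorities are selected while low is not is an assumption consistent with the first and second examples, not a table given in the patent.

```python
# Hypothetical end-to-end selection: matching degree -> priority -> rule.

PRIORITY_BY_DEGREE = {1: "high", 2: "medium", 3: "low"}     # analogue of FIG. 4
SELECT_RULE = {"high": True, "medium": True, "low": False}  # assumed analogue of FIG. 5

def select(display_vocabs, degrees):
    """Keep the display vocabularies whose priority passes the rule."""
    selected = []
    for vocab, degree in zip(display_vocabs, degrees):
        priority = PRIORITY_BY_DEGREE[degree]
        if SELECT_RULE[priority]:
            selected.append(vocab)
    return selected

vocabs = ["BP", "BP (fuel station)", "BP (diesel)"]

# First example ("BP"): degrees 1, 3, 3 -> only "BP" is selected
assert select(vocabs, [1, 3, 3]) == ["BP"]
# Second example ("BP station"): degrees 1, 2, 3 -> "BP" and "BP (fuel station)"
assert select(vocabs, [1, 2, 3]) == ["BP", "BP (fuel station)"]
```

The two assertions mirror the outcomes shown later in FIG. 7 and FIG. 8.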
In addition, when one or more recognition vocabularies have been selected, the recognition vocabulary update unit 12f can exclude the display vocabularies other than those recognition vocabularies from some selection from the next one onward. As one example, when the recognition vocabulary update unit 12f according to Embodiment 2 selects one or more recognition vocabularies, it keeps those recognition vocabularies stored in the display vocabulary database 12a and deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. In this case, the recognition vocabulary update unit 12f can exclude the display vocabularies other than the selected recognition vocabularies from the next selection.
However, the recognition vocabulary update unit 12f is not limited to this. For example, the recognition vocabulary update unit 12f need not immediately delete from the display vocabulary database 12a a display vocabulary that failed to be selected as a recognition vocabulary only once. Instead, the recognition vocabulary update unit 12f may delete from the display vocabulary database 12a a display vocabulary that has not been selected as a recognition vocabulary a predetermined number of consecutive times or more. In this case, the recognition vocabulary update unit 12f can exclude the display vocabularies other than the selected recognition vocabularies from a selection later than the next one.
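The deferred-deletion variant can be sketched as follows. This is illustrative only: the per-vocabulary miss counter and the threshold of three consecutive misses are assumptions; the patent only states that the count is predetermined.

```python
# Hypothetical deferred pruning of the display vocabulary database:
# a display vocabulary is deleted only after failing to be selected a
# predetermined number of consecutive times (assumed here to be 3).

MISS_THRESHOLD = 3  # assumed value of the predetermined count

def update_database(database, selected):
    """database maps each display vocabulary to its consecutive-miss count.
    Reset the count of selected vocabularies, increment the others, and
    delete any vocabulary whose count reaches the threshold."""
    for vocab in list(database):  # copy keys: we may delete during iteration
        if vocab in selected:
            database[vocab] = 0
        else:
            database[vocab] += 1
            if database[vocab] >= MISS_THRESHOLD:
                del database[vocab]
    return database

db = {"BP": 0, "BP (fuel station)": 0, "BP (diesel)": 0}
for _ in range(3):  # "BP (diesel)" misses three selections in a row
    update_database(db, {"BP", "BP (fuel station)"})
assert sorted(db) == ["BP", "BP (fuel station)"]
```

A single missed selection leaves the vocabulary in place; only a run of misses removes it, matching the later-than-next exclusion described above.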
<Operation>
FIG. 6 is a flowchart showing the operation of the speech recognition apparatus 1 according to Embodiment 2.
First, in step S1, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S2, the result comparison unit 12b, referring to the display vocabulary database 12a, acquires the plurality of display vocabularies and the matching degree of each display vocabulary based on the recognition result from the speech recognition unit 11. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the display vocabularies to the priority calculation unit 12d.
In step S3, the priority calculation unit 12d, referring to the priority database 12c, acquires the priority of each display vocabulary based on the matching degree of each display vocabulary received from the result comparison unit 12b. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
In step S4, the recognition vocabulary update unit 12f, referring to the determination information database 12e, selects one or more recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d, and outputs the selected recognition vocabularies to a display device (not shown) or the like. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. The operation of FIG. 6 then ends.
FIGS. 7 and 8 are diagrams showing the operation results of the first and second examples described above.
As shown in FIG. 7, in the first example, in which the recognition result of the speech recognition unit 11 is the main vocabulary "BP" itself, "BP" among the display vocabularies described above is selected as a recognition vocabulary, while "BP (fuel station)" and "BP (diesel)" are not. Therefore, "BP" remains stored in the display vocabulary database 12a, whereas "BP (fuel station)" and "BP (diesel)" are deleted from the display vocabulary database 12a.
On the other hand, as shown in FIG. 8, in the second example, in which the recognition result of the speech recognition unit 11 is "BP station", "BP" and "BP (fuel station)" among the display vocabularies described above are selected as recognition vocabularies, while "BP (diesel)" is not. Therefore, "BP" and "BP (fuel station)" remain stored in the display vocabulary database 12a, whereas "BP (diesel)" is deleted from the display vocabulary database 12a.
<Summary of Embodiment 2>
According to the speech recognition apparatus 1 of Embodiment 2 described above, as in Embodiment 1, one or more display vocabularies are selected from the plurality of display vocabularies as one or more recognition vocabularies based on the priorities. Therefore, as in Embodiment 1, the recognition accuracy of the speech recognition apparatus 1 can be improved and user confusion can be suppressed.
In Embodiment 2, the plurality of display vocabularies include the main vocabulary itself and vocabularies in which the main vocabulary is combined with postfix information that qualifies the main vocabulary in more detail. With this configuration, as shown in FIGS. 7 and 8, the main vocabulary "BP" is selected in both the first and second examples. Thus, as long as the recognition result of the speech recognition unit 11 includes the main vocabulary, the main vocabulary can be selected regardless of the content of the recognition result.
The speech recognition apparatus 1 according to Embodiment 2 can also, once one or more recognition vocabularies have been selected, exclude the display vocabularies other than those recognition vocabularies from some selection from the next one onward. With this configuration, the processing of selecting recognition vocabularies from the plurality of display vocabularies in subsequent selections can be reduced. Therefore, the processing load of the speech recognition apparatus 1 can be reduced.
The speech recognition apparatus 1 according to Embodiment 2 also acquires the matching degree of each display vocabulary as the priority of that display vocabulary. With this configuration, the display vocabularies can be narrowed down to those corresponding to the vocabulary the user intended by the utterance. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved and user confusion can be suppressed.
In Embodiment 2 described above, the matching degree and the priority were each divided into three levels. However, this is not restrictive; the matching degree and the priority may be divided into two levels, or into four or more levels.
<Embodiment 3>
FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 3 of the present invention. Among the components described in Embodiment 3, those that are the same as or similar to those of Embodiment 2 are given the same reference numerals, and the description below focuses on the differing components.
The speech recognition apparatus 1 of FIG. 9 according to Embodiment 3 is used in a vehicle. In addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to Embodiment 2, the recognition vocabulary selection unit 12 of FIG. 9 includes a vehicle information database 12g and a display vocabulary update unit 12h. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on vehicle information, which is information about the vehicle, and on the priorities of the plurality of display vocabularies. This is described in detail below.
FIG. 10 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 10, in the display vocabulary database 12a according to Embodiment 3, the information of FIG. 3 described in Embodiment 2 is associated with domains. Here, a domain is a kind of vehicle information; for example, information about the vehicle's specifications is used as the domain.
When the recognition result of the speech recognition unit 11 includes a main vocabulary, the result comparison unit 12b of FIG. 9 acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with that main vocabulary and the domain of each of those display vocabularies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in Embodiment 2.
FIG. 11 is a diagram showing an example of the information stored in the vehicle information database 12g. As shown in FIG. 11, the vehicle information database 12g associates each domain with either "valid" or "invalid" with respect to the display vocabularies. The information shown in FIG. 11 may be set in advance by the user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the travel history of the vehicle. For example, when the travel history records that the vehicle has stopped at diesel filling stations more often than at gasoline filling stations, the speech recognition apparatus 1 may change "valid" for "fuel station" in FIG. 11 to "invalid" and change "invalid" for "diesel" in FIG. 11 to "valid".
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the display vocabularies, and the domains of the display vocabularies are input from the result comparison unit 12b to the display vocabulary update unit 12h of FIG. 9. The display vocabulary update unit 12h updates the display vocabularies to be output to the priority calculation unit 12d based on the input domains and the information in the vehicle information database 12g.
For example, assume that the display vocabularies and domains of FIG. 10 are input from the result comparison unit 12b to the display vocabulary update unit 12h, and that the information of FIG. 11 is stored in the vehicle information database 12g. In this case, the display vocabulary update unit 12h outputs to the priority calculation unit 12d the display vocabularies "BP" and "BP (fuel station)", whose domain "fuel station" is associated with "valid" in the information of FIG. 11, together with their matching degrees and the recognition result of the speech recognition unit 11. On the other hand, the display vocabulary update unit 12h does not output to the priority calculation unit 12d the display vocabulary "BP (diesel)", whose domain "diesel" is associated with "invalid" in the information of FIG. 11, or its matching degree.
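The domain-based filtering performed by the display vocabulary update unit 12h can be sketched as below. This is an illustrative Python sketch; the dictionary stand-ins for the display vocabulary database of FIG. 10 and the vehicle information database of FIG. 11 are assumptions made for illustration.

```python
# Hypothetical domain filter: pass on only display vocabularies whose
# domain is marked "valid" in the vehicle information database.

DOMAIN_BY_VOCAB = {                 # analogue of the domains in FIG. 10
    "BP": "fuel station",
    "BP (fuel station)": "fuel station",
    "BP (diesel)": "diesel",
}
DOMAIN_VALIDITY = {"fuel station": True, "diesel": False}  # analogue of FIG. 11

def filter_by_domain(display_vocabs):
    """Keep only the display vocabularies whose domain is currently valid."""
    return [v for v in display_vocabs if DOMAIN_VALIDITY[DOMAIN_BY_VOCAB[v]]]

assert filter_by_domain(["BP", "BP (fuel station)", "BP (diesel)"]) == \
    ["BP", "BP (fuel station)"]
```

Flipping the two validity flags, as in the travel-history example above, would invert which vocabularies pass the filter.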
The configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognition vocabulary update unit 12f are the same as in Embodiment 2.
<Operation>
FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to Embodiment 3. In step S11, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S12, the result comparison unit 12b, referring to the display vocabulary database 12a, acquires the plurality of display vocabularies, the matching degree of each display vocabulary, and the domain of each display vocabulary based on the recognition result from the speech recognition unit 11. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degree and domain of each display vocabulary to the display vocabulary update unit 12h.
In step S13, the display vocabulary update unit 12h outputs the recognition result of the speech recognition unit 11 to the priority calculation unit 12d. Based on the domains received from the result comparison unit 12b, the display vocabulary update unit 12h also outputs to the priority calculation unit 12d the display vocabularies whose domains are associated with "valid" in the vehicle information database 12g, together with their matching degrees. Note that there may be one display vocabulary whose domain is associated with "valid", or there may be more than one.
 ステップS14にて図6のステップS3と同様に、優先度算出部12dは、優先度データベース12cを参照しつつ、表示語彙更新部12hからの各表示語彙の一致度に基づいて、各表示語彙の優先度を取得する。そして、優先度算出部12dは、音声認識部11の認識結果と、表示語彙と、表示語彙の優先度とを認識語彙更新部12fに出力する。 In step S14, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c, and based on the matching degree of each display vocabulary from the display vocabulary update unit 12h, Get the priority. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabulary, and the priority of the display vocabulary to the recognition vocabulary update unit 12f.
 ステップS15にて図6のステップS4と同様に、認識語彙更新部12fは、判定情報データベース12eを参照しつつ、優先度算出部12dからの優先度に基づいて、表示語彙から認識語彙を選択し、選択された認識語彙を図示しない表示装置などに出力する。また、認識語彙更新部12fは、選択された認識語彙以外の表示語彙を表示語彙データベース12aから削除する。その後、図12の動作が終了する。 In step S15, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f selects a recognized vocabulary from the display vocabulary based on the priority from the priority calculation unit 12d while referring to the determination information database 12e. The selected recognition vocabulary is output to a display device (not shown). The recognized vocabulary update unit 12f deletes display vocabulary other than the selected recognized vocabulary from the display vocabulary database 12a. Thereafter, the operation of FIG. 12 ends.
<Summary of Embodiment 3>
According to the speech recognition apparatus 1 of Embodiment 3 described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the vehicle information and on the priorities of the plurality of display vocabularies. With this configuration, the recognition accuracy of the speech recognition apparatus 1 can be further improved and user confusion can be further suppressed.
Note that in the third embodiment described above, the recognition vocabulary selection unit 12 does not change the priorities based on the vehicle information. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the vehicle information. For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose domain is "diesel" to "low" in step S13, and keep that priority unchanged in step S14. In this case as well, the same effects as described above can be obtained.
<Embodiment 4>
FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 4 of the present invention. Among the constituent elements described in the fourth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 13 includes a hierarchy information database 12i and a hierarchy reference update unit 12j in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hierarchy defined in advance for the display vocabularies and on the priorities of the display vocabularies. This is described in detail below.
FIG. 14 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 14, in the display vocabulary database 12a according to the fourth embodiment, the information of FIG. 3 described in the second embodiment is associated with the hierarchy level of each display vocabulary. In this example, the larger the number assigned to a hierarchy level, the lower the level, and a vocabulary that subsumes the concepts of the lower-level display vocabularies is used as the display vocabulary of the upper level.
When the recognition result of the speech recognition unit 11 includes a body vocabulary, the result comparison unit 12b of FIG. 13 acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with the body vocabulary and the hierarchy level of each of those display vocabularies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
FIG. 15 is a diagram showing an example of the information stored in the hierarchy information database 12i. As shown in FIG. 15, in the hierarchy information database 12i, each hierarchy level is associated with either "valid" or "invalid" for display vocabularies. The information shown in FIG. 15 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like.
The hierarchy reference update unit 12j of FIG. 13 receives, from the result comparison unit 12b, the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degree of each display vocabulary, and the hierarchy level of each display vocabulary. Based on the input hierarchy levels and the information in the hierarchy information database 12i, the hierarchy reference update unit 12j updates the display vocabularies to be output to the priority calculation unit 12d.
For example, assume that the display vocabularies and hierarchy levels of FIG. 14 are input from the result comparison unit 12b to the hierarchy reference update unit 12j, and that the information of FIG. 15 is stored in the hierarchy information database 12i. In this case, the hierarchy reference update unit 12j outputs, to the priority calculation unit 12d, the display vocabulary "BP", whose hierarchy level "1" is associated with "valid" in the information of FIG. 15, together with its matching degree and the recognition result of the speech recognition unit 11. On the other hand, the hierarchy reference update unit 12j does not output the display vocabularies "BP (fuel station)" and "BP (diesel)", whose hierarchy level "2" is associated with "invalid" in the information of FIG. 15, or their matching degrees to the priority calculation unit 12d.
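The filtering performed by the hierarchy reference update unit 12j in this example can be sketched as follows. The table contents mirror the description of FIG. 14 and FIG. 15 above; the function name and data layout are illustrative assumptions, not the patented implementation.

```python
# Sketch of the hierarchy reference update unit 12j: pass through only
# display vocabularies whose hierarchy level is marked "valid" in the
# hierarchy information database 12i. Data layout is illustrative.

hierarchy_info_db = {1: "valid", 2: "invalid"}  # cf. FIG. 15

display_vocab = [  # (vocabulary, hierarchy level, matching degree), cf. FIG. 14
    ("BP", 1, "high"),
    ("BP (fuel station)", 2, "high"),
    ("BP (diesel)", 2, "high"),
]

def filter_by_hierarchy(vocab, hierarchy_info):
    # Keep (vocabulary, matching degree) pairs whose level is "valid".
    return [(word, match) for word, level, match in vocab
            if hierarchy_info.get(level) == "valid"]

print(filter_by_hierarchy(display_vocab, hierarchy_info_db))  # [('BP', 'high')]
```

Only "BP" at level 1 is passed on to the priority calculation; the level-2 entries are withheld, as in the example above.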
The configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognition vocabulary update unit 12f are the same as those in the second embodiment.
<Operation>
FIG. 16 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment. In step S21, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S22, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires the plurality of display vocabularies, the matching degree of each display vocabulary, and the hierarchy level of each display vocabulary. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their matching degrees and hierarchy levels to the hierarchy reference update unit 12j.
In step S23, the hierarchy reference update unit 12j outputs the recognition result of the speech recognition unit 11 to the priority calculation unit 12d. Based on the hierarchy levels received from the result comparison unit 12b, the hierarchy reference update unit 12j also outputs, to the priority calculation unit 12d, each display vocabulary whose hierarchy level is associated with "valid" in the hierarchy information database 12i, together with its matching degree. Note that there may be one or more display vocabularies whose hierarchy level is associated with "valid".
In step S24, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and obtains the priority of each display vocabulary based on the matching degree of each display vocabulary received from the hierarchy reference update unit 12j. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
In step S25, as in step S4 of FIG. 6, the recognition vocabulary update unit 12f refers to the determination information database 12e and, based on the priorities received from the priority calculation unit 12d, selects one or more recognition vocabularies from the display vocabularies and outputs the selected recognition vocabularies to a display device (not shown) or the like. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. The operation of FIG. 16 then ends.
<Summary of Embodiment 4>
According to the speech recognition apparatus 1 of the fourth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hierarchy defined in advance for the display vocabularies and on the priorities of the display vocabularies. With this configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
Note that in the fourth embodiment described above, the recognition vocabulary selection unit 12 does not change the priorities based on the hierarchy. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the hierarchy. For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose hierarchy level is "2" to "low" in step S23, and keep that priority unchanged in step S24. In this case as well, the same effects as described above can be obtained.
<Embodiment 5>
FIG. 17 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 5 of the present invention. Among the constituent elements described in the fifth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 17 includes an SW (software) information database 12k and an SW restriction reference update unit 12m in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a software requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. This is described in detail below.
FIG. 18 is a diagram showing an example of the information stored in the SW information database 12k. As shown in FIG. 18, the SW information database 12k stores, as a software requirement of the system in which the speech recognition apparatus 1 is used, the number of recognition vocabularies that the system can display. The information shown in FIG. 18 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the software requirement.
The SW restriction reference update unit 12m of FIG. 17 receives the recognition vocabularies and their priorities from the recognition vocabulary update unit 12f. Here, the priority of a recognition vocabulary is the priority that was obtained for the display vocabulary that became the recognition vocabulary. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m outputs them as they are.
On the other hand, when the number of recognition vocabularies input from the recognition vocabulary update unit 12f exceeds the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m lowers the priority of each recognition vocabulary by one level. As a result, the SW restriction reference update unit 12m can set the priority of some recognition vocabularies to "low". After changing the priorities, the SW restriction reference update unit 12m performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By repeating such priority changes as appropriate, the SW restriction reference update unit 12m selects a number of recognition vocabularies equal to or less than the displayable number.
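The repeated priority demotion described above can be sketched as follows. The three priority levels and the rule of dropping vocabularies that fall to "low" are taken from the description; the function and variable names, the fallback when every vocabulary would be dropped, and the sample data are assumptions made for the example.

```python
# Sketch of the SW restriction reference update unit 12m: while the
# number of recognition vocabularies exceeds the displayable number,
# demote every vocabulary by one priority level and keep only those
# still above "low". Names and data are illustrative.

LOWER = {"high": "medium", "medium": "low", "low": "low"}

def limit_to_displayable(vocab_with_priority, displayable):
    # vocab_with_priority: list of (vocabulary, priority) pairs
    selected = list(vocab_with_priority)
    while len(selected) > displayable:
        demoted = [(w, LOWER[p]) for w, p in selected]
        kept = [(w, p) for w, p in demoted if p != "low"]
        if not kept:  # assumed fallback: avoid dropping everything
            return selected[:displayable]
        selected = kept
    return selected

vocab = [("BP", "high"), ("BP (fuel station)", "medium"), ("BP (diesel)", "medium")]
print(limit_to_displayable(vocab, 1))  # [('BP', 'medium')]
```

In the sample run, one demotion pushes the two "medium" entries to "low", leaving only the former "high" entry within the displayable number of one.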
<Operation>
FIG. 19 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. In steps S31 to S33, the same operations as steps S1 to S3 of FIG. 6 are performed.
In step S34, the recognition vocabulary update unit 12f refers to the determination information database 12e and selects recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d. The recognition vocabulary update unit 12f then outputs the selected recognition vocabularies and their priorities to the SW restriction reference update unit 12m. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
In step S35, the SW restriction reference update unit 12m refers to the SW information database 12k and, based on the recognition vocabularies and priorities received from the recognition vocabulary update unit 12f, selects a number of recognition vocabularies equal to or less than the displayable number and outputs the selected recognition vocabularies to a display device (not shown) or the like. At this time, the SW restriction reference update unit 12m may delete the display vocabularies that were not output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognition vocabulary update unit 12f. The operation of FIG. 19 then ends.
<Summary of Embodiment 5>
According to the speech recognition apparatus 1 of the fifth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on a software requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. With this configuration, a speech recognition apparatus 1 that can automatically satisfy the software requirement can be realized.
<Embodiment 6>
FIG. 20 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 6 of the present invention. Among the constituent elements described in the sixth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 20 includes an HW (hardware) information database 12n and an HW restriction reference update unit 12o in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hardware requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. This is described in detail below.
FIG. 21 is a diagram showing an example of the information stored in the HW information database 12n. As shown in FIG. 21, the HW information database 12n stores, as a hardware requirement of the system in which the speech recognition apparatus 1 is used, the number of display vocabularies that a memory (not shown) of the system can store in the future. The information shown in FIG. 21 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the hardware requirement.
The HW restriction reference update unit 12o of FIG. 20 receives the recognition vocabularies and their priorities from the recognition vocabulary update unit 12f. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o outputs them as they are.
On the other hand, when the number of recognition vocabularies input from the recognition vocabulary update unit 12f exceeds the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o lowers the priority of each recognition vocabulary by one level. As a result, the HW restriction reference update unit 12o can set the priority of some recognition vocabularies to "low". After changing the priorities, the HW restriction reference update unit 12o performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By repeating such priority changes as appropriate, the HW restriction reference update unit 12o selects a number of recognition vocabularies equal to or less than the storable number.
<Operation>
FIG. 22 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. In steps S41 to S43, the same operations as steps S1 to S3 of FIG. 6 are performed.
In step S44, the recognition vocabulary update unit 12f refers to the determination information database 12e and selects recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d. The recognition vocabulary update unit 12f then outputs the selected recognition vocabularies and their priorities to the HW restriction reference update unit 12o. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
In step S45, the HW restriction reference update unit 12o refers to the HW information database 12n and, based on the recognition vocabularies and priorities received from the recognition vocabulary update unit 12f, selects a number of recognition vocabularies equal to or less than the storable number and outputs the selected recognition vocabularies to a display device (not shown) or the like. At this time, the HW restriction reference update unit 12o may delete the display vocabularies that were not output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognition vocabulary update unit 12f. The operation of FIG. 22 then ends.
<Summary of Embodiment 6>
According to the speech recognition apparatus 1 of the sixth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on a hardware requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. With this configuration, a speech recognition apparatus 1 that can automatically satisfy the hardware requirement can be realized.
<Other variations>
The speech recognition unit 11 and the recognition vocabulary selection unit 12 of the speech recognition apparatus 1 described above are hereinafter referred to as the "speech recognition unit 11 and the like". The speech recognition unit 11 and the like are realized by the processing circuit 81 shown in FIG. 23. That is, the processing circuit 81 includes the speech recognition unit 11, which recognizes input speech, and the recognition vocabulary selection unit 12, which, when the recognition by the speech recognition unit 11 yields a recognition result containing a body vocabulary, i.e., a predetermined vocabulary, acquires a plurality of candidate vocabularies each of which contains the body vocabulary and is associated in advance with the body vocabulary, acquires a priority for each candidate vocabulary, and selects one or more of the candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in a memory may be applied. Examples of the processor include a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof. The functions of the respective units such as the speech recognition unit 11 may each be realized by distributed processing circuits, or the functions of the units may be collectively realized by a single processing circuit.
When the processing circuit 81 is a processor, the functions of the speech recognition unit 11 and the like are realized in combination with software or the like. The software or the like corresponds to, for example, software, firmware, or software and firmware. The software or the like is described as a program and stored in a memory. As shown in FIG. 24, the processor 82 applied to the processing circuit 81 reads out and executes the program stored in the memory 83, thereby realizing the function of each unit. That is, the speech recognition apparatus 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of a step of recognizing input speech and a step of, when the recognition yields a recognition result containing a body vocabulary, i.e., a predetermined vocabulary, acquiring a plurality of candidate vocabularies each of which contains the body vocabulary and is associated in advance with the body vocabulary, acquiring a priority for each candidate vocabulary, and selecting one or more of the candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. In other words, this program can be said to cause a computer to execute the procedures and methods of the speech recognition unit 11 and the like. Here, the memory 83 corresponds to any storage medium, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or a drive device therefor.
The configuration in which each function of the speech recognition unit 11 and the like is realized by either hardware or software has been described above. However, the present invention is not limited to this, and a part of the speech recognition unit 11 and the like may be realized by dedicated hardware while another part is realized by software or the like. For example, the function of the speech recognition unit 11 can be realized by a processing circuit as dedicated hardware, while the other functions can be realized by the processing circuit 81 as the processor 82 reading out and executing the program stored in the memory 83.
As described above, the processing circuit 81 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
The speech recognition apparatus described above can also be applied to a speech recognition system constructed by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including a mobile terminal such as a mobile phone, a smartphone, or a tablet, the functions of applications installed on these, and a server. In this case, each function or each constituent element of the speech recognition apparatus described above may be distributed among the devices constituting the system, or may be concentrated in one of the devices.
 FIG. 25 is a block diagram showing the configuration of a server 51 according to this modification. The server 51 of FIG. 25 includes a communication unit 51a, a speech recognition unit 51b, and a recognition vocabulary selection unit 51c, and can perform wireless communication with a navigation device 53 of a vehicle 52.
 The communication unit 51a performs wireless communication with the navigation device 53 and thereby receives the voice data acquired by the navigation device 53.
 The speech recognition unit 51b and the recognition vocabulary selection unit 51c have the same functions as the speech recognition unit 11 and the recognition vocabulary selection unit 12 of FIG. 1, realized by a processor (not shown) of the server 51 executing a program stored in a storage device (not shown) of the server 51. That is, the speech recognition unit 51b recognizes the voice data received by the communication unit 51a. Based on the recognition result of the speech recognition unit 51b, the recognition vocabulary selection unit 51c obtains a plurality of display vocabularies and their priorities, and selects a recognition vocabulary based on those priorities. The communication unit 51a then transmits the recognition vocabulary selected by the recognition vocabulary selection unit 51c to the navigation device 53.
 According to the server 51 configured in this way, the same effects as the speech recognition device 1 described in the first embodiment can be obtained even if, for example, the navigation device 53 has only a display function and a function for communicating with the server 51.
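 The server-side flow just described (receive voice data, recognize it, select a vocabulary by priority, return the result to the navigation device) can be sketched as follows. This is a minimal illustration only: the callables `recognize` and `get_display_vocabularies` and the list-of-(vocabulary, priority) data shape are assumptions for the sketch, not part of the published application.

```python
# Hypothetical sketch of the flow of FIG. 25; the recognizer and the
# vocabulary/priority lookup are injected as callables because the
# application does not specify their implementations.

def handle_voice_data(voice_data, recognize, get_display_vocabularies):
    """Mimics communication unit 51a -> speech recognition unit 51b ->
    recognition vocabulary selection unit 51c -> communication unit 51a."""
    # Speech recognition unit 51b: recognize the received voice data.
    recognition_result = recognize(voice_data)

    # Recognition vocabulary selection unit 51c: obtain display
    # vocabularies and their priorities for this recognition result.
    candidates = get_display_vocabularies(recognition_result)

    # Select the vocabulary with the highest priority.
    vocabulary, _priority = max(candidates, key=lambda c: c[1])

    # Communication unit 51a: this value would then be transmitted
    # back to the navigation device 53.
    return vocabulary
```

 In a real deployment the two callables would wrap the server's recognizer and its vocabulary store; here they can be stubbed for testing.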
 FIG. 26 is a block diagram showing the configuration of a communication terminal 56 according to this modification. The communication terminal 56 of FIG. 26 includes a communication unit 56a, a speech recognition unit 56b, and a recognition vocabulary selection unit 56c similar to the communication unit 51a, the speech recognition unit 51b, and the recognition vocabulary selection unit 51c, and can perform wireless communication with a navigation device 58 of a vehicle 57. A mobile terminal such as a mobile phone, smartphone, or tablet carried by, for example, the driver of the vehicle 57 can serve as the communication terminal 56. According to the communication terminal 56 configured in this way, the same effects as the speech recognition device 1 described in the first embodiment can be obtained even if, for example, the navigation device 58 has only a display function and a function for communicating with the communication terminal 56.
 Within the scope of the invention, the embodiments and modifications may be freely combined, and each embodiment and modification may be modified or omitted as appropriate.
 Although the present invention has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited thereto. It is understood that innumerable modifications not illustrated here can be envisaged without departing from the scope of the present invention.
 1 speech recognition device, 11 speech recognition unit, 12 recognition vocabulary selection unit.

Claims (9)

  1.  A speech recognition device comprising:
     a speech recognition unit that recognizes input speech; and
     a recognition vocabulary selection unit that, when the recognition by the speech recognition unit yields a recognition result containing a body vocabulary that is a predetermined vocabulary, obtains a plurality of candidate vocabularies each containing the body vocabulary and associated with the body vocabulary in advance, obtains a priority for each candidate vocabulary, and selects, based on the obtained priorities, one or more of the plurality of candidate vocabularies as one or more recognition vocabularies.
  2.  The speech recognition device according to claim 1, wherein
     the plurality of candidate vocabularies include the body vocabulary itself, and a vocabulary in which the body vocabulary is combined with an auxiliary vocabulary that, in combination with the body vocabulary, makes the body vocabulary more specific.
  3.  The speech recognition device according to claim 1, wherein
     when the one or more recognition vocabularies have been selected, the recognition vocabulary selection unit can exclude the candidate vocabularies other than the one or more recognition vocabularies in any selection from the next selection onward.
  4.  The speech recognition device according to claim 1, wherein
     based on the recognition result and each candidate vocabulary, the recognition vocabulary selection unit obtains, as the priority of each candidate vocabulary, a degree of match indicating the degree to which that candidate vocabulary matches the recognition result.
  5.  The speech recognition device according to claim 1, wherein
     the speech recognition device is used in a vehicle, and
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on information on the vehicle and the priorities of the plurality of candidate vocabularies.
  6.  The speech recognition device according to claim 1, wherein
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on a hierarchy defined in advance for the plurality of candidate vocabularies and the priorities of the plurality of candidate vocabularies.
  7.  The speech recognition device according to claim 1, wherein
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on software requirements of a system using the speech recognition device and the priorities of the plurality of candidate vocabularies.
  8.  The speech recognition device according to claim 1, wherein
     the one or more recognition vocabularies are selected based on hardware requirements of a system using the speech recognition device and the priorities of the plurality of candidate vocabularies.
  9.  A speech recognition method comprising:
     recognizing input speech; and
     when the recognition yields a recognition result containing a body vocabulary that is a predetermined vocabulary, obtaining a plurality of candidate vocabularies each containing the body vocabulary and associated with the body vocabulary in advance, obtaining a priority for each candidate vocabulary, and selecting, based on the obtained priorities, one or more of the plurality of candidate vocabularies as one or more recognition vocabularies.
PCT/JP2016/085689 2016-12-01 2016-12-01 Voice recognition device and voice recognition method WO2018100705A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/085689 WO2018100705A1 (en) 2016-12-01 2016-12-01 Voice recognition device and voice recognition method


Publications (1)

Publication Number Publication Date
WO2018100705A1

Family

ID=62242804

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/085689 WO2018100705A1 (en) 2016-12-01 2016-12-01 Voice recognition device and voice recognition method

Country Status (1)

Country Link
WO (1) WO2018100705A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006071791A (en) * 2004-08-31 2006-03-16 Fuji Heavy Ind Ltd Vehicle voice recognition device
JP2007101892A (en) * 2005-10-04 2007-04-19 Denso Corp Speech recognition device
JP2008134503A (en) * 2006-11-29 2008-06-12 Nissan Motor Co Ltd Speech recognition apparatus and speech recognition method
JP2008134502A (en) * 2006-11-29 2008-06-12 Nissan Motor Co Ltd Speech recognition apparatus and speech recognition method
JP2014142465A (en) * 2013-01-23 2014-08-07 Canon Inc Acoustic model generation device and method, and voice recognition device and method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONGGEE JANG ET AL.: "Speech interface on combination of search candidates from the common word parts", IEICE TECHNICAL REPORT, vol. 109, no. 355, 14 December 2009 (2009-12-14), pages 219 - 224 *

Similar Documents

Publication Publication Date Title
US8412455B2 (en) Voice-controlled navigation device and method
US9805722B2 (en) Interactive speech recognition system
US20120290303A1 (en) Speech recognition system and method based on word-level candidate generation
US10514268B2 (en) Search system
US20150120288A1 (en) System and method of performing automatic speech recognition using local private data
JP2012230670A (en) System, method, and computer program for correcting incorrect recognition by return
US20120239399A1 (en) Voice recognition device
CN106233246A (en) User interface system, user interface control device, user interface control method and user interface control program
JP5705312B2 (en) Information equipment
US20190115015A1 (en) Vehicular voice recognition system and method for controlling the same
JP6896335B2 (en) Speech recognition device and speech recognition method
CN103635961B (en) Pronunciation information generating device, vehicle-mounted information device, and word string information processing method
KR20120052591A (en) Apparatus and method for error correction in a continuous speech recognition system
KR102069700B1 (en) Automatic speech recognition system for replacing specific domain search network, mobile device and method thereof
WO2018100705A1 (en) Voice recognition device and voice recognition method
WO2018073907A1 (en) Speech recognition device and speech recognition method
JP5396426B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
US9704479B2 (en) Speech recognition device
JP2003162293A (en) Voice recognition device and method
JP4926689B2 (en) Facility search device
US11107474B2 (en) Character input device, character input method, and character input program
US10915565B2 (en) Retrieval result providing device and retrieval result providing method
EP3292376B1 (en) Automatic data switching approach in onboard voice destination entry (vde) navigation solution
JP7038919B2 (en) Multilingual speech recognition device and multilingual speech recognition method
WO2016136208A1 (en) Voice interaction device, voice interaction system, control method of voice interaction device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16922780

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16922780

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP