
WO2018100705A1 - Voice recognition device and voice recognition method - Google Patents

Voice recognition device and voice recognition method

Info

Publication number
WO2018100705A1
Authority
WO
WIPO (PCT)
Prior art keywords
vocabulary
recognition
display
vocabularies
unit
Prior art date
Application number
PCT/JP2016/085689
Other languages
French (fr)
Japanese (ja)
Inventor
Akio HORII (堀井 昭男)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2016/085689 priority Critical patent/WO2018100705A1/en
Publication of WO2018100705A1 publication Critical patent/WO2018100705A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Definitions

  • the present invention relates to a speech recognition apparatus and speech recognition method for recognizing speech.
  • a recognition vocabulary (or a feature amount of a recognition vocabulary) obtained from speech and a function are stored in association with each other, and when the user makes the same utterance as the recognition vocabulary corresponding to the stored speech, the function associated with that speech is executed.
  • when a device that uses the recognition result of the speech recognition device performs a search with that recognition result, the recognition result may be ambiguous.
  • in a facility search, for example, it can be ambiguous whether the facility name “BP” refers to “BP” in the facility category “fuel station” or to “BP” in the facility category “diesel”.
  • in a place name search, it is likewise ambiguous whether the city name “Munchen” refers to a city in the state “Bavaria” or to a city in the state “Hutthum”.
  • for case (i) above, the search results can be presented as “BP (fuel station)” and “BP (diesel)”, and for case (ii), for example, as “Munchen (Bavaria)” and “Munchen (Hutthum)”. That is, information that distinguishes the search results presented to the user can be given. For example, the technique of Patent Document 2 can use a recognition vocabulary that includes such distinguishing information.
  • Patent Document 1: JP 2003-323192 A. Patent Document 2: Japanese Patent No. 4554272.
  • the techniques of Patent Document 1 and Patent Document 2 both lean toward adding recognition vocabularies.
  • however, an increase in the number of recognition vocabularies lowers the recognition accuracy of the speech recognition apparatus.
  • the present invention has been made in view of the above-described problems, and an object thereof is to provide a technique capable of improving the recognition accuracy of a speech recognition apparatus.
  • the speech recognition apparatus includes a speech recognition unit that recognizes input speech, and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary, which is a predetermined vocabulary, is obtained by the recognition of the speech recognition unit, acquires a plurality of candidate vocabularies previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and, based on the acquired priorities, selects one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • when a recognition result including the main vocabulary is obtained by recognition, a plurality of candidate vocabularies are acquired, a priority is acquired for each candidate vocabulary, and one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities.
  • FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1.
  • FIG. 2 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 2.
  • FIG. 3 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 2.
  • FIG. 4 is a diagram showing an example of the information in the priority database according to Embodiment 2.
  • FIG. 5 is a diagram showing an example of the information in the determination information database according to Embodiment 2.
  • FIG. 6 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
  • FIGS. 7 and 8 are diagrams showing the operation results of the first example and the second example of the speech recognition apparatus according to Embodiment 2.
  • FIG. 9 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 3.
  • FIG. 10 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 3.
  • FIG. 11 is a diagram showing an example of the information in the vehicle information database according to Embodiment 3.
  • FIG. 12 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 3.
  • FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 4.
  • FIG. 14 is a diagram showing an example of the information in the display vocabulary database according to Embodiment 4.
  • FIG. 15 is a diagram showing an example of the information in the hierarchical information database according to Embodiment 4.
  • FIG. 16 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 4.
  • Further figures show: a block diagram of the configuration of the recognition vocabulary selection unit according to Embodiment 5; an example of the information in the SW information database according to Embodiment 5; a flowchart of the operation of the speech recognition apparatus according to Embodiment 5; a block diagram of the configuration of the recognition vocabulary selection unit according to Embodiment 6; an example of the information in the HW information database according to Embodiment 6; a flowchart of the operation of the speech recognition apparatus according to Embodiment 6; block diagrams of the hardware configurations of navigation apparatuses according to other modifications; a block diagram of the configuration of a server according to another modification; and a block diagram of the configuration of a communication terminal according to another modification.
  • FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus 1 according to Embodiment 1 of the present invention.
  • the speech recognition apparatus 1 in FIG. 1 includes a speech recognition unit 11 and a recognition vocabulary selection unit 12.
  • the voice recognition unit 11 recognizes the input voice. For example, the speech recognition unit 11 converts the input speech into an analog speech signal and then into a digital speech signal, and obtains, based on the digital speech signal, the character string or phrase corresponding to it as the recognition result. Note that, using the technique described in Japanese Patent Laid-Open No. 9-50291, the speech recognition unit 11 may select as the recognition result the vocabulary most likely, acoustically and linguistically, to have been uttered by the user. The voice recognition unit 11 may appropriately use dictionary data stored in the recognition dictionary database 11a when performing this recognition. Dictionary data is data including the character strings acquired as recognition results.
  • when the recognition vocabulary selection unit 12 obtains a recognition result including the main vocabulary, which is a predetermined vocabulary, by the recognition of the speech recognition unit 11, it acquires a plurality of candidate vocabularies previously associated with the main vocabulary, and acquires a priority for each candidate vocabulary. Each of the plurality of candidate vocabularies is a vocabulary that includes the associated main vocabulary.
  • the recognition vocabulary selection unit 12 selects one or more candidate vocabulary from a plurality of candidate vocabulary as one or more recognition vocabulary based on the acquired priority.
  • <Summary of Embodiment 1> When a recognition result including the main vocabulary is obtained, a plurality of candidate vocabularies are acquired, and a priority is acquired for each candidate vocabulary. Based on the acquired priorities, one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies. With such a configuration, the plurality of candidate vocabularies can be narrowed down, based on the priorities, to the vocabulary intended by the user. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved, and the user confusion that arises when many vocabularies are presented to the user can be suppressed.
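The selection flow summarized above can be sketched as follows. The data structures and the function name are illustrative assumptions made for this sketch, not definitions from the description; a real implementation would read the candidate vocabularies and priorities from its databases.

```python
# Candidate vocabularies assumed to be associated in advance with each
# main vocabulary (hypothetical data for illustration).
CANDIDATES = {
    "BP": ["BP", "BP (fuel station)", "BP (diesel)"],
}

# A priority assumed to be obtainable for each candidate vocabulary.
PRIORITIES = {
    "BP": "high",
    "BP (fuel station)": "medium",
    "BP (diesel)": "low",
}

def select_recognition_vocabularies(recognition_result, accepted=("high", "medium")):
    """If the recognition result contains a main vocabulary, narrow its
    candidate vocabularies down by priority; otherwise keep the result."""
    for main_vocab, candidates in CANDIDATES.items():
        if main_vocab in recognition_result:
            return [c for c in candidates if PRIORITIES[c] in accepted]
    return [recognition_result]

print(select_recognition_vocabularies("BP"))   # ['BP', 'BP (fuel station)']
```

The low-priority candidate "BP (diesel)" is dropped, so the user is presented only with the vocabularies most likely to be intended.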
  • FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 2 of the present invention.
  • the same or similar constituent elements as those in the first embodiment are denoted by the same reference numerals, and different constituent elements will be mainly described.
  • the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 of FIG. 2 includes a display vocabulary database 12a, a result comparison unit 12b, a priority database 12c, a priority calculation unit 12d, a determination information database 12e, and a recognized vocabulary update unit 12f.
  • FIG. 3 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • the display vocabulary database 12a stores information in which a main vocabulary such as “BP” is associated with a plurality of display vocabularies such as “BP”, “BP (fuel station)”, and “BP (diesel)”.
  • examples of the main vocabulary include the same place name given to a plurality of different places, the same name given to a plurality of facilities, the same abbreviation shared by a plurality of different formal names, and names similar to these.
  • the display vocabulary corresponds to the candidate vocabulary described in the first embodiment.
  • the plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with an attached vocabulary that details the main vocabulary.
  • in the following, parentheses are added as appropriate, and postfix information, an attached vocabulary that follows the main vocabulary, is used.
  • the recognition result is input from the speech recognition unit 11 to the result comparison unit 12b in FIG.
  • the result comparison unit 12b acquires a plurality of display vocabulary associated with the main vocabulary from the display vocabulary database 12a.
  • in the first example, the recognition result is the main vocabulary “BP” itself.
  • the result comparison unit 12b acquires display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP”.
  • in the second example, the result comparison unit 12b acquires the display vocabularies “BP”, “BP (fuel station)”, and “BP (diesel)” associated with the main vocabulary “BP” included in the recognition result “BP station”.
  • up to this point, the first example and the second example are the same.
  • the result comparison unit 12b acquires a degree of coincidence that is the degree to which each display vocabulary matches the recognition result based on the recognition result of the speech recognition unit 11 and a plurality of display vocabularies.
  • the degree of coincidence is divided into three stages of the first degree, the second degree, and the third degree.
  • the first degree means that the displayed vocabulary completely matches the recognition result.
  • the second degree means that, in a display vocabulary combining the main vocabulary and postfix information, the main vocabulary matches part of the recognition result and part of the postfix information matches the rest of the recognition result.
  • the third degree means that, in a display vocabulary combining the main vocabulary and postfix information, the main vocabulary matches part of the recognition result but the postfix information does not match the rest of the recognition result.
  • in the first example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP”, and the third degree for the display vocabularies “BP (fuel station)” and “BP (diesel)”.
  • in the second example, the result comparison unit 12b acquires the first degree for the display vocabulary “BP”, the second degree for the display vocabulary “BP (fuel station)”, and the third degree for the display vocabulary “BP (diesel)”.
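The three-stage degree of coincidence can be sketched as below. The split on `" ("` and the containment test are assumptions about how the main vocabulary and postfix information are compared; the description only fixes the three outcomes, which this sketch reproduces for both examples.

```python
def matching_degree(display_vocab, recognition_result):
    """Return 1 (first degree), 2 (second degree), or 3 (third degree),
    or None when the main vocabulary does not appear in the result."""
    # Split "BP (fuel station)" into main vocabulary and postfix information.
    if " (" in display_vocab:
        main, postfix = display_vocab.split(" (", 1)
        postfix = postfix.rstrip(")")
    else:
        main, postfix = display_vocab, ""
    if not recognition_result.startswith(main):
        return None          # display vocabulary unrelated to the result
    rest = recognition_result[len(main):].strip()
    if not postfix:
        return 1             # first degree: the display vocabulary matches fully
    if rest and rest in postfix:
        return 2             # second degree: postfix matches the rest of the result
    return 3                 # third degree: postfix does not match the rest
```

With the first example (“BP”) this yields degrees 1, 3, 3 for the three display vocabularies, and with the second example (“BP station”) it yields 1, 2, 3, matching the text above.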
  • FIG. 4 is a diagram illustrating an example of information stored in the priority database 12c.
  • the priority database 12c associates the degree of matching with the priority. Specifically, high priority, medium priority, and low priority are associated with the first degree, the second degree, and the third degree, respectively.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies are input from the result comparison unit 12b to the priority calculation unit 12d in FIG. 2.
  • the priority calculating unit 12d acquires the priority of each display vocabulary from the priority database 12c based on the input matching degree of each display vocabulary.
  • in other words, the recognition vocabulary selection unit 12 uses the recognition result and each display vocabulary to acquire, as the priority of each display vocabulary, the degree of coincidence, which is the degree to which that display vocabulary matches the recognition result.
  • FIG. 5 is a diagram illustrating an example of information stored in the determination information database 12e. As shown in FIG. 5, in the determination information database 12e, a priority is associated with a determination rule as to whether to determine as a recognized vocabulary, that is, whether to select as a recognized vocabulary.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies are input from the priority calculation unit 12d to the recognized vocabulary update unit 12f.
  • the recognized vocabulary update unit 12f selects one or more display vocabularies from the plurality of display vocabularies as one or more recognized vocabularies according to the determination rules of the determination information database 12e.
  • the selected recognition vocabulary is displayed on, for example, a display device (not shown), or is output as voice by a voice output device (not shown).
  • the recognized vocabulary update unit 12f is configured so that the display vocabularies other than the selected one or more recognized vocabularies can be excluded in any subsequent selection.
  • when the recognition vocabulary update unit 12f according to the second embodiment selects one or more recognition vocabularies, it keeps those recognition vocabularies stored in the display vocabulary database 12a and deletes the other display vocabularies from the display vocabulary database 12a. In this case, the recognized vocabulary update unit 12f excludes the unselected display vocabularies from the next selection.
  • however, the recognized vocabulary update unit 12f is not limited to this.
  • for example, the recognized vocabulary update unit 12f need not immediately delete a display vocabulary that has gone unselected once. Instead, it may delete from the display vocabulary database 12a a display vocabulary that has gone unselected a predetermined number of consecutive times. In this case, the recognized vocabulary update unit 12f excludes the unselected display vocabularies in a selection after the next one.
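The deferred-deletion variant above can be sketched as follows. The class name, the `limit` parameter, and the miss-counter layout are assumptions for illustration; the description only requires that a display vocabulary survive until it has gone unselected a preset number of consecutive times.

```python
class RecognizedVocabularyUpdater:
    """Sketch of deferred deletion: a display vocabulary is removed only
    after it has gone unselected `limit` consecutive times."""

    def __init__(self, display_vocabularies, limit=2):
        self.display_vocabularies = set(display_vocabularies)
        self.limit = limit                       # consecutive misses allowed
        self.misses = {v: 0 for v in display_vocabularies}

    def update(self, selected):
        """Record one selection round and prune stale display vocabularies."""
        for vocab in list(self.display_vocabularies):
            if vocab in selected:
                self.misses[vocab] = 0           # reset on selection
            else:
                self.misses[vocab] += 1
                if self.misses[vocab] >= self.limit:
                    self.display_vocabularies.discard(vocab)
```

With `limit=1` this degenerates to the immediate deletion of the base behavior; larger limits keep rarely used vocabularies available a little longer.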
  • FIG. 6 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment.
  • in step S1, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • in step S2, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies and their matching degrees. The result comparison unit 12b then outputs the recognition result of the voice recognition unit 11, the display vocabularies, and their matching degrees to the priority calculation unit 12d.
  • in step S3, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on its matching degree from the result comparison unit 12b. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
  • in step S4, the recognized vocabulary update unit 12f refers to the determination information database 12e and selects one or more recognized vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d. The selected recognized vocabularies are output to, for example, a display device (not shown). Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected one or more recognized vocabularies from the display vocabulary database 12a. The operation of FIG. 6 then ends.
  • FIGS. 7 and 8 are diagrams showing the operation results of the first example and the second example described above.
  • as described above, the plurality of display vocabularies include the main vocabulary itself and vocabularies obtained by combining the main vocabulary with postfix information that details it.
  • for this reason, the main vocabulary “BP” is selected in both the first example and the second example.
  • that is, the main vocabulary can be selected regardless of the content of the recognition result.
  • when one or more recognition vocabularies are selected, the speech recognition apparatus 1 according to the second embodiment can exclude the other display vocabularies in any selection after the next one. Such a configuration reduces the processing needed to select a recognized vocabulary from the display vocabularies in subsequent selections, and therefore reduces the processing load of the speech recognition apparatus 1.
  • the speech recognition apparatus 1 acquires the matching degree of each display vocabulary as its priority. With such a configuration, the display vocabularies can be narrowed down to those corresponding to the vocabulary the user intended by the utterance. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be increased, and user confusion can be suppressed.
  • the degree of coincidence and the priority are divided into three stages.
  • the present invention is not limited to this, and the degree of coincidence and priority may be divided into two stages or may be divided into four or more stages.
  • FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 3 of the present invention.
  • in the third embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description focuses on the differing elements.
  • the voice recognition device 1 of FIG. 9 according to the third embodiment is used in a vehicle. The recognition vocabulary selection unit 12 of FIG. 9 includes a vehicle information database 12g and a display vocabulary update unit 12h in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • the recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on the vehicle information, which is information about the vehicle, and the priorities of the plurality of display vocabularies. This is described in detail below.
  • FIG. 10 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • in the display vocabulary database 12a of FIG. 10, the information shown in FIG. 3 described in the second embodiment is associated with a domain.
  • the domain is a kind of vehicle information, and for the domain, for example, information related to vehicle specifications is used.
  • when the recognition result includes the main vocabulary, the result comparison unit 12b in FIG. 9 acquires from the display vocabulary database 12a a plurality of display vocabularies previously associated with the main vocabulary and their domains. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
  • FIG. 11 is a diagram showing an example of information stored in the vehicle information database 12g.
  • in the vehicle information database 12g, each domain is associated with either valid or invalid with respect to the display vocabulary.
  • the information shown in FIG. 11 may be set in advance by the user or the like, or may be changed automatically by the voice recognition device 1 or the like based on the travel history of the vehicle. For example, when the travel history records that the vehicle has stopped at gas oil (diesel) filling stations more often than at gasoline filling stations, the voice recognition device 1 may change the “valid” of “fuel station” in FIG. 11 to “invalid”, and the “invalid” of “diesel” in FIG. 11 to “valid”.
  • the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the domains of the plurality of display vocabularies are input from the result comparison unit 12b to the display vocabulary update unit 12h in FIG. 9.
  • the display vocabulary update unit 12h updates the display vocabulary to be output to the priority calculation unit 12d based on the input domain and the information in the vehicle information database 12g.
  • the display vocabulary update unit 12h outputs to the priority calculation unit 12d the display vocabularies “BP” and “BP (fuel station)”, whose domain “fuel station” is associated with “valid” in the information of FIG. 11, together with their matching degrees and the recognition result of the voice recognition unit 11.
  • on the other hand, the display vocabulary update unit 12h does not output to the priority calculation unit 12d the display vocabulary “BP (diesel)”, whose domain “diesel” is associated with “invalid” in the information of FIG. 11, or its matching degree.
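The domain-based filtering of the display vocabulary update unit 12h can be sketched as below. The dictionary and list layouts are hypothetical stand-ins for the contents of FIGS. 10 and 11, chosen so that the result matches the example in the text.

```python
# Hypothetical contents modelled on FIGS. 10 and 11 (assumed layout).
DISPLAY_VOCABULARY_DB = [
    {"vocab": "BP",                "domain": "fuel station"},
    {"vocab": "BP (fuel station)", "domain": "fuel station"},
    {"vocab": "BP (diesel)",       "domain": "diesel"},
]

VEHICLE_INFO_DB = {"fuel station": "valid", "diesel": "invalid"}

def filter_by_vehicle_info(entries, vehicle_info):
    """Pass on only display vocabularies whose domain is marked 'valid'."""
    return [e["vocab"] for e in entries
            if vehicle_info.get(e["domain"]) == "valid"]

print(filter_by_vehicle_info(DISPLAY_VOCABULARY_DB, VEHICLE_INFO_DB))
# ['BP', 'BP (fuel station)']
```

“BP (diesel)” is filtered out before priority calculation, exactly as in the example above.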
  • the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
  • FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment.
  • in step S11, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • in step S12, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires a plurality of display vocabularies, their matching degrees, and their domains. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their matching degrees and domains to the display vocabulary update unit 12h.
  • in step S13, the display vocabulary update unit 12h outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Based on the domains from the result comparison unit 12b, the display vocabulary update unit 12h also outputs to the priority calculation unit 12d the display vocabularies whose domains are associated with “valid” in the vehicle information database 12g, together with their matching degrees.
  • one or more display vocabularies may be associated with a “valid” domain.
  • in step S14, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on its matching degree from the display vocabulary update unit 12h. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
  • in step S15, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f refers to the determination information database 12e and selects a recognized vocabulary from the display vocabularies based on the priorities from the priority calculation unit 12d. The selected recognized vocabulary is output to a display device (not shown). Then, the recognized vocabulary update unit 12f deletes the display vocabularies other than the selected recognized vocabulary from the display vocabulary database 12a, and the operation of FIG. 12 ends.
  • one or more recognition vocabularies are selected from the plurality of display vocabularies based on the vehicle information and the priorities of the plurality of display vocabularies. According to such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
  • the recognized vocabulary selection unit 12 does not change the priority based on the vehicle information.
  • the present invention is not limited to this, and the recognized vocabulary selection unit 12 may change the priority based on the vehicle information.
  • for example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose domain is “diesel” to “low” in step S13, and use that priority as-is in step S14. In this case, the same effect as described above can be obtained.
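The priority-changing variant can be sketched as below: instead of excluding display vocabularies with an invalid domain, their priority is lowered and the later stages are left unchanged. The function name and argument shapes are illustrative assumptions.

```python
def demote_by_vehicle_info(priorities, domains, vehicle_info):
    """Lower the priority of display vocabularies whose domain is marked
    'invalid' in the vehicle information, instead of excluding them."""
    demoted = dict(priorities)            # leave the input untouched
    for vocab, domain in domains.items():
        if vehicle_info.get(domain) == "invalid":
            demoted[vocab] = "low"
    return demoted
```

Because the demoted vocabularies still reach the determination rules with a “low” priority, they are dropped there, giving the same overall effect as the exclusion approach.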
  • FIG. 13 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 4 of the present invention.
  • in the fourth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description focuses on the differing elements.
  • the recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hierarchy defined in advance for the plurality of display vocabularies and on the priorities of the plurality of display vocabularies. This is described in detail below.
  • FIG. 14 is a diagram showing an example of information stored in the display vocabulary database 12a.
  • in the display vocabulary database 12a of FIG. 14, the information of FIG. 3 described in the second embodiment is associated with the hierarchy of each display vocabulary.
  • the larger the number assigned to a hierarchy, the lower that hierarchy is, and a display vocabulary in a higher hierarchy is a vocabulary whose concept subsumes the display vocabularies below it.
  • when the recognition result includes the main vocabulary, the result comparison unit 12b in FIG. 13 acquires from the display vocabulary database 12a a plurality of display vocabularies previously associated with the main vocabulary and their hierarchies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
  • FIG. 15 is a diagram showing an example of information stored in the hierarchical information database 12i. As shown in FIG. 15, in the hierarchy information database 12i, the hierarchy and any one of valid and invalid regarding the display vocabulary are associated with each other. Note that the information shown in FIG. 15 may be set in advance by a user or the like, or may be automatically changed by the voice recognition device 1 or the like.
  • The hierarchy reference update unit 12j in FIG. 13 receives, from the result comparison unit 12b, the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies.
  • the hierarchy reference update unit 12j updates the display vocabulary to be output to the priority calculation unit 12d based on the input hierarchy and the information in the hierarchy information database 12i.
  • For example, since hierarchy "1" is associated with "valid" in the information of FIG. 15, the hierarchy reference update unit 12j outputs the display vocabulary "BP" of hierarchy "1" and its matching degree to the priority calculation unit 12d.
  • On the other hand, since hierarchy "2" is associated with "invalid" in the information of FIG. 15, the hierarchy reference update unit 12j does not output the display vocabularies "BP (fuel station)" and "BP (diesel)" of hierarchy "2" or their matching degrees to the priority calculation unit 12d.
  • the configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognized vocabulary update unit 12f are the same as those in the second embodiment.
  • FIG. 16 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment.
  • In step S21, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
  • In step S22, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires the plurality of display vocabularies, the matching degrees of the plurality of display vocabularies, and the hierarchies of the plurality of display vocabularies. Then, the result comparison unit 12b outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees and hierarchies of the plurality of display vocabularies to the hierarchy reference update unit 12j.
  • In step S23, the hierarchy reference update unit 12j outputs the recognition result of the voice recognition unit 11 to the priority calculation unit 12d. Further, based on the hierarchies from the result comparison unit 12b, the hierarchy reference update unit 12j outputs to the priority calculation unit 12d the display vocabularies whose hierarchies are associated with "valid" in the hierarchy information database 12i, together with their matching degrees. Note that there may be one or more display vocabularies whose hierarchy is associated with "valid".
  • In step S24, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and acquires the priority of each display vocabulary based on the matching degree of each display vocabulary from the hierarchy reference update unit 12j. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
  • In step S25, as in step S4 of FIG. 6, the recognition vocabulary update unit 12f selects one or more recognition vocabularies from the display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e.
  • The selected recognition vocabularies are output to a display device (not shown).
  • The recognition vocabulary update unit 12f deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. Thereafter, the operation of FIG. 16 ends.
  • <Summary of Embodiment 4> According to the speech recognition apparatus 1 of the fourth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hierarchy defined in advance for the plurality of display vocabularies and on the priorities of the plurality of display vocabularies. With such a configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
  • In the above description, the recognition vocabulary selection unit 12 does not change the priorities based on the hierarchy. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the hierarchy.
  • For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary of hierarchy "2" to "low" in step S23 and leave the priorities of the other display vocabularies as they are in step S24. In this case, the same effects as described above can be obtained.
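As a rough sketch of the hierarchy-based filtering described above (cf. FIG. 14 to FIG. 16), the following Python fragment models the databases as plain dictionaries. The vocabulary entries, matching degrees, and priority thresholds are hypothetical values chosen only for illustration; the publication does not specify them.

```python
# Hypothetical sketch of the hierarchy-based selection of Embodiment 4.
# The databases are modeled as plain dicts; the real contents are not given
# in the publication.

# hierarchy information database 12i (cf. FIG. 15): hierarchy -> valid/invalid
hierarchy_info = {1: "valid", 2: "invalid"}

# display vocabulary database 12a (cf. FIG. 14): vocabulary -> (matching degree, hierarchy)
display_vocab = {
    "BP": (0.9, 1),
    "BP (fuel station)": (0.8, 2),
    "BP (diesel)": (0.7, 2),
}

def priority_of(match_degree):
    # priority calculation (cf. priority database 12c); thresholds are assumptions
    if match_degree >= 0.85:
        return "high"
    if match_degree >= 0.6:
        return "medium"
    return "low"

def select_recognition_vocab(display_vocab, hierarchy_info):
    # step S23: keep only display vocabularies whose hierarchy is marked "valid"
    valid = {w: m for w, (m, h) in display_vocab.items()
             if hierarchy_info.get(h) == "valid"}
    # steps S24-S25: compute priorities and keep vocabularies judged selectable
    return [w for w, m in valid.items() if priority_of(m) in ("high", "medium")]

print(select_recognition_vocab(display_vocab, hierarchy_info))  # -> ['BP']
```

With the assumed data, the two hierarchy-"2" entries are filtered out before the priority step, so only "BP" survives, mirroring the example built around FIG. 15.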
  • FIG. 17 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 5 of the present invention.
  • In the fifth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description below focuses on the differing constituent elements.
  • The recognition vocabulary selection unit 12 in FIG. 17 includes a SW (software) information database 12k and a SW restriction reference update unit 12m in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on software requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. This will be described in detail below.
  • FIG. 18 is a diagram showing an example of information stored in the SW information database 12k.
  • As shown in FIG. 18, the SW information database 12k stores, as a software requirement of the system using the speech recognition apparatus 1, the number of recognition vocabularies that the system can display. Note that the information shown in FIG. 18 may be set in advance by a user or the like, or may be changed automatically by the voice recognition device 1 or the like based on the software requirements.
  • The recognition vocabularies selected by the recognition vocabulary update unit 12f and their priorities are input to the SW restriction reference update unit 12m; the priority of a recognition vocabulary is the priority obtained for the display vocabulary that became that recognition vocabulary. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m outputs them as they are.
  • Otherwise, the SW restriction reference update unit 12m lowers the priority of each recognition vocabulary by one level. As a result, the priority of some recognition vocabularies can be set to "low". After changing the priorities, the SW restriction reference update unit 12m performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By changing the priorities appropriately in this way, the SW restriction reference update unit 12m selects a number of recognition vocabularies equal to or less than the displayable number.
  • FIG. 19 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. In steps S31 to S33, operations similar to those of steps S1 to S3 in FIG. 6 are performed.
  • In step S34, the recognition vocabulary update unit 12f selects recognition vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e. Then, the recognition vocabulary update unit 12f outputs the selected recognition vocabularies and their priorities to the SW restriction reference update unit 12m. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
  • In step S35, the SW restriction reference update unit 12m, referring to the SW information database 12k, selects a number of recognition vocabularies equal to or less than the displayable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
  • At this time, by performing a deletion similar to that performed by the recognition vocabulary update unit 12f, the SW restriction reference update unit 12m may delete the display vocabularies that were not output from the display vocabulary database 12a. Thereafter, the operation of FIG. 19 ends.
  • <Summary of Embodiment 5> According to the speech recognition apparatus 1 of the fifth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the software requirements of the system using the speech recognition device 1 and on the priorities of the plurality of display vocabularies. With such a configuration, a speech recognition apparatus 1 that can automatically satisfy the software requirements can be realized.
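Assuming a three-level priority scale ("high"/"medium"/"low"), the priority-lowering loop performed by the SW restriction reference update unit 12m can be sketched roughly as follows in Python. The function names, the data layout, and the saturation behavior at "low" are illustrative assumptions, not details taken from the publication.

```python
# Hypothetical sketch of the SW-restriction selection of Embodiment 5.
LEVELS = ["low", "medium", "high"]  # assumed three-level priority scale

def lower(priority):
    # lower a priority by one level, saturating at "low"
    idx = LEVELS.index(priority)
    return LEVELS[max(idx - 1, 0)]

def restrict_to_limit(recognized, displayable_number):
    """recognized: dict mapping vocabulary -> priority.

    If more vocabularies were selected than the displayable number
    (cf. SW information database 12k), lower every priority by one level
    and keep only those not at "low", repeating until the limit is met.
    """
    current = dict(recognized)
    while len(current) > displayable_number:
        current = {w: lower(p) for w, p in current.items()}
        current = {w: p for w, p in current.items() if p != "low"}
        if not current:  # safety stop for this sketch
            break
    return list(current)

vocab = {"BP": "high", "BP (fuel station)": "medium", "BP (diesel)": "medium"}
print(restrict_to_limit(vocab, 1))  # -> ['BP']
```

In the example, one round of lowering turns the two "medium" entries into "low", leaving only "BP", which fits the assumed displayable number of 1. The same loop shape would apply to a storable-number limit such as the one in the sixth embodiment.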
  • FIG. 20 is a block diagram showing a configuration of speech recognition apparatus 1 according to Embodiment 6 of the present invention.
  • In the sixth embodiment, constituent elements that are the same as or similar to those in the second embodiment are denoted by the same reference numerals, and the description below focuses on the differing constituent elements.
  • the recognition vocabulary selection unit 12 of FIG. 20 includes an HW (hardware) information database 12n and an HW restriction reference update unit 12o in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment.
  • The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on hardware requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. This will be described in detail below.
  • FIG. 21 is a diagram showing an example of information stored in the HW information database 12n.
  • As shown in FIG. 21, the HW information database 12n stores, as a hardware requirement of the system using the speech recognition apparatus 1, the number of display vocabularies that a memory (not shown) of the system can store.
  • the information shown in FIG. 21 may be set in advance by a user or the like, or may be automatically changed by the voice recognition apparatus 1 or the like based on the hardware requirements.
  • The recognition vocabularies and their priorities are input to the HW restriction reference update unit 12o from the recognition vocabulary update unit 12f.
  • When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o outputs them as they are.
  • Otherwise, the HW restriction reference update unit 12o lowers the priority of each recognition vocabulary by one level. As a result, the priority of some recognition vocabularies can be set to "low". After changing the priorities, the HW restriction reference update unit 12o performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By changing the priorities appropriately in this way, the HW restriction reference update unit 12o selects a number of recognition vocabularies equal to or less than the storable number.
  • FIG. 22 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. In steps S41 to S43, operations similar to those of steps S1 to S3 in FIG. 6 are performed.
  • In step S44, the recognition vocabulary update unit 12f selects recognition vocabularies from the plurality of display vocabularies based on the priorities from the priority calculation unit 12d while referring to the determination information database 12e. Then, the recognition vocabulary update unit 12f outputs the selected recognition vocabularies and their priorities to the HW restriction reference update unit 12o. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
  • In step S45, the HW restriction reference update unit 12o, referring to the HW information database 12n, selects a number of recognition vocabularies equal to or less than the storable number based on the recognition vocabularies and priorities from the recognition vocabulary update unit 12f, and outputs the selected recognition vocabularies to a display device (not shown).
  • At this time, by performing a deletion similar to that performed by the recognition vocabulary update unit 12f, the HW restriction reference update unit 12o may delete the display vocabularies that were not output from the display vocabulary database 12a. Thereafter, the operation of FIG. 22 ends.
  • <Summary of Embodiment 6> According to the speech recognition apparatus 1 of the sixth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hardware requirements of the system using the speech recognition apparatus 1 and on the priorities of the plurality of display vocabularies. With such a configuration, a speech recognition apparatus 1 that can automatically satisfy the hardware requirements can be realized.
  • The speech recognition unit 11 and the recognition vocabulary selection unit 12 in the speech recognition apparatus 1 described above are hereinafter collectively referred to as the "speech recognition unit 11 and the like".
  • The voice recognition unit 11 and the like are realized by a processing circuit 81 shown in FIG. 23. That is, the processing circuit 81 includes the speech recognition unit 11 that recognizes input speech, and the recognition vocabulary selection unit 12 that, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition of the speech recognition unit 11, acquires a plurality of candidate vocabularies each including the main vocabulary and previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and selects, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in the memory may be applied.
  • The processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
  • Each function of the units such as the speech recognition unit 11 may be realized by distributed processing circuits, or the functions of the units may be realized collectively by a single processing circuit.
  • When the processing circuit 81 is a processor, the functions of the voice recognition unit 11 and the like are realized in combination with software or the like.
  • the software or the like corresponds to, for example, software, firmware, or software and firmware.
  • Software or the like is described as a program and stored in a memory.
  • The processor 82 applied to the processing circuit 81 reads out and executes the program stored in the memory 83, thereby realizing the functions of the respective units. That is, the speech recognition apparatus 1 includes a memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of a step of recognizing the input speech, and a step of, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition, acquiring a plurality of candidate vocabularies each including the main vocabulary and previously associated with the main vocabulary, acquiring a priority for each candidate vocabulary, and selecting, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
  • The memory 83 corresponds to, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory), or to an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disk) and its drive device, or any other storage medium.
  • In the above description, each function of the voice recognition unit 11 and the like is realized by either hardware or software or the like.
  • However, the present invention is not limited to this, and a configuration may be adopted in which a part of the voice recognition unit 11 and the like is realized by dedicated hardware and another part is realized by software or the like.
  • For example, the function of the speech recognition unit 11 can be realized by a processing circuit as dedicated hardware, while the other functions can be realized by the processing circuit 81 as the processor 82 reading out and executing the program stored in the memory 83.
  • As described above, the processing circuit 81 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
  • The voice recognition device described above can also be applied to a speech recognition system constructed by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including mobile terminals such as mobile phones, smartphones, and tablets, the functions of applications installed on these, and a server.
  • In this case, each function or each component of the speech recognition apparatus described above may be distributed among the devices constructing the system, or may be concentrated in any one of the devices.
  • FIG. 25 is a block diagram showing a configuration of the server 51 according to this modification.
  • the server 51 of FIG. 25 includes a communication unit 51a, a voice recognition unit 51b, and a recognition vocabulary selection unit 51c, and can perform wireless communication with the navigation device 53 of the vehicle 52.
  • the communication unit 51a receives the voice data acquired by the navigation device 53 by performing wireless communication with the navigation device 53.
  • The speech recognition unit 51b and the recognition vocabulary selection unit 51c are realized by a processor (not shown) of the server 51 executing a program stored in a storage device (not shown) of the server 51, and thereby have the same functions as the speech recognition unit 11 and the recognition vocabulary selection unit 12 of FIG. 1. That is, the voice recognition unit 51b recognizes the voice data received by the communication unit 51a.
  • The recognition vocabulary selection unit 51c acquires a plurality of display vocabularies and their priorities based on the recognition result of the speech recognition unit 51b, and selects the recognition vocabularies based on the priorities of the plurality of display vocabularies. Then, the communication unit 51a transmits the recognition vocabularies selected by the recognition vocabulary selection unit 51c to the navigation device 53.
  • According to the server 51 configured in this way, even if the navigation device 53 has, for example, only a display function and a function of communicating with the server 51, the same effects as those of the voice recognition device 1 described in the first embodiment can be obtained.
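The server-side flow of this modification (receive voice data, recognize it, select recognition vocabularies, send them back) can be sketched as below. `recognize` and `select_recognition_vocab` are placeholder stand-ins for the speech recognition unit 51b and the recognition vocabulary selection unit 51c, and every value in them is invented for illustration.

```python
# Hypothetical sketch of the server-side flow of FIG. 25.

def recognize(voice_data):
    # placeholder for the speech recognition unit 51b:
    # a real implementation would decode the audio and return a result string
    return "BP"

def select_recognition_vocab(recognition_result):
    # placeholder for the recognition vocabulary selection unit 51c:
    # look up display vocabularies for the result and choose by priority
    candidates = {"BP": "high", "BP (fuel station)": "medium"}
    return [w for w, p in candidates.items() if p == "high"]

def handle_request(voice_data):
    # communication unit 51a: receive voice data from the navigation device,
    # run recognition and selection, and return the vocabularies to transmit
    result = recognize(voice_data)
    return select_recognition_vocab(result)

print(handle_request(b"...audio bytes..."))  # -> ['BP']
```

The point of the division of labor is that only `handle_request`'s input and output cross the wireless link, so the navigation device 53 needs nothing beyond display and communication functions.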
  • FIG. 26 is a block diagram showing the configuration of the communication terminal 56 according to this modification.
  • The communication terminal 56 of FIG. 26 includes a communication unit 56a, a speech recognition unit 56b, and a recognition vocabulary selection unit 56c similar to the communication unit 51a, the speech recognition unit 51b, and the recognition vocabulary selection unit 51c, and can perform wireless communication with the navigation device 58 of the vehicle 57.
  • As the communication terminal 56, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet carried by the driver of the vehicle 57 is applied.
  • According to the communication terminal 56 configured in this way, even if the navigation device 58 has, for example, only a display function and a function of communicating with the communication terminal 56, the same effects as those of the voice recognition device 1 described in the first embodiment can be obtained.
  • In the present invention, the embodiments and the modifications can be freely combined, and each embodiment and each modification can be appropriately modified or omitted, within the scope of the invention.
  • 1 speech recognition device, 11 speech recognition unit, 12 recognition vocabulary selection unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Navigation (AREA)

Abstract

The purpose of the present invention is to provide technology capable of improving the recognition accuracy of a voice recognition device. This voice recognition device includes: a voice recognition unit that recognizes an input voice; and a recognized-word selection unit that, when a recognition result including a main word that is a predetermined word is acquired due to recognition by the voice recognition unit, acquires a plurality of candidate words, each including the main word and being pre-associated with the main word, acquires a priority for each of the candidate words, and selects, as one or more recognized words, one or more candidate words from among the plurality of candidate words on the basis of the acquired priorities.

Description

Speech recognition apparatus and speech recognition method
 The present invention relates to a speech recognition apparatus and a speech recognition method for recognizing speech.
 Various technologies have been proposed for voice recognition devices. For example, in the technique of Patent Document 1, a recognition vocabulary obtained from speech (or a feature amount of the recognition vocabulary) and a function are stored in association with each other, and when the same utterance as the recognition vocabulary corresponding to the stored speech is made, the function associated with that speech is executed.
 On the other hand, when a device that uses the recognition result of a speech recognition device performs a search using that recognition result, the recognition result may be ambiguous. For example, (i) in a facility search, it is ambiguous whether the facility name "BP" refers to "BP" in the facility category "fuel station" or "BP" in the facility category "diesel". Also, (ii) in a place name search, it is ambiguous whether the city name "Munchen" refers to a city in the state "Bavaria" or a city in the state "Hutthurm". As one countermeasure, for (i), the search results can be presented as "BP (fuel station)" and "BP (diesel)", and for (ii), as "Munchen (Bavaria)" and "Munchen (Hutthurm)". That is, distinguishing information is added to the search results presented to the user. For example, the technique of Patent Document 2 makes it possible to include this distinguishing information in the recognition vocabulary.
 Patent Document 1: JP 2003-323192 A
 Patent Document 2: Japanese Patent No. 4554272
 However, the techniques of Patent Document 1 and Patent Document 2 are biased toward adding recognition vocabularies. In general, an increase in the number of recognition vocabularies causes a reduction in the recognition accuracy of a speech recognition apparatus. For this reason, there is room for improving the recognition accuracy of the speech recognition apparatus in that the recognition result is not narrowed down to the result intended by the user.
 Therefore, the present invention has been made in view of the above problems, and an object thereof is to provide a technique capable of improving the recognition accuracy of a speech recognition apparatus.
 A speech recognition apparatus according to the present invention includes: a speech recognition unit that recognizes input speech; and a recognition vocabulary selection unit that, when a recognition result including a main vocabulary that is a predetermined vocabulary is obtained by the recognition of the speech recognition unit, acquires a plurality of candidate vocabularies, each including the main vocabulary and previously associated with the main vocabulary, acquires a priority for each candidate vocabulary, and selects, based on the acquired priorities, one or more candidate vocabularies from the plurality of candidate vocabularies as one or more recognition vocabularies.
 According to the present invention, when a recognition result including a main vocabulary is obtained by recognition, a plurality of candidate vocabularies are acquired, a priority is acquired for each candidate vocabulary, and one or more candidate vocabularies are selected from the plurality of candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. Thereby, the recognition accuracy of the speech recognition apparatus can be increased.
 The objects, features, aspects, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.
 FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1.
 FIG. 2 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 2.
 FIG. 3 is a diagram showing an example of information in the display vocabulary database according to Embodiment 2.
 FIG. 4 is a diagram showing an example of information in the priority database according to Embodiment 2.
 FIG. 5 is a diagram showing an example of information in the determination information database according to Embodiment 2.
 FIG. 6 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
 FIG. 7 is a diagram showing an operation result of a first example of the speech recognition apparatus according to Embodiment 2.
 FIG. 8 is a diagram showing an operation result of a second example of the speech recognition apparatus according to Embodiment 2.
 FIG. 9 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 3.
 FIG. 10 is a diagram showing an example of information in the display vocabulary database according to Embodiment 3.
 FIG. 11 is a diagram showing an example of information in the vehicle information database according to Embodiment 3.
 FIG. 12 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 3.
 FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 4.
 FIG. 14 is a diagram showing an example of information in the display vocabulary database according to Embodiment 4.
 FIG. 15 is a diagram showing an example of information in the hierarchy information database according to Embodiment 4.
 FIG. 16 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 4.
 FIG. 17 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 5.
 FIG. 18 is a diagram showing an example of information in the SW information database according to Embodiment 5.
 FIG. 19 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 5.
 FIG. 20 is a block diagram showing the configuration of the recognition vocabulary selection unit according to Embodiment 6.
 FIG. 21 is a diagram showing an example of information in the HW information database according to Embodiment 6.
 FIG. 22 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 6.
 FIG. 23 is a block diagram showing the hardware configuration of a navigation device according to another modification.
 FIG. 24 is a block diagram showing the hardware configuration of a navigation device according to another modification.
 FIG. 25 is a block diagram showing the configuration of a server according to another modification.
 FIG. 26 is a block diagram showing the configuration of a communication terminal according to another modification.
 <Embodiment 1>
 FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus 1 according to Embodiment 1 of the present invention. The speech recognition apparatus 1 in FIG. 1 includes a speech recognition unit 11 and a recognition vocabulary selection unit 12.
The speech recognition unit 11 recognizes input speech. For example, the speech recognition unit 11 converts the input speech into an analog speech signal and then into a digital speech signal, and, based on the digital speech signal, obtains as a recognition result a character string, phrase, or the like corresponding to that signal. Using the technique described in Japanese Patent Laid-Open No. 9-50291, the speech recognition unit 11 may instead select as the recognition result the vocabulary recognized by speech recognition, that is, the vocabulary that is acoustically and linguistically most likely to match the user's utterance. When performing this recognition, the speech recognition unit 11 may use dictionary data stored in a recognition dictionary database 11a as appropriate. The dictionary data includes the character strings and the like that can be obtained as recognition results.
When a recognition result including a main vocabulary, which is a vocabulary predetermined in the recognition vocabulary selection unit 12, is obtained by the recognition of the speech recognition unit 11, the recognition vocabulary selection unit 12 acquires a plurality of candidate vocabularies associated in advance with that main vocabulary, and acquires a priority for each candidate vocabulary. Each of the plurality of candidate vocabularies is a vocabulary including the associated main vocabulary.
Then, based on the acquired priorities, the recognition vocabulary selection unit 12 selects one or more of the candidate vocabularies as one or more recognition vocabularies.
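The candidate acquisition and priority-based selection described above can be sketched as follows. This is an illustrative Python sketch, not part of the patent: the dictionary mapping each main vocabulary to candidate/priority pairs and the rule that only high- and medium-priority candidates are kept are assumptions made for illustration.

```python
# Hypothetical sketch of the Embodiment 1 flow: given a recognition result
# containing a main vocabulary, fetch the candidate vocabularies associated
# in advance with it, then keep only candidates with accepted priorities.

# Assumed candidate store: main vocabulary -> [(candidate vocabulary, priority)]
CANDIDATES = {
    "BP": [("BP", "high"), ("BP (fuel station)", "medium"), ("BP (diesel)", "low")],
}

def select_recognition_vocabularies(recognition_result, accepted=("high", "medium")):
    """Return candidates whose priority is in `accepted` for any main
    vocabulary contained in the recognition result."""
    selected = []
    for main_vocab, pairs in CANDIDATES.items():
        if main_vocab in recognition_result:
            selected += [c for c, p in pairs if p in accepted]
    return selected

print(select_recognition_vocabularies("BP station"))
# → ['BP', 'BP (fuel station)']
```

The low-priority candidate is dropped, narrowing the candidates toward the vocabulary the user is likely to have intended.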
<Summary of Embodiment 1>
According to the speech recognition apparatus 1 of Embodiment 1 described above, when a recognition result including a main vocabulary is obtained, a plurality of candidate vocabularies are acquired together with a priority for each candidate vocabulary, and one or more of the candidate vocabularies are selected as one or more recognition vocabularies based on the acquired priorities. With this configuration, the plurality of candidate vocabularies can be narrowed down, based on the priorities, to the vocabulary the user intended. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved, and the user confusion that arose when many vocabularies were presented to the user can be suppressed.
<Embodiment 2>
FIG. 2 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 2 of the present invention. Among the components described in Embodiment 2, those that are the same as or similar to those of Embodiment 1 are given the same reference numerals, and the description below focuses on the differing components.
The recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 of FIG. 2 includes a display vocabulary database 12a, a result comparison unit 12b, a priority database 12c, a priority calculation unit 12d, a determination information database 12e, and a recognition vocabulary update unit 12f.
FIG. 3 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 3, the display vocabulary database 12a stores information in which a main vocabulary such as "BP" is associated with a plurality of display vocabularies such as "BP", "BP (fuel station)", and "BP (diesel)".
Examples of the main vocabulary include the same place name given to a plurality of different places, the same name given to a plurality of facilities, the same abbreviation shared by a plurality of different formal names, and names similar to these.
The display vocabularies correspond to the candidate vocabularies described in Embodiment 1. In Embodiment 2, the plurality of display vocabularies include the main vocabulary itself and vocabularies in which the main vocabulary is combined with an attached vocabulary that qualifies the main vocabulary in more detail. In the example of FIG. 3, postfix information, that is, an attached vocabulary following the main vocabulary, is used, enclosed in parentheses as appropriate.
The recognition result from the speech recognition unit 11 is input to the result comparison unit 12b of FIG. 2. When the recognition result of the speech recognition unit 11 includes a main vocabulary, the result comparison unit 12b acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with that main vocabulary.
Here, two examples will be described for the case where the information of FIG. 3 is stored in the display vocabulary database 12a.
As a first example, consider the case where the recognition result is the main vocabulary "BP" itself. In this case, the result comparison unit 12b acquires the display vocabularies "BP", "BP (fuel station)", and "BP (diesel)" associated with the main vocabulary "BP".
As a second example, consider the case where the recognition result is "BP station". In this case, the result comparison unit 12b acquires the display vocabularies "BP", "BP (fuel station)", and "BP (diesel)" associated with the main vocabulary "BP" included in "BP station". The result of the result comparison unit 12b is thus the same in the first and second examples.
The result comparison unit 12b according to Embodiment 2 also acquires, based on the recognition result of the speech recognition unit 11 and the plurality of display vocabularies, a matching degree indicating how closely each display vocabulary matches the recognition result. In the following description, the matching degree is divided into three levels: a first degree, a second degree, and a third degree. The first degree means that the display vocabulary completely matches the recognition result. The second degree means that, in a display vocabulary combining the main vocabulary with postfix information, the main vocabulary matches part of the recognition result and part of the postfix information matches the remainder of the recognition result. The third degree means that, in a display vocabulary combining the main vocabulary with postfix information, the main vocabulary matches part of the recognition result but the postfix information does not match the remainder of the recognition result even partially.
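One plausible reading of the three matching degrees can be sketched in Python as follows. This is illustrative only and not part of the patent: the word-level tokenization, and the rule that a postfix-free display vocabulary receives the first degree whenever its main vocabulary appears in the result, are assumptions chosen so that the sketch reproduces the first and second examples described in this embodiment.

```python
# Hypothetical three-level matching-degree classifier for a display
# vocabulary consisting of a main vocabulary and an optional postfix.

def match_degree(recognition_result, main_vocab, postfix=None):
    """Return 1, 2, or 3 (first/second/third degree), or None if the
    main vocabulary does not appear in the recognition result."""
    if main_vocab not in recognition_result:
        return None
    if postfix is None:
        return 1  # first degree: display vocabulary is the main vocabulary itself
    # words of the recognition result left over once the main vocabulary is removed
    remainder = set(recognition_result.replace(main_vocab, "").split())
    # second degree if any postfix word matches the remainder, else third degree
    return 2 if set(postfix.split()) & remainder else 3

# First example: recognition result is the main vocabulary "BP" itself
assert match_degree("BP", "BP") == 1
assert match_degree("BP", "BP", "fuel station") == 3
assert match_degree("BP", "BP", "diesel") == 3

# Second example: recognition result is "BP station"
assert match_degree("BP station", "BP") == 1
assert match_degree("BP station", "BP", "fuel station") == 2
assert match_degree("BP station", "BP", "diesel") == 3
```

With "BP station", the postfix word "station" overlaps the remainder of the result, so "BP (fuel station)" is promoted to the second degree while "BP (diesel)" stays at the third.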
Here, the first and second examples above will be described for the case where the information of FIG. 3 is stored in the display vocabulary database 12a.
In the first example, in which the recognition result is the main vocabulary "BP" itself, the result comparison unit 12b acquires the first degree for the display vocabulary "BP", and the third degree for the display vocabularies "BP (fuel station)" and "BP (diesel)".
In the second example, in which the recognition result is "BP station", the result comparison unit 12b acquires the first degree for the display vocabulary "BP", the second degree for the display vocabulary "BP (fuel station)", and the third degree for the display vocabulary "BP (diesel)".
FIG. 4 is a diagram showing an example of the information stored in the priority database 12c. As shown in FIG. 4, the priority database 12c associates matching degrees with priorities. Specifically, high, medium, and low priorities are associated with the first, second, and third degrees, respectively.
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the plurality of display vocabularies are input from the result comparison unit 12b to the priority calculation unit 12d of FIG. 2. The priority calculation unit 12d acquires the priority of each display vocabulary from the priority database 12c based on the input matching degree of each display vocabulary. In this way, the recognition vocabulary selection unit 12 according to Embodiment 2 acquires, as the priority of each display vocabulary, the matching degree indicating how closely that display vocabulary matches the recognition result, based on the recognition result and each display vocabulary.
FIG. 5 is a diagram showing an example of the information stored in the determination information database 12e. As shown in FIG. 5, the determination information database 12e associates each priority with a determination rule as to whether a display vocabulary with that priority is determined to be, that is, selected as, a recognition vocabulary.
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the plurality of display vocabularies are input from the priority calculation unit 12d to the recognition vocabulary update unit 12f of FIG. 2. Based on the input priorities, the recognition vocabulary update unit 12f selects one or more of the display vocabularies as one or more recognition vocabularies in accordance with the determination rules in the determination information database 12e. The selected recognition vocabularies are, for example, displayed on a display device (not shown) or output as speech by a speech output device (not shown).
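Chaining the matching degrees, the priority database, and the determination rules, the selection performed by the recognition vocabulary update unit 12f might look like the sketch below. It is illustrative only: the concrete rule that high and medium priorities are selected while low is not is an assumption consistent with the first and second examples, not a table given in the patent.

```python
# Hypothetical end-to-end selection: matching degree -> priority -> rule.

PRIORITY_BY_DEGREE = {1: "high", 2: "medium", 3: "low"}     # analogue of FIG. 4
SELECT_RULE = {"high": True, "medium": True, "low": False}  # assumed analogue of FIG. 5

def select(display_vocabs, degrees):
    """Keep the display vocabularies whose priority passes the rule."""
    selected = []
    for vocab, degree in zip(display_vocabs, degrees):
        priority = PRIORITY_BY_DEGREE[degree]
        if SELECT_RULE[priority]:
            selected.append(vocab)
    return selected

vocabs = ["BP", "BP (fuel station)", "BP (diesel)"]

# First example ("BP"): degrees 1, 3, 3 -> only "BP" is selected
assert select(vocabs, [1, 3, 3]) == ["BP"]
# Second example ("BP station"): degrees 1, 2, 3 -> "BP" and "BP (fuel station)"
assert select(vocabs, [1, 2, 3]) == ["BP", "BP (fuel station)"]
```

The two assertions mirror the outcomes shown later in FIG. 7 and FIG. 8.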
In addition, when one or more recognition vocabularies have been selected, the recognition vocabulary update unit 12f can exclude the display vocabularies other than those recognition vocabularies from some selection from the next one onward. As one example, when the recognition vocabulary update unit 12f according to Embodiment 2 selects one or more recognition vocabularies, it keeps those recognition vocabularies stored in the display vocabulary database 12a and deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. In this case, the recognition vocabulary update unit 12f can exclude the display vocabularies other than the selected recognition vocabularies from the next selection.
However, the recognition vocabulary update unit 12f is not limited to this. For example, the recognition vocabulary update unit 12f need not immediately delete from the display vocabulary database 12a a display vocabulary that failed to be selected as a recognition vocabulary only once. Instead, the recognition vocabulary update unit 12f may delete from the display vocabulary database 12a a display vocabulary that has not been selected as a recognition vocabulary a predetermined number of consecutive times or more. In this case, the recognition vocabulary update unit 12f can exclude the display vocabularies other than the selected recognition vocabularies from a selection later than the next one.
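The deferred-deletion variant can be sketched as follows. This is illustrative only: the per-vocabulary miss counter and the threshold of three consecutive misses are assumptions; the patent only states that the count is predetermined.

```python
# Hypothetical deferred pruning of the display vocabulary database:
# a display vocabulary is deleted only after failing to be selected a
# predetermined number of consecutive times (assumed here to be 3).

MISS_THRESHOLD = 3  # assumed value of the predetermined count

def update_database(database, selected):
    """database maps each display vocabulary to its consecutive-miss count.
    Reset the count of selected vocabularies, increment the others, and
    delete any vocabulary whose count reaches the threshold."""
    for vocab in list(database):  # copy keys: we may delete during iteration
        if vocab in selected:
            database[vocab] = 0
        else:
            database[vocab] += 1
            if database[vocab] >= MISS_THRESHOLD:
                del database[vocab]
    return database

db = {"BP": 0, "BP (fuel station)": 0, "BP (diesel)": 0}
for _ in range(3):  # "BP (diesel)" misses three selections in a row
    update_database(db, {"BP", "BP (fuel station)"})
assert sorted(db) == ["BP", "BP (fuel station)"]
```

A single missed selection leaves the vocabulary in place; only a run of misses removes it, matching the later-than-next exclusion described above.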
<Operation>
FIG. 6 is a flowchart showing the operation of the speech recognition apparatus 1 according to Embodiment 2.
First, in step S1, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S2, the result comparison unit 12b, referring to the display vocabulary database 12a, acquires the plurality of display vocabularies and the matching degree of each display vocabulary based on the recognition result from the speech recognition unit 11. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degrees of the display vocabularies to the priority calculation unit 12d.
In step S3, the priority calculation unit 12d, referring to the priority database 12c, acquires the priority of each display vocabulary based on the matching degree of each display vocabulary received from the result comparison unit 12b. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the priorities of the display vocabularies to the recognition vocabulary update unit 12f.
In step S4, the recognition vocabulary update unit 12f, referring to the determination information database 12e, selects one or more recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d, and outputs the selected recognition vocabularies to a display device (not shown) or the like. The recognition vocabulary update unit 12f also deletes the display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. The operation of FIG. 6 then ends.
FIGS. 7 and 8 are diagrams showing the operation results of the first and second examples described above.
As shown in FIG. 7, in the first example, in which the recognition result of the speech recognition unit 11 is the main vocabulary "BP" itself, "BP" among the display vocabularies described above is selected as a recognition vocabulary, while "BP (fuel station)" and "BP (diesel)" are not. Therefore, "BP" remains stored in the display vocabulary database 12a, whereas "BP (fuel station)" and "BP (diesel)" are deleted from the display vocabulary database 12a.
On the other hand, as shown in FIG. 8, in the second example, in which the recognition result of the speech recognition unit 11 is "BP station", "BP" and "BP (fuel station)" among the display vocabularies described above are selected as recognition vocabularies, while "BP (diesel)" is not. Therefore, "BP" and "BP (fuel station)" remain stored in the display vocabulary database 12a, whereas "BP (diesel)" is deleted from the display vocabulary database 12a.
<Summary of Embodiment 2>
According to the speech recognition apparatus 1 of Embodiment 2 described above, as in Embodiment 1, one or more display vocabularies are selected from the plurality of display vocabularies as one or more recognition vocabularies based on the priorities. Therefore, as in Embodiment 1, the recognition accuracy of the speech recognition apparatus 1 can be improved and user confusion can be suppressed.
In Embodiment 2, the plurality of display vocabularies include the main vocabulary itself and vocabularies in which the main vocabulary is combined with postfix information that qualifies the main vocabulary in more detail. With this configuration, as shown in FIGS. 7 and 8, the main vocabulary "BP" is selected in both the first and second examples. Thus, as long as the recognition result of the speech recognition unit 11 includes the main vocabulary, the main vocabulary can be selected regardless of the content of the recognition result.
The speech recognition apparatus 1 according to Embodiment 2 can also, once one or more recognition vocabularies have been selected, exclude the display vocabularies other than those recognition vocabularies from some selection from the next one onward. With this configuration, the processing of selecting recognition vocabularies from the plurality of display vocabularies in subsequent selections can be reduced. Therefore, the processing load of the speech recognition apparatus 1 can be reduced.
The speech recognition apparatus 1 according to Embodiment 2 also acquires the matching degree of each display vocabulary as the priority of that display vocabulary. With this configuration, the display vocabularies can be narrowed down to those corresponding to the vocabulary the user intended by the utterance. Therefore, the recognition accuracy of the speech recognition apparatus 1 can be improved and user confusion can be suppressed.
In Embodiment 2 described above, the matching degree and the priority were each divided into three levels. However, this is not restrictive; the matching degree and the priority may be divided into two levels, or into four or more levels.
<Embodiment 3>
FIG. 9 is a block diagram showing a configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 3 of the present invention. Among the components described in Embodiment 3, those that are the same as or similar to those of Embodiment 2 are given the same reference numerals, and the description below focuses on the differing components.
The speech recognition apparatus 1 of FIG. 9 according to Embodiment 3 is used in a vehicle. In addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to Embodiment 2, the recognition vocabulary selection unit 12 of FIG. 9 includes a vehicle information database 12g and a display vocabulary update unit 12h. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on vehicle information, which is information about the vehicle, and on the priorities of the plurality of display vocabularies. This is described in detail below.
FIG. 10 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 10, in the display vocabulary database 12a according to Embodiment 3, the information of FIG. 3 described in Embodiment 2 is associated with domains. Here, a domain is a kind of vehicle information; for example, information about the vehicle's specifications is used as the domain.
When the recognition result of the speech recognition unit 11 includes a main vocabulary, the result comparison unit 12b of FIG. 9 acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with that main vocabulary and the domain of each of those display vocabularies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in Embodiment 2.
FIG. 11 is a diagram showing an example of the information stored in the vehicle information database 12g. As shown in FIG. 11, the vehicle information database 12g associates each domain with either "valid" or "invalid" with respect to the display vocabularies. The information shown in FIG. 11 may be set in advance by the user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the travel history of the vehicle. For example, when the travel history records that the vehicle has stopped at diesel filling stations more often than at gasoline filling stations, the speech recognition apparatus 1 may change "valid" for "fuel station" in FIG. 11 to "invalid" and change "invalid" for "diesel" in FIG. 11 to "valid".
The recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degrees of the display vocabularies, and the domains of the display vocabularies are input from the result comparison unit 12b to the display vocabulary update unit 12h of FIG. 9. The display vocabulary update unit 12h updates the display vocabularies to be output to the priority calculation unit 12d based on the input domains and the information in the vehicle information database 12g.
For example, assume that the display vocabularies and domains of FIG. 10 are input from the result comparison unit 12b to the display vocabulary update unit 12h, and that the information of FIG. 11 is stored in the vehicle information database 12g. In this case, the display vocabulary update unit 12h outputs to the priority calculation unit 12d the display vocabularies "BP" and "BP (fuel station)", whose domain "fuel station" is associated with "valid" in the information of FIG. 11, together with their matching degrees and the recognition result of the speech recognition unit 11. On the other hand, the display vocabulary update unit 12h does not output to the priority calculation unit 12d the display vocabulary "BP (diesel)", whose domain "diesel" is associated with "invalid" in the information of FIG. 11, or its matching degree.
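The domain-based filtering performed by the display vocabulary update unit 12h can be sketched as below. This is an illustrative Python sketch; the dictionary stand-ins for the display vocabulary database of FIG. 10 and the vehicle information database of FIG. 11 are assumptions made for illustration.

```python
# Hypothetical domain filter: pass on only display vocabularies whose
# domain is marked "valid" in the vehicle information database.

DOMAIN_BY_VOCAB = {                 # analogue of the domains in FIG. 10
    "BP": "fuel station",
    "BP (fuel station)": "fuel station",
    "BP (diesel)": "diesel",
}
DOMAIN_VALIDITY = {"fuel station": True, "diesel": False}  # analogue of FIG. 11

def filter_by_domain(display_vocabs):
    """Keep only the display vocabularies whose domain is currently valid."""
    return [v for v in display_vocabs if DOMAIN_VALIDITY[DOMAIN_BY_VOCAB[v]]]

assert filter_by_domain(["BP", "BP (fuel station)", "BP (diesel)"]) == \
    ["BP", "BP (fuel station)"]
```

Flipping the two validity flags, as in the travel-history example above, would invert which vocabularies pass the filter.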
The configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognition vocabulary update unit 12f are the same as in Embodiment 2.
<Operation>
FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to Embodiment 3. In step S11, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S12, the result comparison unit 12b, referring to the display vocabulary database 12a, acquires the plurality of display vocabularies, the matching degree of each display vocabulary, and the domain of each display vocabulary based on the recognition result from the speech recognition unit 11. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the plurality of display vocabularies, and the matching degree and domain of each display vocabulary to the display vocabulary update unit 12h.
In step S13, the display vocabulary update unit 12h outputs the recognition result of the speech recognition unit 11 to the priority calculation unit 12d. Based on the domains received from the result comparison unit 12b, the display vocabulary update unit 12h also outputs to the priority calculation unit 12d the display vocabularies whose domains are associated with "valid" in the vehicle information database 12g, together with their matching degrees. Note that there may be one display vocabulary whose domain is associated with "valid", or there may be more than one.
 ステップS14にて図6のステップS3と同様に、優先度算出部12dは、優先度データベース12cを参照しつつ、表示語彙更新部12hからの各表示語彙の一致度に基づいて、各表示語彙の優先度を取得する。そして、優先度算出部12dは、音声認識部11の認識結果と、表示語彙と、表示語彙の優先度とを認識語彙更新部12fに出力する。 In step S14, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c, and based on the matching degree of each display vocabulary from the display vocabulary update unit 12h, Get the priority. Then, the priority calculation unit 12d outputs the recognition result of the speech recognition unit 11, the display vocabulary, and the priority of the display vocabulary to the recognition vocabulary update unit 12f.
 ステップS15にて図6のステップS4と同様に、認識語彙更新部12fは、判定情報データベース12eを参照しつつ、優先度算出部12dからの優先度に基づいて、表示語彙から認識語彙を選択し、選択された認識語彙を図示しない表示装置などに出力する。また、認識語彙更新部12fは、選択された認識語彙以外の表示語彙を表示語彙データベース12aから削除する。その後、図12の動作が終了する。 In step S15, as in step S4 of FIG. 6, the recognized vocabulary update unit 12f selects a recognized vocabulary from the display vocabulary based on the priority from the priority calculation unit 12d while referring to the determination information database 12e. The selected recognition vocabulary is output to a display device (not shown). The recognized vocabulary update unit 12f deletes display vocabulary other than the selected recognized vocabulary from the display vocabulary database 12a. Thereafter, the operation of FIG. 12 ends.
<Summary of Embodiment 3>
According to the speech recognition apparatus 1 of Embodiment 3 described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the vehicle information and on the priorities of the plurality of display vocabularies. With this configuration, the recognition accuracy of the speech recognition apparatus 1 can be further improved and user confusion can be further suppressed.
Note that in the third embodiment described above, the recognition vocabulary selection unit 12 does not change the priorities based on the vehicle information. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the vehicle information. For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose domain is "diesel" to "low" in step S13, and keep that priority unchanged in step S14. In this case as well, the same effects as described above can be obtained.
<Embodiment 4>
FIG. 13 is a block diagram showing the configuration of the recognition vocabulary selection unit 12 included in the speech recognition apparatus 1 according to Embodiment 4 of the present invention. Among the constituent elements described in the fourth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 13 includes a hierarchy information database 12i and a hierarchy reference update unit 12j in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hierarchy defined in advance for the display vocabularies and on the priorities of the display vocabularies. This is described in detail below.
FIG. 14 is a diagram showing an example of the information stored in the display vocabulary database 12a. As shown in FIG. 14, in the display vocabulary database 12a according to the fourth embodiment, the information of FIG. 3 described in the second embodiment is associated with the hierarchy level of each display vocabulary. In this example, the larger the number assigned to a hierarchy level, the lower the level, and a vocabulary that subsumes the concepts of the lower-level display vocabularies is used as the display vocabulary of the upper level.
When the recognition result of the speech recognition unit 11 includes a body vocabulary, the result comparison unit 12b of FIG. 13 acquires, from the display vocabulary database 12a, the plurality of display vocabularies associated in advance with the body vocabulary and the hierarchy level of each of those display vocabularies. The result comparison unit 12b also acquires the matching degree of each display vocabulary, as in the second embodiment.
FIG. 15 is a diagram showing an example of the information stored in the hierarchy information database 12i. As shown in FIG. 15, in the hierarchy information database 12i, each hierarchy level is associated with either "valid" or "invalid" for display vocabularies. The information shown in FIG. 15 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like.
The hierarchy reference update unit 12j of FIG. 13 receives, from the result comparison unit 12b, the recognition result of the speech recognition unit 11, the plurality of display vocabularies, the matching degree of each display vocabulary, and the hierarchy level of each display vocabulary. Based on the input hierarchy levels and the information in the hierarchy information database 12i, the hierarchy reference update unit 12j updates the display vocabularies to be output to the priority calculation unit 12d.
For example, assume that the display vocabularies and hierarchy levels of FIG. 14 are input from the result comparison unit 12b to the hierarchy reference update unit 12j, and that the information of FIG. 15 is stored in the hierarchy information database 12i. In this case, the hierarchy reference update unit 12j outputs, to the priority calculation unit 12d, the display vocabulary "BP", whose hierarchy level "1" is associated with "valid" in the information of FIG. 15, together with its matching degree and the recognition result of the speech recognition unit 11. On the other hand, the hierarchy reference update unit 12j does not output the display vocabularies "BP (fuel station)" and "BP (diesel)", whose hierarchy level "2" is associated with "invalid" in the information of FIG. 15, or their matching degrees to the priority calculation unit 12d.
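The filtering performed by the hierarchy reference update unit 12j in this example can be sketched as follows. The table contents mirror the description of FIG. 14 and FIG. 15 above; the function name and data layout are illustrative assumptions, not the patented implementation.

```python
# Sketch of the hierarchy reference update unit 12j: pass through only
# display vocabularies whose hierarchy level is marked "valid" in the
# hierarchy information database 12i. Data layout is illustrative.

hierarchy_info_db = {1: "valid", 2: "invalid"}  # cf. FIG. 15

display_vocab = [  # (vocabulary, hierarchy level, matching degree), cf. FIG. 14
    ("BP", 1, "high"),
    ("BP (fuel station)", 2, "high"),
    ("BP (diesel)", 2, "high"),
]

def filter_by_hierarchy(vocab, hierarchy_info):
    # Keep (vocabulary, matching degree) pairs whose level is "valid".
    return [(word, match) for word, level, match in vocab
            if hierarchy_info.get(level) == "valid"]

print(filter_by_hierarchy(display_vocab, hierarchy_info_db))  # [('BP', 'high')]
```

Only "BP" at level 1 is passed on to the priority calculation; the level-2 entries are withheld, as in the example above.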
The configurations of the priority database 12c, the priority calculation unit 12d, the determination information database 12e, and the recognition vocabulary update unit 12f are the same as those in the second embodiment.
<Operation>
FIG. 16 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment. In step S21, as in step S1 of FIG. 6, the speech recognition unit 11 recognizes the input speech and outputs the recognition result to the result comparison unit 12b of the recognition vocabulary selection unit 12.
In step S22, the result comparison unit 12b refers to the display vocabulary database 12a and, based on the recognition result from the speech recognition unit 11, acquires the plurality of display vocabularies, the matching degree of each display vocabulary, and the hierarchy level of each display vocabulary. The result comparison unit 12b then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their matching degrees and hierarchy levels to the hierarchy reference update unit 12j.
In step S23, the hierarchy reference update unit 12j outputs the recognition result of the speech recognition unit 11 to the priority calculation unit 12d. Based on the hierarchy levels received from the result comparison unit 12b, the hierarchy reference update unit 12j also outputs, to the priority calculation unit 12d, each display vocabulary whose hierarchy level is associated with "valid" in the hierarchy information database 12i, together with its matching degree. Note that there may be one or more display vocabularies whose hierarchy level is associated with "valid".
In step S24, as in step S3 of FIG. 6, the priority calculation unit 12d refers to the priority database 12c and obtains the priority of each display vocabulary based on the matching degree of each display vocabulary received from the hierarchy reference update unit 12j. The priority calculation unit 12d then outputs the recognition result of the speech recognition unit 11, the display vocabularies, and their priorities to the recognition vocabulary update unit 12f.
In step S25, as in step S4 of FIG. 6, the recognition vocabulary update unit 12f refers to the determination information database 12e and, based on the priorities received from the priority calculation unit 12d, selects one or more recognition vocabularies from the display vocabularies and outputs the selected recognition vocabularies to a display device (not shown) or the like. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a. The operation of FIG. 16 then ends.
<Summary of Embodiment 4>
According to the speech recognition apparatus 1 of the fourth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on the hierarchy defined in advance for the display vocabularies and on the priorities of the display vocabularies. With this configuration, the recognition accuracy of the speech recognition apparatus 1 can be further increased, and user confusion can be further suppressed.
Note that in the fourth embodiment described above, the recognition vocabulary selection unit 12 does not change the priorities based on the hierarchy. However, the present invention is not limited to this, and the recognition vocabulary selection unit 12 may change the priorities based on the hierarchy. For example, the recognition vocabulary selection unit 12 may change the priority of a display vocabulary whose hierarchy level is "2" to "low" in step S23, and keep that priority unchanged in step S24. In this case as well, the same effects as described above can be obtained.
<Embodiment 5>
FIG. 17 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 5 of the present invention. Among the constituent elements described in the fifth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 17 includes an SW (software) information database 12k and an SW restriction reference update unit 12m in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a software requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. This is described in detail below.
FIG. 18 is a diagram showing an example of the information stored in the SW information database 12k. As shown in FIG. 18, the SW information database 12k stores, as a software requirement of the system in which the speech recognition apparatus 1 is used, the number of recognition vocabularies that the system can display. The information shown in FIG. 18 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the software requirement.
The SW restriction reference update unit 12m of FIG. 17 receives the recognition vocabularies and their priorities from the recognition vocabulary update unit 12f. Here, the priority of a recognition vocabulary is the priority that was obtained for the display vocabulary that became the recognition vocabulary. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m outputs them as they are.
On the other hand, when the number of recognition vocabularies input from the recognition vocabulary update unit 12f exceeds the displayable number stored in the SW information database 12k, the SW restriction reference update unit 12m lowers the priority of each recognition vocabulary by one level. As a result, the SW restriction reference update unit 12m can set the priority of some recognition vocabularies to "low". After changing the priorities, the SW restriction reference update unit 12m performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By repeating such priority changes as appropriate, the SW restriction reference update unit 12m selects a number of recognition vocabularies equal to or less than the displayable number.
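The repeated priority demotion described above can be sketched as follows. The three priority levels and the rule of dropping vocabularies that fall to "low" are taken from the description; the function and variable names, the fallback when every vocabulary would be dropped, and the sample data are assumptions made for the example.

```python
# Sketch of the SW restriction reference update unit 12m: while the
# number of recognition vocabularies exceeds the displayable number,
# demote every vocabulary by one priority level and keep only those
# still above "low". Names and data are illustrative.

LOWER = {"high": "medium", "medium": "low", "low": "low"}

def limit_to_displayable(vocab_with_priority, displayable):
    # vocab_with_priority: list of (vocabulary, priority) pairs
    selected = list(vocab_with_priority)
    while len(selected) > displayable:
        demoted = [(w, LOWER[p]) for w, p in selected]
        kept = [(w, p) for w, p in demoted if p != "low"]
        if not kept:  # assumed fallback: avoid dropping everything
            return selected[:displayable]
        selected = kept
    return selected

vocab = [("BP", "high"), ("BP (fuel station)", "medium"), ("BP (diesel)", "medium")]
print(limit_to_displayable(vocab, 1))  # [('BP', 'medium')]
```

In the sample run, one demotion pushes the two "medium" entries to "low", leaving only the former "high" entry within the displayable number of one.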
<Operation>
FIG. 19 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. In steps S31 to S33, the same operations as steps S1 to S3 of FIG. 6 are performed.
In step S34, the recognition vocabulary update unit 12f refers to the determination information database 12e and selects recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d. The recognition vocabulary update unit 12f then outputs the selected recognition vocabularies and their priorities to the SW restriction reference update unit 12m. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
In step S35, the SW restriction reference update unit 12m refers to the SW information database 12k and, based on the recognition vocabularies and priorities received from the recognition vocabulary update unit 12f, selects a number of recognition vocabularies equal to or less than the displayable number and outputs the selected recognition vocabularies to a display device (not shown) or the like. At this time, the SW restriction reference update unit 12m may delete the display vocabularies that were not output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognition vocabulary update unit 12f. The operation of FIG. 19 then ends.
<Summary of Embodiment 5>
According to the speech recognition apparatus 1 of the fifth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on a software requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. With this configuration, a speech recognition apparatus 1 that can automatically satisfy the software requirement can be realized.
<Embodiment 6>
FIG. 20 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 6 of the present invention. Among the constituent elements described in the sixth embodiment, those that are the same as or similar to those of the second embodiment are denoted by the same reference numerals, and the following description focuses on the differing elements.
The recognition vocabulary selection unit 12 of FIG. 20 includes an HW (hardware) information database 12n and an HW restriction reference update unit 12o in addition to the block configuration (FIG. 2) of the recognition vocabulary selection unit 12 according to the second embodiment. The recognition vocabulary selection unit 12 configured in this way selects one or more recognition vocabularies from the plurality of display vocabularies based on a hardware requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. This is described in detail below.
FIG. 21 is a diagram showing an example of the information stored in the HW information database 12n. As shown in FIG. 21, the HW information database 12n stores, as a hardware requirement of the system in which the speech recognition apparatus 1 is used, the number of display vocabularies that a memory (not shown) of the system can store in the future. The information shown in FIG. 21 may be set in advance by a user or the like, or may be changed automatically by the speech recognition apparatus 1 or the like based on the hardware requirement.
The HW restriction reference update unit 12o of FIG. 20 receives the recognition vocabularies and their priorities from the recognition vocabulary update unit 12f. When the number of recognition vocabularies input from the recognition vocabulary update unit 12f is equal to or less than the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o outputs them as they are.
On the other hand, when the number of recognition vocabularies input from the recognition vocabulary update unit 12f exceeds the storable number stored in the HW information database 12n, the HW restriction reference update unit 12o lowers the priority of each recognition vocabulary by one level. As a result, the HW restriction reference update unit 12o can set the priority of some recognition vocabularies to "low". After changing the priorities, the HW restriction reference update unit 12o performs the same operation as the recognition vocabulary update unit 12f using the information in the determination information database 12e, thereby selecting, from the recognition vocabularies after the priority change, those whose priority is "medium". By repeating such priority changes as appropriate, the HW restriction reference update unit 12o selects a number of recognition vocabularies equal to or less than the storable number.
<Operation>
FIG. 22 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. In steps S41 to S43, the same operations as steps S1 to S3 of FIG. 6 are performed.
In step S44, the recognition vocabulary update unit 12f refers to the determination information database 12e and selects recognition vocabularies from the plurality of display vocabularies based on the priorities received from the priority calculation unit 12d. The recognition vocabulary update unit 12f then outputs the selected recognition vocabularies and their priorities to the HW restriction reference update unit 12o. The recognition vocabulary update unit 12f also deletes display vocabularies other than the selected recognition vocabularies from the display vocabulary database 12a.
In step S45, the HW restriction reference update unit 12o refers to the HW information database 12n and, based on the recognition vocabularies and priorities received from the recognition vocabulary update unit 12f, selects a number of recognition vocabularies equal to or less than the storable number and outputs the selected recognition vocabularies to a display device (not shown) or the like. At this time, the HW restriction reference update unit 12o may delete the display vocabularies that were not output from the display vocabulary database 12a, in the same manner as the deletion performed by the recognition vocabulary update unit 12f. The operation of FIG. 22 then ends.
<Summary of Embodiment 6>
According to the speech recognition apparatus 1 of the sixth embodiment described above, one or more recognition vocabularies are selected from the plurality of display vocabularies based on a hardware requirement of the system in which the speech recognition apparatus 1 is used and on the priorities of the display vocabularies. With this configuration, a speech recognition apparatus 1 that can automatically satisfy the hardware requirement can be realized.
<Other variations>
The speech recognition unit 11 and the recognition vocabulary selection unit 12 of the speech recognition apparatus 1 described above are hereinafter referred to as the "speech recognition unit 11 and the like". The speech recognition unit 11 and the like are realized by the processing circuit 81 shown in FIG. 23. That is, the processing circuit 81 includes the speech recognition unit 11, which recognizes input speech, and the recognition vocabulary selection unit 12, which, when the recognition by the speech recognition unit 11 yields a recognition result containing a body vocabulary, i.e., a predetermined vocabulary, acquires a plurality of candidate vocabularies each of which contains the body vocabulary and is associated in advance with the body vocabulary, acquires a priority for each candidate vocabulary, and selects one or more of the candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in a memory may be applied. Examples of the processor include a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof. The functions of the respective units such as the speech recognition unit 11 may each be realized by distributed processing circuits, or the functions of the units may be collectively realized by a single processing circuit.
When the processing circuit 81 is a processor, the functions of the speech recognition unit 11 and the like are realized in combination with software or the like. The software or the like corresponds to, for example, software, firmware, or software and firmware. The software or the like is described as a program and stored in a memory. As shown in FIG. 24, the processor 82 applied to the processing circuit 81 reads out and executes the program stored in the memory 83, thereby realizing the function of each unit. That is, the speech recognition apparatus 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of a step of recognizing input speech and a step of, when the recognition yields a recognition result containing a body vocabulary, i.e., a predetermined vocabulary, acquiring a plurality of candidate vocabularies each of which contains the body vocabulary and is associated in advance with the body vocabulary, acquiring a priority for each candidate vocabulary, and selecting one or more of the candidate vocabularies as one or more recognition vocabularies based on the acquired priorities. In other words, this program can be said to cause a computer to execute the procedures and methods of the speech recognition unit 11 and the like. Here, the memory 83 corresponds to any storage medium, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or a drive device therefor.
The configuration in which each function of the speech recognition unit 11 and the like is realized by either hardware or software has been described above. However, the present invention is not limited to this, and a part of the speech recognition unit 11 and the like may be realized by dedicated hardware while another part is realized by software or the like. For example, the function of the speech recognition unit 11 can be realized by a processing circuit as dedicated hardware, while the other functions can be realized by the processing circuit 81 as the processor 82 reading out and executing the program stored in the memory 83.
As described above, the processing circuit 81 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
The speech recognition apparatus described above can also be applied to a speech recognition system constructed by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including a mobile terminal such as a mobile phone, a smartphone, or a tablet, the functions of applications installed on these, and a server. In this case, each function or each constituent element of the speech recognition apparatus described above may be distributed among the devices constituting the system, or may be concentrated in one of the devices.
 FIG. 25 is a block diagram showing the configuration of a server 51 according to this modification. The server 51 of FIG. 25 includes a communication unit 51a, a speech recognition unit 51b, and a recognition vocabulary selection unit 51c, and can perform wireless communication with a navigation device 53 of a vehicle 52.
 The communication unit 51a performs wireless communication with the navigation device 53 and thereby receives the voice data acquired by the navigation device 53.
 The speech recognition unit 51b and the recognition vocabulary selection unit 51c have the same functions as the speech recognition unit 11 and the recognition vocabulary selection unit 12 of FIG. 1, realized by a processor (not shown) of the server 51 executing a program stored in a storage device (not shown) of the server 51. That is, the speech recognition unit 51b recognizes the voice data received by the communication unit 51a. Based on the recognition result of the speech recognition unit 51b, the recognition vocabulary selection unit 51c obtains a plurality of display vocabularies and their priorities, and selects a recognition vocabulary based on those priorities. The communication unit 51a then transmits the recognition vocabulary selected by the recognition vocabulary selection unit 51c to the navigation device 53.
 According to the server 51 configured in this way, the same effects as the speech recognition device 1 described in the first embodiment can be obtained even if, for example, the navigation device 53 has only a display function and a function for communicating with the server 51.
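 The server-side flow just described (receive voice data, recognize it, select a vocabulary by priority, return the result to the navigation device) can be sketched as follows. This is a minimal illustration only: the callables `recognize` and `get_display_vocabularies` and the list-of-(vocabulary, priority) data shape are assumptions for the sketch, not part of the published application.

```python
# Hypothetical sketch of the flow of FIG. 25; the recognizer and the
# vocabulary/priority lookup are injected as callables because the
# application does not specify their implementations.

def handle_voice_data(voice_data, recognize, get_display_vocabularies):
    """Mimics communication unit 51a -> speech recognition unit 51b ->
    recognition vocabulary selection unit 51c -> communication unit 51a."""
    # Speech recognition unit 51b: recognize the received voice data.
    recognition_result = recognize(voice_data)

    # Recognition vocabulary selection unit 51c: obtain display
    # vocabularies and their priorities for this recognition result.
    candidates = get_display_vocabularies(recognition_result)

    # Select the vocabulary with the highest priority.
    vocabulary, _priority = max(candidates, key=lambda c: c[1])

    # Communication unit 51a: this value would then be transmitted
    # back to the navigation device 53.
    return vocabulary
```

 In a real deployment the two callables would wrap the server's recognizer and its vocabulary store; here they can be stubbed for testing.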
 FIG. 26 is a block diagram showing the configuration of a communication terminal 56 according to this modification. The communication terminal 56 of FIG. 26 includes a communication unit 56a, a speech recognition unit 56b, and a recognition vocabulary selection unit 56c similar to the communication unit 51a, the speech recognition unit 51b, and the recognition vocabulary selection unit 51c, and can perform wireless communication with a navigation device 58 of a vehicle 57. A mobile terminal such as a mobile phone, smartphone, or tablet carried by, for example, the driver of the vehicle 57 can serve as the communication terminal 56. According to the communication terminal 56 configured in this way, the same effects as the speech recognition device 1 described in the first embodiment can be obtained even if, for example, the navigation device 58 has only a display function and a function for communicating with the communication terminal 56.
 Within the scope of the invention, the embodiments and modifications may be freely combined, and each embodiment and modification may be modified or omitted as appropriate.
 Although the present invention has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited thereto. It is understood that innumerable modifications not illustrated here can be envisaged without departing from the scope of the present invention.
 1 speech recognition device, 11 speech recognition unit, 12 recognition vocabulary selection unit.

Claims (9)

  1.  A speech recognition device comprising:
     a speech recognition unit that recognizes input speech; and
     a recognition vocabulary selection unit that, when the recognition by the speech recognition unit yields a recognition result containing a body vocabulary that is a predetermined vocabulary, obtains a plurality of candidate vocabularies each containing the body vocabulary and associated with the body vocabulary in advance, obtains a priority for each candidate vocabulary, and selects, based on the obtained priorities, one or more of the plurality of candidate vocabularies as one or more recognition vocabularies.
  2.  The speech recognition device according to claim 1, wherein
     the plurality of candidate vocabularies include the body vocabulary itself, and a vocabulary in which the body vocabulary is combined with an auxiliary vocabulary that, in combination with the body vocabulary, makes the body vocabulary more specific.
  3.  The speech recognition device according to claim 1, wherein
     when the one or more recognition vocabularies have been selected, the recognition vocabulary selection unit can exclude the candidate vocabularies other than the one or more recognition vocabularies in any selection from the next selection onward.
  4.  The speech recognition device according to claim 1, wherein
     based on the recognition result and each candidate vocabulary, the recognition vocabulary selection unit obtains, as the priority of each candidate vocabulary, a degree of match indicating the degree to which that candidate vocabulary matches the recognition result.
  5.  The speech recognition device according to claim 1, wherein
     the speech recognition device is used in a vehicle, and
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on information on the vehicle and the priorities of the plurality of candidate vocabularies.
  6.  The speech recognition device according to claim 1, wherein
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on a hierarchy defined in advance for the plurality of candidate vocabularies and the priorities of the plurality of candidate vocabularies.
  7.  The speech recognition device according to claim 1, wherein
     the recognition vocabulary selection unit selects the one or more recognition vocabularies based on software requirements of a system using the speech recognition device and the priorities of the plurality of candidate vocabularies.
  8.  The speech recognition device according to claim 1, wherein
     the one or more recognition vocabularies are selected based on hardware requirements of a system using the speech recognition device and the priorities of the plurality of candidate vocabularies.
  9.  A speech recognition method comprising:
     recognizing input speech; and
     when the recognition yields a recognition result containing a body vocabulary that is a predetermined vocabulary, obtaining a plurality of candidate vocabularies each containing the body vocabulary and associated with the body vocabulary in advance, obtaining a priority for each candidate vocabulary, and selecting, based on the obtained priorities, one or more of the plurality of candidate vocabularies as one or more recognition vocabularies.
PCT/JP2016/085689 2016-12-01 2016-12-01 Voice recognition device and voice recognition method WO2018100705A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/085689 WO2018100705A1 (en) 2016-12-01 2016-12-01 Voice recognition device and voice recognition method


Publications (1)

Publication Number Publication Date
WO2018100705A1

Family

ID=62242804

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/085689 WO2018100705A1 (en) 2016-12-01 2016-12-01 Voice recognition device and voice recognition method

Country Status (1)

Country Link
WO (1) WO2018100705A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006071791A (en) * 2004-08-31 2006-03-16 Fuji Heavy Ind Ltd Vehicle voice recognition device
JP2007101892A (en) * 2005-10-04 2007-04-19 Denso Corp Speech recognition device
JP2008134503A (en) * 2006-11-29 2008-06-12 Nissan Motor Co Ltd Speech recognition apparatus and speech recognition method
JP2008134502A (en) * 2006-11-29 2008-06-12 Nissan Motor Co Ltd Speech recognition apparatus and speech recognition method
JP2014142465A (en) * 2013-01-23 2014-08-07 Canon Inc Acoustic model generation device and method, and voice recognition device and method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONGGEE JANG ET AL.: "Speech interface on combination of search candidates from the common word parts", IEICE TECHNICAL REPORT, vol. 109, no. 355, 14 December 2009 (2009-12-14), pages 219 - 224 *

Similar Documents

Publication Publication Date Title
US8412455B2 (en) Voice-controlled navigation device and method
US9805722B2 (en) Interactive speech recognition system
US20120290303A1 (en) Speech recognition system and method based on word-level candidate generation
US10514268B2 (en) Search system
US20150120288A1 (en) System and method of performing automatic speech recognition using local private data
JP2012230670A (en) System, method, and computer program for correcting incorrect recognition by return
US20120239399A1 (en) Voice recognition device
CN106233246A (en) User interface system, user interface control device, user interface control method and user interface control program
JP5705312B2 (en) Information equipment
US20190115015A1 (en) Vehicular voice recognition system and method for controlling the same
JP6896335B2 (en) Speech recognition device and speech recognition method
CN103635961B (en) Pronunciation information generating device, vehicle-mounted information device, and word string information processing method
KR20120052591A (en) Apparatus and method for error correction in a continuous speech recognition system
KR102069700B1 (en) Automatic speech recognition system for replacing specific domain search network, mobile device and method thereof
WO2018100705A1 (en) Voice recognition device and voice recognition method
WO2018073907A1 (en) Speech recognition device and speech recognition method
JP5396426B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
US9704479B2 (en) Speech recognition device
JP2003162293A (en) Voice recognition device and method
JP4926689B2 (en) Facility search device
US11107474B2 (en) Character input device, character input method, and character input program
US10915565B2 (en) Retrieval result providing device and retrieval result providing method
EP3292376B1 (en) Automatic data switching approach in onboard voice destination entry (vde) navigation solution
JP7038919B2 (en) Multilingual speech recognition device and multilingual speech recognition method
WO2016136208A1 (en) Voice interaction device, voice interaction system, control method of voice interaction device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16922780

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16922780

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP