CN103700368B

CN103700368B - Speech recognition method, speech recognition device and electronic equipment

Info

Publication number: CN103700368B
Application number: CN201410013478.8A
Authority: CN
Inventors: 王伟宁; 戴海生; 宫玉强
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2014-01-13
Filing date: 2014-01-13
Publication date: 2017-01-18
Anticipated expiration: 2034-01-13
Also published as: CN103700368A

Abstract

The invention provides a speech recognition method, a speech recognition device and electronic equipment. The method comprises: receiving a speech input to obtain an audio signal corresponding to the speech input; recognizing the audio signal with a first speech recognition device to obtain a recognition result, wherein the recognition result comprises recognition content and a confidence, and the confidence indicates how reliable the recognition content is; presetting at least two confidence thresholds that differ from each other; selecting one confidence threshold from the at least two confidence thresholds; and judging, on the basis of the confidence in the recognition result and the selected confidence threshold, whether the recognition content is accurate. According to the technical scheme of the embodiments of the invention, adopting different confidence thresholds in different situations balances the recognition rate and the robustness of speech recognition, and thus improves the user experience.

Description

Speech recognition method, speech recognition device and electronic equipment

Technical field

The present invention relates to the field of information technology and, more particularly, to a speech recognition method, a speech recognition device and electronic equipment.

Background

Speech recognition technology converts speech into corresponding text or commands by recognizing and understanding it. In speech recognition, speech is processed by feature extraction, pattern matching, model training and the like to obtain instructions to which an electronic device can respond, text to be recorded in the electronic device, and so on, so that a user can interact with the electronic device by speaking.

Real speech environments usually contain noise, and spoken language is often mixed with interfering sounds such as pauses and coughs, all of which reduce the recognition accuracy of existing speech recognition systems. In addition, if the words a user says are not in the vocabulary preset in the speech recognition system, recognition errors are likely. A commercial speech recognition system is therefore expected to reject erroneous speech. Accordingly, confidence evaluation is used to ensure the accuracy of the recognized content and to reject misrecognized speech.

Confidence evaluation performs hypothesis testing on the recognition result of a speech recognition device: the reliability of the recognition result is evaluated against a preset confidence threshold and errors in the result are located, which improves the recognition rate and robustness of the recognition system. Setting the confidence threshold reasonably is therefore critical, and it has become a current technical challenge.

Summary of the invention

Embodiments of the present invention provide a speech recognition method, a speech recognition device and electronic equipment that allow different confidence thresholds to be adopted in different situations, so as to balance the recognition rate and the robustness of speech recognition and thereby improve the user experience.

In a first aspect, a speech recognition method is provided, applied to electronic equipment that includes a first speech recognition device. The method may include: receiving a speech input and obtaining an audio signal corresponding to the speech input; recognizing the audio signal with the first speech recognition device to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine how reliable the recognition content is; presetting at least two confidence thresholds, the confidence thresholds being different from each other; selecting one confidence threshold from the at least two confidence thresholds; and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

In the speech recognition method, presetting the at least two confidence thresholds may include: presetting the at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and the network condition of the first speech recognition device.

In the speech recognition method, the content that the first speech recognition device is able to recognize may include a plurality of command words, and presetting the at least two confidence thresholds according to at least one of that content and the network condition may include: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.

In the speech recognition method, presetting the at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and its network condition may include: setting a third confidence threshold for the case in which the first speech recognition device has a network connection; and setting a fourth confidence threshold for the case in which the first speech recognition device has no network connection.

In the speech recognition method, selecting one confidence threshold from the at least two confidence thresholds may include: determining whether the recognition content in the recognition result corresponds to the second command word; when the recognition content corresponds to the second command word, selecting the second confidence threshold; when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection; when the first speech recognition device has a network connection, selecting the third confidence threshold; and when the first speech recognition device has no network connection, selecting the fourth confidence threshold.

In the speech recognition method, judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold may include: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.

The speech recognition method may further include: when the recognition content is judged to be inaccurate, sending the audio signal to a second speech recognition device connected to the electronic equipment over a network, the second speech recognition device being able to recognize the audio signal and obtain second recognition content; and receiving the second recognition content from the second speech recognition device and taking the second recognition content as the final recognition content.

The speech recognition method may further include: sending the audio signal to a second speech recognition device connected to the electronic equipment over a network, the second speech recognition device being able to recognize the audio signal and obtain second recognition content; and, when the recognition content is judged to be inaccurate in the judging operation, receiving the second recognition content from the second speech recognition device within a preset time period.

The speech recognition method may further include: when the second recognition content is not received within the preset time period, obtaining a low confidence threshold that is lower than the selected confidence threshold; and judging whether the recognition content is accurate based on the low confidence threshold.

In a second aspect, a speech recognition device is provided, applied to electronic equipment. The speech recognition device may include: an audio input unit for receiving a speech input and obtaining an audio signal corresponding to the speech input; a recognition unit for recognizing the audio signal to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine how reliable the recognition content is; a threshold setting unit for presetting at least two confidence thresholds, the confidence thresholds being different from each other; a threshold acquiring unit for selecting one confidence threshold from the at least two confidence thresholds; and a judging unit for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

In the speech recognition device, the threshold setting unit may preset the at least two confidence thresholds according to at least one of the content that the recognition unit is able to recognize and the network condition of the speech recognition device.

In the speech recognition device, the content that the speech recognition device is able to recognize may include a plurality of command words, and the threshold setting unit may preset the at least two confidence thresholds as follows: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.

In the speech recognition device, the threshold setting unit may preset the at least two confidence thresholds as follows: setting a third confidence threshold for the case in which the speech recognition device has a network connection; and setting a fourth confidence threshold for the case in which the speech recognition device has no network connection.

In the speech recognition device, the threshold acquiring unit may include: a determining component for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the speech recognition device has a network connection; and a selecting component for selecting the second confidence threshold when the determining component determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining component determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining component determines that the speech recognition device has no network connection.

In the speech recognition device, the judging unit may judge whether the recognition content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.

The speech recognition device may further include: a sending unit for sending the audio signal, when the judging unit judges that the recognition content is inaccurate, to another speech recognition device connected to the speech recognition device over a network, the other speech recognition device being able to recognize the audio signal and obtain second recognition content; and a receiving unit for receiving the second recognition content from the other speech recognition device and taking the second recognition content as the final recognition content.

The speech recognition device may further include: a sending unit for sending the audio signal to another speech recognition device connected to the electronic equipment over a network, the other speech recognition device being able to recognize the audio signal and obtain second recognition content; and a receiving unit for receiving, when the recognition content is judged to be inaccurate in the judging operation, the second recognition content from the other speech recognition device within a preset time period, and taking the second recognition content as the final recognition content.

In the speech recognition device, if the receiving unit does not receive the second recognition content within the preset time period, the threshold acquiring unit may obtain a low confidence threshold that is lower than the selected confidence threshold, and the judging unit judges whether the recognition content is accurate based on the low confidence threshold.

In a third aspect, electronic equipment is provided that includes the speech recognition device described above.

In the technical solutions of the speech recognition method, speech recognition device and electronic equipment according to the embodiments of the present invention, a plurality of confidence thresholds are preset and one of them is selected to judge the accuracy of the recognition content. The confidence threshold used for judging the recognition content can therefore vary, which balances the recognition rate and the robustness of speech recognition and thereby improves the user experience.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below evidently show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is an architecture diagram illustrating devices that perform speech recognition according to an embodiment of the present invention;

Fig. 2 is a flow chart schematically illustrating a speech recognition method according to an embodiment of the present invention;

Fig. 3 is a flow chart schematically illustrating the setting of confidence thresholds in the speech recognition method according to an embodiment of the present invention;

Fig. 4 is a flow chart schematically illustrating the selection of a confidence threshold in the speech recognition method according to an embodiment of the present invention;

Fig. 5 is a flow chart schematically illustrating a speech recognition method according to another embodiment of the present invention;

Fig. 6 is a block diagram schematically illustrating a speech recognition device according to an embodiment of the present invention;

Fig. 7 is a block diagram schematically illustrating a speech recognition device according to another embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. Where they do not conflict, the embodiments of the present application and the features of the embodiments may be combined with one another.

Fig. 1 is an architecture diagram illustrating communication among the devices that perform speech recognition.

As shown in Fig. 1, the first speech recognition device 10 receives speech from a user and then recognizes the received speech. If the received speech is recognized successfully, the corresponding recognition content is obtained; if it is not, no recognition content is obtained. The first speech recognition device 10 may be a standalone speech recognition device, or it may be integrated in electronic equipment such as a mobile phone, a notebook computer or a tablet computer.

Using current network interconnection technology, the first speech recognition device 10 may also be connected, for example via a network, to a second speech recognition device 20. The second speech recognition device 20 can generally draw on powerful network resources to achieve more accurate speech recognition, and it may share speech recognition results with the first speech recognition device 10. The second speech recognition device 20 may be a standalone speech recognition device, or it may be integrated in other electronic equipment such as a network server or a notebook computer. The first speech recognition device 10 can transmit the received speech to the second speech recognition device 20 and receive the recognized content from the second speech recognition device 20.

The speech recognition devices shown in Fig. 1 are only schematic. The first speech recognition device 10 and the second speech recognition device 20 are peers. For example, the second speech recognition device 20 can receive speech, transmit the received speech to the first speech recognition device 10, and receive the recognized content from the first speech recognition device 10.

The embodiments of the present invention describe schemes in which a single speech recognition device (for example, the first speech recognition device 10) performs speech recognition, and schemes in which different speech recognition devices share speech recognition results, so as to balance the recognition rate and the robustness of speech recognition and thereby improve the user experience.

Fig. 2 is a flow chart schematically illustrating a speech recognition method 200 according to an embodiment of the present invention. The speech recognition method 200 can be applied to a speech recognition device as shown in Fig. 1 or to electronic equipment that includes such a speech recognition device.

As shown in Fig. 2, the speech recognition method 200 may include: receiving a speech input and obtaining an audio signal corresponding to the speech input (S210); recognizing the audio signal with the first speech recognition device to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine how reliable the recognition content is (S220); presetting at least two confidence thresholds, the confidence thresholds being different from each other (S230); selecting one confidence threshold from the at least two confidence thresholds (S240); and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold (S250).

In S210, a recording device such as a microphone or a sound recorder can be used to receive the speech input. The recording device converts the received speech into an electronic signal, that is, the audio signal corresponding to the speech input, so that it can be recognized. The received speech may be uttered in any of various languages (for example Chinese, English or German) or in a mixture of languages, for example Chinese mixed with English words. Neither the way the speech is uttered nor the specific way it is received limits the present invention.

In S220, the first speech recognition device may use any existing or future speech recognition technology to recognize the audio signal and obtain a recognition result. The recognition result includes recognition content and a confidence, and the confidence is used to determine how reliable the recognition content is. Take template-matching speech recognition as an example. In the training phase, the user speaks each word in the vocabulary in turn, and the feature vector of each word is stored in a template library as a template. In the recognition phase, a feature vector is extracted from the original speech (that is, the audio signal above) and compared for similarity with each template in the template library in turn, and the template with the highest similarity (that is, the confidence) is output as the recognition result.
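The recognition phase of the template-matching example can be sketched as follows. This is a minimal illustration, not the patented implementation: cosine similarity is an assumed similarity measure (the text does not name one), and the function names, the two-word template library and the toy three-dimensional feature vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(features, template_library):
    """Compare the input features with every template in turn and return
    (recognition content, confidence) for the most similar template."""
    best_word, best_score = None, -1.0
    for word, template in template_library.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical template library built in the training phase.
TEMPLATES = {"open": [1.0, 0.0, 0.0], "close": [0.0, 1.0, 0.0]}

# Hypothetical feature vector extracted from the audio signal.
content, confidence = recognize([0.9, 0.1, 0.0], TEMPLATES)
```

Here the highest similarity doubles as the confidence, as in the example above; a real recognizer would typically derive both from acoustic and language models.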

In practice, it can be difficult to recognize speech accurately, for reasons such as the following. Speech patterns differ not only between speakers but also for the same speaker; for example, a speaker's speech differs between casual and careful speaking. Speech is inherently ambiguous, and stress, tone, volume, speaking rate and so on change with context. Ambient noise and interference seriously affect speech recognition. Consequently, for the same speech input, the confidence in the recognition result also varies greatly across environments or backgrounds.

Consider the case in which a single confidence threshold is set to judge whether the recognition content is accurate. If the threshold is set high, the probability of obtaining no recognition content (recognition failure) may be too high; if it is set low, more of the recognition content in recognition results may be inaccurate. For example, if the speech input is expressed in a mixture of languages, such as "open filefox", in which an English word is mixed into Chinese speech, the confidence in the recognition result is usually low, and using an ordinary confidence threshold may then lead to recognition failure.

In S230, at least two confidence thresholds are preset, and the confidence thresholds are different from each other. Rather than setting only one confidence threshold to judge whether the recognition content is accurate, the embodiments of the present invention preset at least two confidence thresholds and later select among them according to the situation when making the judgment. As an example, the at least two confidence thresholds can be preset according to at least one of the content that the first speech recognition device is able to recognize and its network condition.

Fig. 3 is a flow chart schematically illustrating the confidence threshold setting S230 in the speech recognition method according to an embodiment of the present invention. As shown in Fig. 3, when the content that the first speech recognition device is able to recognize includes a plurality of command words, a first confidence threshold can be set for a first command word among the plurality of command words (S231); a second confidence threshold can be set for a second command word among the plurality of command words, the second command word being different from the first command word (S232); a third confidence threshold can be set for the case in which the first speech recognition device has a network connection (S233); and a fourth confidence threshold can be set for the case in which the first speech recognition device has no network connection (S234).
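One way to hold the four preset thresholds of S231 to S234 is a small immutable record, sketched below. The class name, field names and numeric values are hypothetical; the text gives no concrete values. The check in `__post_init__` reflects the requirement that the preset thresholds differ from each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdConfig:
    first_command: float   # S231: threshold for the first command word
    second_command: float  # S232: threshold for the second command word
    online: float          # S233: threshold when a network connection exists
    offline: float         # S234: threshold when there is no network connection

    def __post_init__(self):
        values = [self.first_command, self.second_command,
                  self.online, self.offline]
        # The preset confidence thresholds must be different from each other.
        if len(set(values)) != len(values):
            raise ValueError("confidence thresholds must differ from each other")


# Hypothetical values chosen only for illustration.
CONFIG = ThresholdConfig(first_command=0.80, second_command=0.50,
                         online=0.70, offline=0.60)
```

Because the record is frozen, the thresholds are fixed once preset, which matches the idea that S230 can run in advance of any speech input.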

In S231 and S232, different confidence thresholds are set for different command words. For example, if the first speech recognition device recognizes Chinese speech with high accuracy, a higher confidence threshold can be set for Chinese command words; if it recognizes English speech with low accuracy, a lower confidence threshold can be set for English command words. In addition, a further confidence threshold may be set in S230 for a third command word; the number of confidence thresholds set on the basis of command words does not limit the embodiments of the present invention. The first command word may be one specific command word or a class of command words that includes a plurality of command words, for example a plurality of Chinese command words. Likewise, the second command word may be one specific command word or a class of command words that includes a plurality of command words; for example, a dedicated confidence threshold can be set for the hard-to-understand command word "filefox".

In S233 and S234, different confidence thresholds are set according to whether the first speech recognition device has a network connection, and the third confidence threshold may be higher than the fourth confidence threshold. When the first speech recognition device has a network connection and fails to recognize the speech input using the third confidence threshold, it can request the network-connected second speech recognition device to perform speech recognition on the speech input and take the recognition content obtained by the second speech recognition device as the final recognition content, so that a higher recognition rate is achieved while higher recognition accuracy is ensured. If the first speech recognition device has no network connection, however, the confidence threshold is lowered appropriately so as to ensure the recognition rate, which matters more to the user.

Suitable confidence threshold setting steps can be adopted as needed; for example, only S231 and S232 above may be used, or only S233 and S234 above. Other confidence threshold setting steps can also be adopted in other scenarios. In addition, although S230 is illustrated in Fig. 2 as following S220, S230 can be executed before S210 (that is, in advance) to set each confidence threshold.

In S240, one confidence threshold can be selected from the at least two confidence thresholds according to the current scenario of the first speech recognition device, for example according to the recognition content corresponding to the speech input and the network connection state of the first speech recognition device. In practice, the basis for the selection can be adjusted as required.

Fig. 4 is a flow chart schematically illustrating the selection of a confidence threshold in the speech recognition method according to an embodiment of the present invention. An exemplary description follows with reference to Fig. 4.

As shown in Fig. 4, after the recognition result is obtained in S220, it is determined whether the recognition content in the recognition result corresponds to the second command word (S241). When the recognition content corresponds to the second command word (yes in S241), the second confidence threshold is selected (S242). When the recognition content does not correspond to the second command word (no in S241), it is determined whether the first speech recognition device has a network connection (S243). When the first speech recognition device has a network connection (yes in S243), the third confidence threshold is selected; when the first speech recognition device has no network connection (no in S243), the fourth confidence threshold is selected.
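The decision flow of Fig. 4 can be sketched as a small function. The threshold values, the dictionary keys and the set of second command words are hypothetical placeholders; only the branching order (command word first, network connection second) comes from the description above.

```python
# Hypothetical preset thresholds; the text does not give numeric values.
THRESHOLDS = {"second": 0.50, "third": 0.70, "fourth": 0.60}

# Hypothetical set of command words treated as the "second command word".
SECOND_COMMAND_WORDS = {"filefox"}


def select_threshold(content, has_network,
                     second_command_words=SECOND_COMMAND_WORDS,
                     thresholds=THRESHOLDS):
    """Select one confidence threshold following the flow of Fig. 4."""
    if content in second_command_words:   # S241: yes
        return thresholds["second"]       # S242
    if has_network:                       # S243: yes
        return thresholds["third"]
    return thresholds["fourth"]           # S243: no
```

For example, `select_threshold("filefox", has_network=True)` picks the second threshold regardless of the network state, because the command-word check comes first.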

In the example of Fig. 4, the confidence threshold is selected by combining two different factors, namely the recognition content and the network connection. In practice, the confidence threshold can also be selected according to the recognition content alone: when the recognition content does not correspond to the second command word, a default confidence threshold can be selected, or it can further be determined whether the recognition content corresponds to another command word, such as the first or the third command word, in which case the confidence threshold set for that command word is selected. In short, the confidence threshold is selected by considering both the current speech recognition scenario and the basis on which each confidence threshold was set.

In S250, whether the recognition content is accurate is judged based on the confidence in the recognition result and the selected confidence threshold. As an example, the confidence in the recognition result can be compared with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result, and whether the recognition content is accurate is judged according to the comparison result. For example, when the confidence in the recognition result is greater than or equal to the selected confidence threshold, the recognition content in the recognition result is judged to be accurate and is taken as the final recognition content; when the confidence in the recognition result is less than the selected confidence threshold, the recognition content in the recognition result is judged to be inaccurate, and recognition fails.
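The comparison in S250 reduces to a single inequality, sketched below. The function name and the use of `None` to represent recognition failure are illustrative assumptions.

```python
def judge(content, confidence, threshold):
    """S250: return the final recognition content, or None on recognition failure."""
    if confidence >= threshold:
        # Confidence reaches the selected threshold: content is judged accurate.
        return content
    # Confidence is below the selected threshold: content is judged inaccurate.
    return None
```

Note that a confidence exactly equal to the threshold counts as accurate, matching the "greater than or equal to" wording above.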

In the technical solution of the speech recognition method according to the embodiment of the present invention described above, a plurality of confidence thresholds are preset and one of them is selected to judge the accuracy of the recognition content. The confidence threshold used for judging the recognition content can therefore vary, which balances the recognition rate and the robustness of speech recognition and thereby improves the user experience.

In the speech recognition method described above, speech recognition is performed by the first speech recognition device. As described in connection with Fig. 1, the first speech recognition device may also share speech recognition results with a second speech recognition device connected over a network, as described below with reference to Fig. 5.

Fig. 5 is a flowchart schematically illustrating a speech recognition method 500 according to another embodiment of the present invention. The speech recognition method 500 includes steps S210-S250 of the speech recognition method 200 described above. Unlike the speech recognition method 200, it further includes steps S251-S254, which follow a recognition failure in S250.

When the recognition content in the recognition result is judged to be inaccurate in S250, the audio signal is sent to a second speech recognition device (for example, the second speech recognition device 20 in Fig. 1) connected to the electronic equipment over a network; this second speech recognition device can recognize the audio signal and obtain second recognition content (S251). The method then waits to receive the second recognition content from the second speech recognition device (S252). If the second recognition content is received from the second speech recognition device (Yes in S252), it is taken as the final recognition content and recognition ends. If the second recognition content is not received from the second speech recognition device (No in S252), a low confidence threshold that is lower than the selected confidence threshold is obtained (S253), and whether the recognition content is accurate is judged based on this low confidence threshold (S254), after which recognition ends.

In the example of Fig. 5, when the recognition content in the recognition result is judged to be inaccurate in S250, the audio signal is sent to the second speech recognition device connected to the electronic equipment over a network (S251). The method is not limited to this, however: the audio signal may also be sent to the second speech recognition device immediately after it is obtained in S210 (S251), so that when the recognition content is judged to be inaccurate in S250, the second recognition content can be received from the second speech recognition device as early as possible.

While waiting in S252 to receive the second recognition content from the second speech recognition device, network congestion or interruption may prevent the second recognition content from being received; if the waiting time is too long, the user experience is greatly degraded. Therefore, a waiting time (for example, a preset time period) may be set in S252, so that if the second recognition content is not received within this preset time period, the method stops waiting for it.

When the second recognition content is not received from the second speech recognition device (No in S252), the recognition result of the first speech recognition device may be examined again in order to provide the user with recognition content and improve the recognition rate. If the user requires very high recognition accuracy, S253 and S254 need not be executed and recognition ends directly. In S253, the low confidence threshold may be obtained by subtracting a predetermined value from the confidence threshold selected in S240, or by reselecting from among the confidence thresholds set in S230.
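The two ways of obtaining the low confidence threshold in S253 can be sketched as below. Both function names are hypothetical. Clamping at zero and choosing the highest preset threshold below the selected one are assumptions made for this illustration; the source says only that a predetermined value is subtracted or that a threshold is reselected.

```python
from typing import Sequence


def lower_by_margin(selected: float, margin: float) -> float:
    """Subtract a predetermined value from the threshold selected in S240."""
    return max(0.0, selected - margin)  # clamping at 0.0 is an added assumption


def reselect_lower(selected: float, preset: Sequence[float]) -> float:
    """Reselect among the thresholds set in S230.

    Assumption: pick the highest preset threshold that is still below the
    selected one; keep the selected threshold if none is lower.
    """
    lower = [t for t in preset if t < selected]
    return max(lower) if lower else selected


# Hypothetical preset thresholds for illustration only.
preset_thresholds = [0.5, 0.6, 0.8, 0.9]
print(lower_by_margin(0.8, 0.25))
print(reselect_lower(0.8, preset_thresholds))
```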

The judging operation in S254 is similar to that in S250: whether the recognition content is accurate is judged based on the low confidence threshold (S254), after which recognition ends. For example, the confidence in the recognition result may be compared with the low confidence threshold. When the confidence in the recognition result is greater than or equal to the low confidence threshold, the recognition content in the recognition result is judged to be accurate and is taken as the final recognition content; when the confidence in the recognition result is less than the low confidence threshold, the recognition content in the recognition result is judged to be inaccurate, that is, recognition fails.
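Steps S250 to S254 taken together can be sketched as a single decision function. The name `recognize_with_fallback`, its parameters, and the branch labels it returns are hypothetical; `fetch_second_content` stands for whatever mechanism delivers the second recognition content and is assumed to return `None` when nothing arrives within the preset time period.

```python
from typing import Callable, Optional, Tuple


def recognize_with_fallback(
    local_content: str,
    local_confidence: float,
    selected_threshold: float,
    low_threshold: float,
    fetch_second_content: Callable[[], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (final recognition content or None, label of the deciding branch)."""
    # S250: judge the local result against the selected threshold.
    if local_confidence >= selected_threshold:
        return local_content, "local"
    # S251/S252: ask the second speech recognition device and wait (bounded).
    second = fetch_second_content()
    if second is not None:
        return second, "remote"
    # S253/S254: no remote result in time, so re-judge with the low threshold.
    if local_confidence >= low_threshold:
        return local_content, "local-low-threshold"
    return None, "failed"


# Hypothetical values: the remote device does not answer in time.
print(recognize_with_fallback("open camera", 0.7, 0.8, 0.6, lambda: None))
```

Because the low threshold is consulted only after the remote request has failed, a local result with a confidence between the two thresholds is used only when no network result is available.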

Therefore, when the network recognition result from the second speech recognition device cannot be obtained in time because of a network timeout, a busy server, or a similar reason, the confidence threshold is lowered and the local result of the first speech recognition device is used again in place of feedback such as "server busy" or "network timeout". The user can thus still obtain a recognition result when network or server conditions are poor, which improves the user experience. If the low confidence threshold were selected directly in S240, a large number of less reliable recognition results produced locally by the first speech recognition device would be used even when network conditions are good. Setting the confidence threshold twice, in S240 and in S253, avoids this problem: the confidence threshold is lowered only when the network result is not obtained in time.

Therefore, in the technical solution of the speech recognition method 500 described with reference to Fig. 5, the confidence threshold can be applied even more flexibly to judge the recognition content, and the advantages of each speech recognition device are fully exploited to balance the recognition rate and the robustness of speech recognition, thereby improving the user experience.

Fig. 6 is a block diagram schematically illustrating a speech recognition device 600 according to an embodiment of the present invention. The speech recognition device 600 can be applied to the speech recognition device shown in Fig. 1 or to electronic equipment that includes the speech recognition device.

The speech recognition device 600 may include: an audio input unit 610 for receiving a speech input and obtaining an audio signal corresponding to the speech input; a recognition unit 620 for recognizing the audio signal to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the reliability of the recognition content; a threshold setting unit 630 for presetting at least two confidence thresholds that differ from each other; a threshold acquiring unit 640 for selecting one confidence threshold from the at least two confidence thresholds; and a judging unit 650 for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

The audio input unit 610 is, for example, a sound recording device such as a microphone or a voice recorder. It receives the speech input and converts the received speech into an electronic signal, that is, the audio signal corresponding to the speech input, so that it can be recognized. The received speech may be uttered in any of various languages or in a mixture of languages. Neither the way the received speech is uttered nor the specific way it is received limits the present invention.

The recognition unit 620 may use any existing or future speech recognition technology to recognize the audio signal and obtain a recognition result. Take template-matching speech recognition as an example. In the training stage, the user speaks each word in the vocabulary in turn, and its feature vector is stored in a template library as a template. Then, in the recognition stage, a feature vector is extracted from the audio signal of the speech input and compared for similarity with each template in the template library in turn, and the template with the highest similarity (that is, confidence) is output as the recognition result.
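The recognition stage of such a template-matching recognizer can be sketched as follows. The source does not name a similarity measure, so cosine similarity is used here purely as an assumption; feature extraction from audio is omitted, and the two-dimensional feature vectors and the template library contents are made-up illustration values.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_template(
    features: Sequence[float],
    template_library: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Compare the input with every template; return the best word and its similarity."""
    best_word, best_score = "", float("-inf")
    for word, template in template_library.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical template library built in the training stage.
library = {"play": [1.0, 0.0], "stop": [0.0, 1.0]}
content, confidence = match_template([0.9, 0.1], library)
print(content, round(confidence, 3))
```

The returned pair corresponds to the recognition content and the confidence of the recognition result, which the later steps compare against a confidence threshold.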

When a single, fixed confidence threshold is used to judge whether the recognition content is accurate, setting the threshold too high may make the probability of obtaining no recognition content (recognition failure) too large, whereas setting it too low may make more of the recognition content in the recognition results inaccurate.

The threshold setting unit 630 presets at least two confidence thresholds so that different confidence thresholds can later be chosen for the judgment according to different situations. As an example, the threshold setting unit 630 may preset the at least two confidence thresholds according to at least one of the recognition content that the recognition unit can recognize and its network condition. The threshold setting unit 630 may set suitable confidence thresholds as needed, and may adopt other threshold-setting steps in other scenarios.

The content that the speech recognition device can recognize includes multiple command words, and the threshold setting unit 630 may set different confidence thresholds for different command words. For example, a first confidence threshold is set for a first command word among the multiple command words, and a second confidence threshold is set for a second command word among the multiple command words, the second command word being different from the first command word. In addition, the threshold setting unit 630 may set other confidence thresholds for a third command word. For example, if the speech recognition device recognizes Chinese speech with high accuracy, a higher confidence threshold may be set for Chinese command words; if the speech recognition device recognizes English speech with low accuracy, a lower confidence threshold may be set for English command words. Each of the first command word and the second command word may be one specific command word or a class of command words that includes multiple command words.

The threshold setting unit 630 may also set different confidence thresholds according to whether the speech recognition device has a network connection. For example, the threshold setting unit 630 may set a third confidence threshold for the case where the speech recognition device has a network connection and a fourth confidence threshold for the case where it does not, and the third confidence threshold may be higher than the fourth confidence threshold. When the speech recognition device has a network connection and recognition fails under the third confidence threshold, another speech recognition device connected over the network may be asked to recognize the speech input, and the recognition content obtained by that device is taken as the final recognition content; higher recognition accuracy is thereby ensured while a high recognition rate is maintained. If the speech recognition device has no network connection, however, the confidence threshold is lowered appropriately, which safeguards the recognition rate that matters more to the user.
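The four preset thresholds described above can be sketched as a small configuration object. The class `ThresholdConfig`, its field names, and the numeric values are hypothetical; the only constraint taken from the source is that the preset thresholds differ from each other.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ThresholdConfig:
    first_word: float   # first confidence threshold (first command word)
    second_word: float  # second confidence threshold (second command word)
    online: float       # third confidence threshold (network connection available)
    offline: float      # fourth confidence threshold (no network connection)

    def __post_init__(self) -> None:
        values = astuple(self)
        # The preset confidence thresholds must differ from each other.
        if len(set(values)) != len(values):
            raise ValueError("confidence thresholds must all be different")


# Hypothetical values; the online threshold is chosen higher than the offline
# one, as the description suggests it may be.
config = ThresholdConfig(first_word=0.8, second_word=0.5, online=0.7, offline=0.6)
print(config)
```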

The threshold acquiring unit 640 may select one confidence threshold from the at least two confidence thresholds according to the current scenario of the speech recognition device, for example according to the recognition content corresponding to the speech input and the network connection status of the speech recognition device. In practice, the basis for the selection can be adjusted as required.

For example, the threshold acquiring unit 640 may include: a determining component for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the speech recognition device has a network connection; and a selecting component for selecting the second confidence threshold when the determining component determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining component determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining component determines that the speech recognition device has no network connection.
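The determination and selection described above amount to a short decision sequence, sketched below. The function name `select_threshold`, its parameters, and the example command words and threshold values are hypothetical.

```python
from typing import AbstractSet


def select_threshold(
    content: str,
    second_command_words: AbstractSet[str],
    has_network: bool,
    second_threshold: float,
    third_threshold: float,
    fourth_threshold: float,
) -> float:
    """Pick a confidence threshold from the content and the network status."""
    # Determining component: does the content correspond to the second command word?
    if content in second_command_words:
        return second_threshold
    # Otherwise the network connection decides between the third and fourth.
    if has_network:
        return third_threshold
    return fourth_threshold


# Hypothetical command-word class and threshold values.
second_words = {"call", "dial"}
print(select_threshold("call", second_words, True, 0.5, 0.7, 0.6))
print(select_threshold("open camera", second_words, True, 0.5, 0.7, 0.6))
print(select_threshold("open camera", second_words, False, 0.5, 0.7, 0.6))
```

Treating the second command word as a set allows it to be either one specific command word or a class of command words, as the description permits.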

In addition, the threshold acquiring unit 640 may select the confidence threshold according to the recognition content alone. When the determining component determines that the recognition content does not correspond to the second command word, the selecting component may select a default confidence threshold; alternatively, the determining component may further determine whether the recognition content corresponds to the first command word, a third command word, and so on, so that other confidence thresholds can be selected. In short, the threshold acquiring unit 640 selects the confidence threshold by considering both the current speech recognition scenario and the basis on which each confidence threshold was set.

The judging unit 650 judges whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold. As an example, the judging unit 650 may compare the confidence in the recognition result with the selected confidence threshold to obtain a comparison result, and judge whether the recognition content is accurate according to the comparison result. When the confidence in the recognition result is greater than or equal to the selected confidence threshold, the judging unit 650 judges the recognition content in the recognition result to be accurate and takes it as the final recognition content; when the confidence in the recognition result is less than the selected confidence threshold, the judging unit 650 judges the recognition content in the recognition result to be inaccurate, and recognition fails.

Optionally, the speech recognition device may further include a sending unit 660 and a receiving unit 670, as shown by the dashed boxes in Fig. 6. For example, when the judging unit 650 judges that the recognition content is inaccurate, the sending unit 660 may send the audio signal to another speech recognition device connected to the speech recognition device over a network; this other speech recognition device can recognize the audio signal and obtain second recognition content. The receiving unit 670 may receive the second recognition content from the other speech recognition device and take it as the final recognition content. In the example of Fig. 5, when the recognition content in the recognition result is judged to be inaccurate in S250, the audio signal is sent to another speech recognition device connected to the electronic equipment over a network.

In addition, the sending unit 660 may send the audio signal to the other speech recognition device connected to the electronic equipment over a network immediately after the audio input unit 610 obtains the audio signal, so that the receiving unit 670 can receive the second recognition content from the other speech recognition device as early as possible when the judging unit 650 judges that the recognition content is inaccurate.

Network congestion or interruption may prevent the receiving unit 670 from receiving the second recognition content; if the waiting time is too long, the user experience is greatly degraded. Therefore, a waiting time (for example, a preset time period) may be set, so that if the receiving unit 670 does not receive the second recognition content within this preset time period, the speech recognition device stops waiting for it. In that case, the threshold acquiring unit 640 may obtain a low confidence threshold that is lower than the selected confidence threshold, and the judging unit 650 judges whether the recognition content is accurate based on this low confidence threshold.

When the receiving unit 670 does not receive the second recognition content from the other speech recognition device, the recognition result of the speech recognition device may be examined again in order to provide the user with recognition content and improve the recognition rate. The threshold acquiring unit 640 therefore obtains a low confidence threshold. It may do so by subtracting a predetermined value from the currently selected confidence threshold, or by reselecting from among the confidence thresholds that have been set. The judging unit 650 then judges whether the recognition content is accurate based on this low confidence threshold.

Therefore, when the network recognition result from the other speech recognition device cannot be obtained in time because of a network timeout, a busy server, or a similar reason, the confidence threshold is lowered and the local result of the speech recognition device is used again in place of feedback such as "server busy" or "network timeout". The user can thus still obtain a recognition result when network or server conditions are poor, which improves the user experience. If the threshold acquiring unit 640 selected the low confidence threshold directly, a large number of less reliable recognition results produced locally by the speech recognition device would be used even when network conditions are good. Setting the confidence threshold twice avoids this problem: the confidence threshold is lowered only when the network result is not obtained in time.

In the technical solution of the speech recognition device according to the embodiment of the present invention described above, the confidence threshold used to judge the recognition content can vary with the situation, and the advantages of each speech recognition device are fully exploited to balance the recognition rate and the robustness of speech recognition, thereby improving the user experience.

Fig. 7 is a block diagram schematically illustrating a speech recognition device 700 according to another embodiment of the present invention. The speech recognition device 700 may be communicatively coupled to other speech recognition devices. The speech recognition device 700 includes: a memory 710 for storing program code; and a processor 720 for executing the program code to implement the method described with reference to Figs. 2-5.

The memory 710 may include at least one of read-only memory and random access memory, and provides instructions and data to the processor 720. A part of the memory 710 may also include non-volatile random access memory (NVRAM).

The processor 720 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.

The steps of the method disclosed in the embodiments of the present invention may be performed directly by the processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 710; the processor 720 reads the information in the memory 710 and completes the steps of the above method in combination with its hardware.

Since the speech recognition device according to the embodiments of the present invention has been disclosed above with reference to Figs. 6-7, any electronic equipment that includes the speech recognition device also falls within the scope of disclosure of the embodiments of the present invention.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

It should be understood that the equipment and methods disclosed in the several embodiments provided herein may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another piece of equipment, or some features may be omitted or not executed.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard drive, read-only memory, random access memory, a magnetic disk, or an optical disc.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person familiar with the technical field could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A speech recognition method, applied to electronic equipment that includes a first speech recognition device, the method comprising:

receiving a speech input, and obtaining an audio signal corresponding to the speech input;

recognizing the audio signal using the first speech recognition device to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the reliability of the recognition content;

presetting at least two confidence thresholds, the confidence thresholds being different from each other;

selecting one confidence threshold from the at least two confidence thresholds;

judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

2. The method according to claim 1, wherein presetting at least two confidence thresholds comprises: presetting the at least two confidence thresholds according to at least one of the recognition content that the first speech recognition device can recognize and its network condition.

3. The method according to claim 2, wherein the content that the first speech recognition device can recognize includes multiple command words, and presetting the at least two confidence thresholds according to at least one of the recognition content that the first speech recognition device can recognize and its network condition comprises:

setting a first confidence threshold for a first command word among the multiple command words;

setting a second confidence threshold for a second command word among the multiple command words, the second command word being different from the first command word.

4. The method according to claim 3, wherein presetting the at least two confidence thresholds according to at least one of the recognition content that the first speech recognition device can recognize and its network condition comprises:

setting a third confidence threshold for the case where the first speech recognition device has a network connection;

setting a fourth confidence threshold for the case where the first speech recognition device does not have a network connection.

5. The method according to claim 4, wherein selecting one confidence threshold from the at least two confidence thresholds comprises:

determining whether the recognition content in the recognition result corresponds to the second command word;

when the recognition content corresponds to the second command word, selecting the second confidence threshold;

when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection;

when the first speech recognition device has a network connection, selecting the third confidence threshold;

when the first speech recognition device does not have a network connection, selecting the fourth confidence threshold.

6. The method according to claim 5, wherein judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold comprises:

comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result;

judging whether the recognition content is accurate according to the comparison result.

7. The method according to claim 1, further comprising:

when the recognition content is judged to be inaccurate, sending the audio signal to a second speech recognition device connected to the electronic equipment over a network, the second speech recognition device being able to recognize the audio signal and obtain second recognition content;

receiving the second recognition content from the second speech recognition device, and taking the second recognition content as the final recognition content.

8. The method according to claim 1, further comprising:

sending the audio signal to a second speech recognition device connected to the electronic equipment over a network, the second speech recognition device being able to recognize the audio signal and obtain second recognition content;

when the recognition content is judged to be inaccurate in the judging operation, receiving the second recognition content from the second speech recognition device within a preset time period.

9. The method according to claim 8, further comprising:

when the second recognition content is not received within the preset time period, obtaining a low confidence threshold that is lower than the selected confidence threshold; and

judging whether the recognition content is accurate based on the low confidence threshold.

10. The method according to claim 2, wherein presetting the at least two confidence thresholds according to at least one of the recognition content that the first speech recognition device can recognize and its network condition comprises:

11. A speech recognition device, applied to electronic equipment, the speech recognition device comprising:

an audio input unit for receiving a speech input and obtaining an audio signal corresponding to the speech input;

a recognition unit for recognizing the audio signal to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the reliability of the recognition content;

a threshold setting unit for presetting at least two confidence thresholds, the confidence thresholds being different from each other;

a threshold acquiring unit for selecting one confidence threshold from the at least two confidence thresholds;

a judging unit for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

12. The speech recognition device according to claim 11, wherein the threshold setting unit presets the at least two confidence thresholds according to at least one of the recognition content that the recognition unit can recognize and its network condition.

13. The speech recognition device according to claim 12, wherein the content that the speech recognition device can recognize includes multiple command words, and the threshold setting unit presets the at least two confidence thresholds as follows:

14. The speech recognition device according to claim 13, wherein the threshold setting unit presets the at least two confidence thresholds as follows:

setting a third confidence threshold for the case where the speech recognition device has a network connection;

setting a fourth confidence threshold for the case where the speech recognition device does not have a network connection.

15. The speech recognition device according to claim 14, wherein the threshold acquiring unit includes:

a determining part configured to determine whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, to determine whether the speech recognition device has a network connection;

a selecting part configured to select the second confidence threshold when the determining part determines that the recognition content corresponds to the second command word, to select the third confidence threshold when the determining part determines that the speech recognition device has a network connection, and to select the fourth confidence threshold when the determining part determines that the speech recognition device does not have a network connection.
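The selection order of the determining part and the selecting part can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the `Thresholds` container, the `select_threshold` function, the example command words and all numeric values are hypothetical.

```python
from typing import AbstractSet, NamedTuple


class Thresholds(NamedTuple):
    """Hypothetical container for the preset thresholds used in this claim."""
    second: float  # used when the content is a second command word
    third: float   # used when a network connection is available
    fourth: float  # used when no network connection is available


def select_threshold(content: str,
                     second_command_words: AbstractSet[str],
                     has_network: bool,
                     thresholds: Thresholds) -> float:
    # Step 1: the command-word check takes priority over the network check.
    if content in second_command_words:
        return thresholds.second
    # Step 2: network state is consulted only for other content.
    return thresholds.third if has_network else thresholds.fourth


# Hypothetical values and command words, for illustration only.
presets = Thresholds(second=0.9, third=0.7, fourth=0.5)
second_words = frozenset({"delete all", "format"})
```

Note that the network state is never examined when the content matches a second command word, which mirrors the two-stage determination in the claim.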

16. The speech recognition device according to claim 15, wherein the judging unit judges whether the recognition content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging, according to the comparison result, whether the recognition content is accurate.

17. The speech recognition device according to claim 11, further comprising:

a sending unit configured to, when the judging unit judges that the recognition content is inaccurate, send the audio signal to another speech recognition device connected to the speech recognition device via a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognition content;

a receiving unit configured to receive the second recognition content from the other speech recognition device and to take the second recognition content as the final recognition content.
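The fallback described in this claim can be sketched in Python as below. The function name `final_content` and the `remote_recognize` callable, which stands in for the other speech recognition device, are hypothetical, and treating a confidence at or above the threshold as accurate is an assumption.

```python
from typing import Callable


def final_content(local_content: str,
                  local_confidence: float,
                  threshold: float,
                  audio: bytes,
                  remote_recognize: Callable[[bytes], str]) -> str:
    """Return the local content if judged accurate, else the remote content."""
    # Assumption: "accurate" means confidence >= selected threshold.
    if local_confidence >= threshold:
        return local_content
    # The audio is sent only after the local result has been judged inaccurate.
    return remote_recognize(audio)


# Hypothetical stand-in for the networked recognizer; records what it was sent.
sent_audio = []


def fake_remote(audio: bytes) -> str:
    sent_audio.append(audio)
    return "remote text"
```

In this arrangement the audio signal leaves the device only when the local judgment fails, so an accurate local result incurs no network traffic.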

18. The speech recognition device according to claim 11, further comprising:

a sending unit configured to send the audio signal to another speech recognition device connected to the electronic device via a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognition content;

a receiving unit configured to, when the recognition content is judged to be inaccurate in the judging operation, receive the second recognition content from the other speech recognition device within a preset time period and take the second recognition content as the final recognition content.

19. The speech recognition device according to claim 18, wherein, when the receiving unit does not receive the second recognition content within the preset time period,

the threshold acquiring unit obtains a low confidence threshold that is lower than the selected confidence threshold, and

the judging unit judges, based on the low confidence threshold, whether the recognition content is accurate.
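The timeout behaviour of claims 18 and 19 can be sketched in Python using a `concurrent.futures.Future` to represent the pending reply from the other speech recognition device. The function name `resolve`, its parameters and the numeric values are hypothetical. Two points are assumptions rather than claim language: a confidence at or above the threshold counts as accurate, and `None` is returned when the content fails even the low threshold, since the claims do not say what happens in that case.

```python
import concurrent.futures as cf
from typing import Optional


def resolve(local_content: str,
            local_confidence: float,
            threshold: float,
            low_threshold: float,
            remote_reply: "cf.Future[str]",
            timeout_s: float) -> Optional[str]:
    """Pick the final content, relaxing the threshold if the remote reply is late."""
    if not low_threshold < threshold:
        raise ValueError("low_threshold must be lower than threshold")
    # Assumption: "accurate" means confidence >= threshold.
    if local_confidence >= threshold:
        return local_content
    try:
        # Wait for the second recognition content within the preset time period.
        return remote_reply.result(timeout=timeout_s)
    except cf.TimeoutError:
        # No reply in time: judge the local content again with the low threshold.
        if local_confidence >= low_threshold:
            return local_content
        return None  # Assumption: outcome here is not specified by the claims.


# Futures are created directly only to simulate the remote device in this sketch.
answered: "cf.Future[str]" = cf.Future()
answered.set_result("remote text")
unanswered: "cf.Future[str]" = cf.Future()  # never completed, so it times out
```

Relaxing the threshold only after the timeout means a borderline local result is still usable when the network reply is slow or missing, while a timely remote result is preferred whenever one arrives.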

20. The speech recognition device according to claim 12, wherein the threshold setting unit presets the at least two confidence thresholds as follows:

21. An electronic device comprising the speech recognition device according to any one of claims 11-20.