CN103700368B - Speech recognition method, speech recognition device and electronic equipment - Google Patents
- Publication number: CN103700368B (application CN201410013478.8A)
- Authority
- CN
- China
- Prior art keywords
- threshold value
- speech recognition
- confidence threshold
- confidence
- identification content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention provides a speech recognition method, a speech recognition device and electronic equipment. The method comprises: receiving a speech input to obtain an audio signal corresponding to the speech input; recognizing the audio signal with a first speech recognition device to obtain a recognition result, wherein the recognition result comprises recognition content and a confidence, and the confidence is used to determine the reliability of the recognition content; presetting at least two confidence thresholds that differ from each other; selecting one confidence threshold from the at least two confidence thresholds; and judging whether the recognition content is accurate on the basis of the confidence in the recognition result and the selected confidence threshold. In the technical scheme of the embodiments of the invention, adopting different confidence thresholds in different situations balances the recognition rate and the robustness of speech recognition, thereby improving the user experience.
Description
Technical field
The present invention relates to the field of information technology and, more particularly, to a speech recognition method, a speech recognition device and an electronic device.
Background technology
Speech recognition technology converts speech into corresponding text or commands by recognizing and understanding it. In speech recognition, speech is processed by feature extraction, pattern matching, model training and the like to obtain instructions that an electronic device can respond to, text to be recorded in the electronic device, and so on, so that a user can interact with the electronic device by speaking.
Real speech environments usually contain noise, and spoken language is often mixed with interfering sounds such as pauses and coughs, all of which affect the recognition accuracy of existing speech recognition systems. In addition, if the words spoken by the user are not in the domain preset for the speech recognition system, recognition errors are likely. A commercial speech recognition system is therefore expected to reject erroneous speech. Accordingly, confidence evaluation is used to ensure the accuracy of the recognized content and to reject misrecognized speech.
Confidence evaluation performs hypothesis testing on the recognition result of a speech recognition device: the reliability of the recognition result is evaluated against a preset confidence threshold and errors in the result are located, thereby improving the recognition rate and robustness of the recognition system. Setting the confidence threshold reasonably is therefore critical, and it has become a current technical difficulty.
Summary of the invention
Embodiments of the present invention provide a speech recognition method, a speech recognition device and an electronic device that make it possible to adopt different confidence thresholds in different situations, so as to balance the recognition rate and the robustness of speech recognition and thereby improve the user experience.
In a first aspect, a speech recognition method is provided, applied to an electronic device that includes a first speech recognition device. The method may include: receiving a speech input and obtaining an audio signal corresponding to the speech input; performing recognition processing on the audio signal with the first speech recognition device to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content; presetting at least two confidence thresholds that differ from one another; selecting one confidence threshold from the at least two confidence thresholds; and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.
In the speech recognition method, presetting the at least two confidence thresholds may include: presetting at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and its network condition.
In the speech recognition method, the content that the first speech recognition device is able to recognize may include a plurality of command words, and presetting at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and its network condition may include: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.
In the speech recognition method, presetting at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and its network condition may include: setting a third confidence threshold for the case in which the first speech recognition device has a network connection; and setting a fourth confidence threshold for the case in which the first speech recognition device has no network connection.
In the speech recognition method, selecting one confidence threshold from the at least two confidence thresholds may include: determining whether the recognition content in the recognition result corresponds to the second command word; when the recognition content corresponds to the second command word, selecting the second confidence threshold; when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection; when the first speech recognition device has a network connection, selecting the third confidence threshold; and when the first speech recognition device has no network connection, selecting the fourth confidence threshold.
In the speech recognition method, judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold may include: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.
The speech recognition method may further include: when the recognition content is judged to be inaccurate, sending the audio signal to a second speech recognition device that is connected to the electronic device via a network, the second speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content; and receiving the second recognition content from the second speech recognition device and using the second recognition content as the final recognition content.
The speech recognition method may further include: sending the audio signal to a second speech recognition device that is connected to the electronic device via a network, the second speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content; and, when the recognition content is judged to be inaccurate in the judging operation, receiving the second recognition content from the second speech recognition device within a preset time period.
The speech recognition method may further include: when the second recognition content is not received within the preset time period, obtaining a low confidence threshold that is lower than the selected confidence threshold; and judging whether the recognition content is accurate based on the low confidence threshold.
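The fallback described in the preceding paragraphs can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recognize_with_fallback`, the `remote_recognize` callable standing in for the second speech recognition device, the use of a thread pool, and all numeric values are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def recognize_with_fallback(audio, local_result, threshold, low_threshold,
                            remote_recognize, timeout_s, executor):
    """Return the final recognition content, or None on recognition failure.

    local_result is a (content, confidence) pair from the first recognizer.
    remote_recognize is a hypothetical callable standing in for the second
    (networked) recognizer; it takes the audio and returns its content.
    """
    content, confidence = local_result
    # Send the audio to the second recognizer up front.
    future = executor.submit(remote_recognize, audio)
    if confidence >= threshold:
        future.cancel()  # best effort; the local content is judged accurate
        return content
    try:
        # Local content judged inaccurate: wait for the second recognition
        # content, but only within the preset time period.
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Nothing arrived in time: re-judge the local content against the
        # lower threshold.
        return content if confidence >= low_threshold else None


def fake_remote(audio):
    # Stand-in for the second recognizer (assumption for illustration).
    return "remote content"


with ThreadPoolExecutor(max_workers=2) as pool:
    final = recognize_with_fallback(b"", ("open", 0.7), 0.8, 0.6,
                                    fake_remote, 1.0, pool)
```

In this sketch the lower threshold is used only after the timeout, so a reachable second recognizer always takes precedence over a low-confidence local result.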
In a second aspect, a speech recognition device is provided, applied to an electronic device. The speech recognition device may include: an audio input unit for receiving a speech input and obtaining an audio signal corresponding to the speech input; a recognition unit for performing recognition processing on the audio signal to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content; a threshold setting unit for presetting at least two confidence thresholds that differ from one another; a threshold acquiring unit for selecting one confidence threshold from the at least two confidence thresholds; and a judging unit for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.
In the speech recognition device, the threshold setting unit may preset at least two confidence thresholds according to at least one of the content that the recognition unit is able to recognize and its network condition.
In the speech recognition device, the content that the speech recognition device is able to recognize may include a plurality of command words, and the threshold setting unit may preset at least two confidence thresholds as follows: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.
In the speech recognition device, the threshold setting unit may preset at least two confidence thresholds as follows: setting a third confidence threshold for the case in which the speech recognition device has a network connection; and setting a fourth confidence threshold for the case in which the speech recognition device has no network connection.
In the speech recognition device, the threshold acquiring unit may include: a determining part for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection; and a selecting part for selecting the second confidence threshold when the determining part determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining part determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining part determines that the speech recognition device has no network connection.
In the speech recognition device, the judging unit may judge whether the recognition content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.
The speech recognition device may further include: a sending unit for sending the audio signal, when the judging unit judges that the recognition content is inaccurate, to another speech recognition device that is connected to the speech recognition device via a network, the other speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content; and a receiving unit for receiving the second recognition content from the other speech recognition device and using the second recognition content as the final recognition content.
The speech recognition device may further include: a sending unit for sending the audio signal to another speech recognition device that is connected to the electronic device via a network, the other speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content; and a receiving unit for receiving, when the recognition content is judged to be inaccurate in the judging operation, the second recognition content from the other speech recognition device within a preset time period and using the second recognition content as the final recognition content.
In the speech recognition device, if the receiving unit does not receive the second recognition content within the preset time period, the threshold acquiring unit may obtain a low confidence threshold that is lower than the selected confidence threshold, and the judging unit judges whether the recognition content is accurate based on the low confidence threshold.
In a third aspect, an electronic device is provided that includes the speech recognition device described above.
In the technical solutions of the above speech recognition method, speech recognition device and electronic device according to the embodiments of the present invention, a plurality of confidence thresholds are preset and one of them is selected to judge the accuracy of the recognition content. The confidence threshold used to judge the recognition content can thus vary, which balances the recognition rate and the robustness of speech recognition and thereby improves the user experience.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram illustrating devices that perform speech recognition according to an embodiment of the present invention;
Fig. 2 is a flowchart schematically illustrating a speech recognition method according to an embodiment of the present invention;
Fig. 3 is a flowchart schematically illustrating the confidence threshold setting in the speech recognition method according to an embodiment of the present invention;
Fig. 4 is a flowchart schematically illustrating the selection of a confidence threshold in the speech recognition method according to an embodiment of the present invention;
Fig. 5 is a flowchart schematically illustrating a speech recognition method according to another embodiment of the present invention;
Fig. 6 is a block diagram schematically illustrating a speech recognition device according to an embodiment of the present invention;
Fig. 7 is a block diagram schematically illustrating a speech recognition device according to another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some rather than all of the embodiments of the present invention, and, where no conflict arises, the embodiments in this application and the features in them may be combined with one another.
Fig. 1 is an architecture diagram illustrating the communication between devices that perform speech recognition.
As shown in Fig. 1, the first speech recognition device 10 receives speech from a user and then recognizes the received speech. If it recognizes the received speech successfully, it obtains the corresponding recognized content; if it fails to do so, no recognized content is obtained. The first speech recognition device 10 may be a standalone speech recognition device, or may be integrated in an electronic device such as a mobile phone, a notebook or a tablet computer.
With current network interconnection technology, the first speech recognition device 10 may also be connected, for example via a network, to a second speech recognition device 20. The second speech recognition device 20 can typically draw on powerful network resources to achieve more accurate speech recognition, and may share speech recognition results with the first speech recognition device 10. The second speech recognition device 20 may be a standalone speech recognition device, or may be integrated in another electronic device such as a network server or a notebook. The first speech recognition device 10 may transmit the received speech to the second speech recognition device 20 and receive the recognized content from the second speech recognition device 20.
The speech recognition devices shown in Fig. 1 are only illustrative. The first speech recognition device 10 and the second speech recognition device 20 are peers. For example, the second speech recognition device 20 may receive speech, transmit the received speech to the first speech recognition device 10, and receive the recognized content from the first speech recognition device 10.
The embodiments of the present invention describe schemes in which speech recognition is performed in a single speech recognition device (such as the first speech recognition device 10) and in which different speech recognition devices share speech recognition results, so as to balance the recognition rate and the robustness of speech recognition and thereby improve the user experience.
Fig. 2 is a flowchart schematically illustrating a speech recognition method 200 according to an embodiment of the present invention. The speech recognition method 200 may be applied to a speech recognition device as shown in Fig. 1 or to an electronic device that includes such a speech recognition device.
As shown in Fig. 2, the speech recognition method 200 may include: receiving a speech input and obtaining an audio signal corresponding to the speech input (S210); performing recognition processing on the audio signal with the first speech recognition device to obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content (S220); presetting at least two confidence thresholds that differ from one another (S230); selecting one confidence threshold from the at least two confidence thresholds (S240); and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold (S250).
In S210, a recording device such as a microphone or a voice recorder may be used to receive the speech input. The recording device converts the received speech into an electrical signal, i.e., the audio signal corresponding to the speech input, so that it can be recognized. The received speech may be sound uttered in any of various languages (such as Chinese, English or German) or sound expressed in a mixture of languages, for example Chinese mixed with English words. The manner in which the received speech is uttered and the specific manner of receiving it do not limit the present invention.
In S220, the first speech recognition device may use any existing or future speech recognition technology to perform recognition processing on the audio signal and obtain a recognition result, the recognition result including recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content. Take template-matching speech recognition as an example. In the training stage, the user speaks each word in the vocabulary in turn, and its feature vector is stored in a template library as a template. Then, in the recognition stage, a feature vector is extracted from the original speech (i.e., the above audio signal), the feature vector of the input speech is compared for similarity with each template in the template library in turn, and the template with the highest similarity (i.e., confidence) is output as the recognition result.
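The template-matching example above can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not say how similarity is computed, so cosine similarity is used here as a stand-in, and the function names, the three-dimensional feature vectors and the vocabulary are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(feature, template_library):
    """Compare the input feature vector with every stored template in turn
    and return (recognition content, confidence) for the best match."""
    best_word, best_score = None, float("-inf")
    for word, template in template_library.items():
        score = cosine_similarity(feature, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Training stage: one stored feature vector per vocabulary word (toy values).
template_library = {"open": [1.0, 0.0, 0.0], "close": [0.0, 1.0, 0.0]}

# Recognition stage: the feature vector extracted from the audio signal.
content, confidence = recognize([0.9, 0.1, 0.0], template_library)
```

The returned pair corresponds to the recognition content and confidence of the recognition result; a real recognizer would extract the feature vectors from audio rather than take them as literals.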
In practice it may be difficult to recognize speech accurately, for reasons such as the following. Speech patterns differ not only between speakers but also for the same speaker; for example, a speaker's voice information differs between casual and careful speech. Speech is inherently ambiguous and is affected by context, which changes stress, tone, volume, speaking rate and so on. Environmental noise and interference also seriously affect speech recognition. Therefore, for the same speech input, the confidence in the recognition result varies greatly across environments or backgrounds.
When a single confidence threshold is set to judge whether the recognition content is accurate, a high threshold may make the probability of obtaining no recognition content (recognition failure) too large, while a low threshold may make more of the recognition content in recognition results inaccurate. For example, if the speech input is expressed in a mixture of languages, such as "open filefox", in which an English word is mixed into Chinese, the confidence in the recognition result is usually low, and using an ordinary confidence threshold may then lead to recognition failure.
In S230, at least two confidence thresholds that differ from one another are preset. Rather than setting only one confidence threshold to judge whether the recognition content is accurate, the embodiments of the present invention preset at least two confidence thresholds and later select different confidence thresholds for the judgment according to the situation. As an example, at least two confidence thresholds may be preset according to at least one of the content that the first speech recognition device is able to recognize and its network condition.
Fig. 3 is a flowchart schematically illustrating the confidence threshold setting S230 in the speech recognition method according to an embodiment of the present invention. As shown in Fig. 3, when the content that the first speech recognition device is able to recognize includes a plurality of command words, a first confidence threshold may be set for a first command word among the plurality of command words (S231); a second confidence threshold may be set for a second command word among the plurality of command words, the second command word being different from the first command word (S232); a third confidence threshold may be set for the case in which the first speech recognition device has a network connection (S233); and a fourth confidence threshold may be set for the case in which the first speech recognition device has no network connection (S234).
In S231 and S232, different confidence thresholds are set for different command words. For example, if the first speech recognition device recognizes Chinese speech with high accuracy, a higher confidence threshold may be set for Chinese command words; if it recognizes English speech with low accuracy, a lower confidence threshold may be set for English command words. In addition, in S230 a further confidence threshold may be set for a third command word; the number of confidence thresholds set on the basis of command words does not limit the embodiments of the present invention. The first command word may be one specific command word or a class of command words comprising multiple command words, for example multiple Chinese command words. The second command word may likewise be one specific command word or a class of command words comprising multiple command words; for example, a dedicated confidence threshold may be set for the hard-to-recognize command word "filefox".
In S233 and S234, different confidence thresholds are set according to whether the first speech recognition device has a network connection, and the third confidence threshold may be higher than the fourth confidence threshold. When the first speech recognition device has a network connection and fails to recognize the speech using the third confidence threshold, it can ask the network-connected second speech recognition device to recognize the speech input and use the recognition content obtained by the second speech recognition device as the final recognition content, so that a higher recognition rate is achieved while a higher recognition accuracy is maintained. If, however, the first speech recognition device has no network connection, the confidence threshold is lowered appropriately to secure the recognition rate, which matters more to the user.
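The threshold setting of S231 to S234 can be pictured as a small configuration object. The sketch below is an assumption-laden illustration: the class name `ThresholdConfig`, the field names, the [0, 1] confidence range and every numeric value are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThresholdConfig:
    """Preset confidence thresholds (all names and values are illustrative)."""
    # S231/S232: thresholds set for specific command words or classes of them.
    per_command: dict = field(default_factory=dict)
    # S233: threshold used when the recognizer has a network connection.
    online: float = 0.8
    # S234: threshold used when the recognizer has no network connection.
    offline: float = 0.6

    def __post_init__(self):
        values = list(self.per_command.values()) + [self.online, self.offline]
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError("thresholds are assumed to lie in [0, 1]")
        # The method presets thresholds that differ from one another.
        if len(set(values)) != len(values):
            raise ValueError("preset thresholds must differ from one another")


# A dedicated, lower threshold for a hard-to-recognize command word.
config = ThresholdConfig(per_command={"filefox": 0.5}, online=0.8, offline=0.6)
```

Here the online threshold is set higher than the offline one, matching the remark that the third confidence threshold may be higher than the fourth.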
Suitable confidence threshold setting steps may be adopted as needed; for example, only S231 and S232, or only S233 and S234, may be used. Other confidence threshold setting steps may also be adopted in other scenarios. In addition, although S230 is illustrated after S220 in the figure, S230 may be performed before S210 (i.e., in advance) to set the confidence thresholds.
In S240, one confidence threshold may be selected from the at least two confidence thresholds according to the current scenario of the first speech recognition device; for example, the confidence threshold may be selected according to the recognition content corresponding to the speech input and the network connection state of the first speech recognition device. The basis for the selection may be adjusted as required in practice.
Fig. 4 is a flowchart schematically illustrating the selection of a confidence threshold in the speech recognition method according to an embodiment of the present invention. An exemplary description is given below with reference to Fig. 4.
As shown in Fig. 4, after the recognition result is obtained in S220, it is determined whether the recognition content in the recognition result corresponds to the second command word (S241). When the recognition content corresponds to the second command word (Yes in S241), the second confidence threshold is selected (S242). When the recognition content does not correspond to the second command word (No in S241), it is determined whether the first speech recognition device has a network connection (S243). When the first speech recognition device has a network connection (Yes in S243), the third confidence threshold is selected; when the first speech recognition device has no network connection (No in S243), the fourth confidence threshold is selected.
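The decision flow of Fig. 4 maps directly onto two nested checks. The following is a minimal sketch; the function name `select_threshold`, its parameters and the example values are assumptions introduced for illustration.

```python
def select_threshold(content, has_network, second_command_words,
                     second_threshold, third_threshold, fourth_threshold):
    """Pick one preset confidence threshold following the Fig. 4 flow."""
    # S241: does the recognition content correspond to the second command word?
    if content in second_command_words:
        return second_threshold  # S242
    # S243: otherwise the choice depends on the network connection.
    if has_network:
        return third_threshold
    return fourth_threshold


# Hypothetical presets: a dedicated threshold for "filefox", and
# online/offline thresholds for everything else.
second_words = {"filefox"}
chosen = select_threshold("filefox", True, second_words, 0.5, 0.8, 0.6)
```

Because the command-word check comes first, a match on the second command word selects the second threshold regardless of the network state.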
In the example of Fig. 4, the confidence threshold is selected by combining two different factors (namely the recognition content and the network connection). In practice, the confidence threshold may be selected according to the recognition content alone, in which case a default confidence threshold may be selected when the recognition content does not correspond to the second command word; alternatively, it may also be determined whether the recognition content corresponds to the first command word, and another confidence threshold may be selected when the recognition content corresponds to the third command word. In short, the confidence threshold is selected by considering both the current speech recognition scenario and the basis on which each confidence threshold was set.
In s250, whether the identification content is accurate is judged based on the confidence level in the
recognition result and the selected confidence threshold value. As an example, the confidence level in
the recognition result may be compared with the selected second confidence threshold value or the
selected third confidence threshold value to obtain a comparative result, and whether the identification
content is accurate is judged according to the comparative result. For example, when the confidence level
in the recognition result is greater than or equal to the selected confidence threshold value, the
identification content in the recognition result is judged to be accurate and is used as the final
identification content; when the confidence level in the recognition result is less than the selected
confidence threshold value, the identification content in the recognition result is judged to be
inaccurate, and recognition fails.
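The comparison in s250 amounts to a single test. A minimal sketch, in which the function name `judge` and
the use of `None` to signal a recognition failure are assumptions:

```python
from typing import Optional


def judge(content: str, confidence: float, threshold: float) -> Optional[str]:
    """Return the content as final identification content, or None on failure.

    The content is accepted when the confidence level is greater than or
    equal to the selected confidence threshold value, as described for s250.
    """
    return content if confidence >= threshold else None
```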
In the technical scheme of the above method for voice recognition according to an embodiment of the
present invention, multiple confidence threshold values are set in advance and one of them is selected to
judge the accuracy of the identification content. The confidence threshold value used for the judgement
can therefore vary, which balances the recognition rate and the robustness of speech recognition and
thus improves the user experience.
In the above method for voice recognition, speech recognition is carried out using the first speech
recognition equipment. As described in conjunction with Fig. 1, the first speech recognition equipment may
also share speech recognition results with second speech recognition equipment connected over a network,
as described below in conjunction with Fig. 5.
Fig. 5 is a flow chart schematically illustrating a method for voice recognition 500 according to another
embodiment of the present invention.
This method for voice recognition 500 also includes steps s210-s250 of the method for voice recognition
200 described above. Unlike the method for voice recognition 200, it further includes steps s251-s254,
which follow a recognition failure in s250.
When the identification content in the recognition result is judged to be inaccurate in s250, the audio
signal is sent to second speech recognition equipment connected to the electronic equipment over a
network (for example, the second speech recognition equipment 20 in Fig. 1), which can perform
recognition processing on the audio signal and obtain second identification content (s251). The method
then waits to receive the second identification content from the second speech recognition equipment
(s252). If the second identification content is received from the second speech recognition equipment
(yes in s252), it is used as the final identification content and recognition ends. If the second
identification content is not received from the second speech recognition equipment (no in s252), a low
confidence threshold that is less than the selected confidence threshold value is obtained (s253), and
whether the identification content is accurate is judged based on this low confidence threshold (s254),
after which recognition ends.
In the example of Fig. 5, when the identification content in the recognition result is judged to be
inaccurate in s250, the audio signal is sent to the second speech recognition equipment connected to the
electronic equipment over a network (s251). However, the method is not limited to this. The audio signal
may instead be sent to the second speech recognition equipment (s251) immediately after it is obtained in
s210, so that when the identification content is judged to be inaccurate in s250, the second
identification content can be received from the second speech recognition equipment as early as possible.
While s252 waits to receive the second identification content from the second speech recognition
equipment, network congestion or interruption may prevent the second identification content from being
received, and an overly long wait would greatly degrade the user experience. A waiting time (for example,
a preset time period) may therefore be set in s252, so that if the second identification content is not
received within this preset time period, the method stops waiting for it.
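Steps s250 to s254, including the waiting time described above, can be sketched as follows. This is a
minimal sketch under stated assumptions: `remote_recognize` is a hypothetical callable standing in for the
second speech recognition equipment, and `timeout_s` and `margin` are illustrative parameters. Python's
`concurrent.futures` is used here only as one convenient way to bound the wait.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_with_fallback(audio, local_result, selected_threshold,
                            remote_recognize, timeout_s=1.0, margin=0.2):
    """Return the final identification content, or None on recognition failure."""
    content, confidence = local_result
    if confidence >= selected_threshold:           # s250: local result accepted
        return content
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(remote_recognize, audio)  # s251: send the audio signal
    try:
        # s252: wait at most timeout_s for the second identification content
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout or network error: fall back to the local result.
        low_threshold = selected_threshold - margin               # s253
        return content if confidence >= low_threshold else None   # s254
    finally:
        pool.shutdown(wait=False)  # do not block on a slow remote call
```

Catching every exception treats a network error the same way as a timeout, which is a design choice of
this sketch rather than something the description specifies.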
When the second identification content is not received from the second speech recognition equipment (no
in s252), the recognition result of the first speech recognition equipment may be examined again in order
to provide the user with identification content and to improve the recognition rate. If the user requires
very high recognition accuracy, s253 and s254 need not be executed and recognition ends directly. In
s253, the low confidence threshold may be obtained by subtracting a predetermined value from the
confidence threshold value selected in s240, or by reselecting among the confidence threshold values set
in s230.
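The two ways of obtaining the low confidence threshold in s253 can be sketched as below. The function
names are hypothetical, and picking the largest preset value below the selected one is an assumption; the
description only says that a value is reselected among the preset confidence threshold values.

```python
from typing import Iterable, Optional


def lower_by_margin(selected: float, margin: float) -> float:
    """Subtract a predetermined value, never going below zero."""
    return max(0.0, selected - margin)


def reselect_lower(selected: float, presets: Iterable[float]) -> Optional[float]:
    """Reselect among the preset values: the largest one below `selected`."""
    lower = [t for t in presets if t < selected]
    return max(lower) if lower else None
```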
The judgement operation in s254 is similar to that in s250: whether the identification content is
accurate is judged based on the low confidence threshold (s254), after which recognition ends. For
example, the confidence level in the recognition result may be compared with the low confidence
threshold. When the confidence level in the recognition result is greater than or equal to the low
confidence threshold, the identification content in the recognition result is judged to be accurate and
is used as the final identification content; when the confidence level in the recognition result is less
than the low confidence threshold, the identification content in the recognition result is judged to be
inaccurate, i.e. recognition fails.
Thus, when the network recognition result produced by the second speech recognition equipment cannot be
obtained in time because of a network timeout, a busy server or similar reasons, the confidence threshold
value is lowered and the local result of the first speech recognition equipment is reused in place of
feedback such as "server busy" or "network timeout". The user can then still obtain a recognition result
when network or server conditions are poor, which improves the user experience. If the low confidence
threshold were selected directly in s240, a large number of less reliable recognition results produced
locally by the first speech recognition equipment would be used even when network conditions are good.
Setting the confidence threshold value twice, in s240 and in s253, avoids this problem, because the
confidence threshold value is lowered only when the network result is not obtained in time.
Therefore, in the technical scheme of the method for voice recognition 500 described with reference to
Fig. 5, confidence threshold values can be used even more flexibly to judge the identification content, and
the advantages of each speech recognition equipment are fully used to balance the recognition rate and the
robustness of speech recognition, thus improving the user experience.
Fig. 6 is a block diagram schematically illustrating speech recognition equipment 600 according to an
embodiment of the present invention. This speech recognition equipment 600 can be applied in the speech
recognition equipment shown in Fig. 1 or in electronic equipment that includes such speech recognition
equipment.
This speech recognition equipment 600 may include: an audio input unit 610, for receiving a phonetic entry
and obtaining an audio signal corresponding to the phonetic entry; a recognition unit 620, for performing
recognition processing on the audio signal to obtain a recognition result, the recognition result
including identification content and a confidence level, the confidence level indicating the degree of
reliability of the identification content; a threshold setting unit 630, for setting in advance at least
two confidence threshold values that are different from each other; a threshold value acquiring unit 640,
for selecting one confidence threshold value from the at least two confidence threshold values; and a
judging unit 650, for judging whether the identification content is accurate based on the confidence
level in the recognition result and the selected confidence threshold value.
The audio input unit 610 is, for example, a recording device such as a microphone or a voice recorder. It
receives a phonetic entry and converts the received voice into an electronic signal, i.e. the audio signal
corresponding to the phonetic entry, so that the signal can be recognized. The received voice may be
uttered in any of various languages or in a mixture of languages. Neither the way in which the received
voice is uttered nor the specific way in which it is received limits the present invention.
The recognition unit 620 may use any speech recognition technology, existing now or appearing in the
future, to perform recognition processing on the audio signal and obtain a recognition result. Taking
template-matching speech recognition as an example, in the training stage the user speaks each word in
the vocabulary in turn, and the feature vector of each word is stored in a template library as a template.
Then, in the recognition stage, a feature vector is extracted from the audio signal of the phonetic entry
and compared for similarity with each template in the template library in turn, and the template with the
highest similarity (i.e. confidence level) is output as the recognition result.
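The recognition stage of such template matching can be sketched as follows. Cosine similarity is used
here only as an assumed similarity measure, since the description does not name one, and the tiny
two-dimensional feature vectors are purely illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize(features, template_library):
    """Return (identification content, confidence level) for the best template."""
    best_word = max(template_library,
                    key=lambda word: cosine(features, template_library[word]))
    return best_word, cosine(features, template_library[best_word])


# Hypothetical template library built in the training stage.
template_library = {"play": [1.0, 0.0], "stop": [0.0, 1.0]}
```

The returned similarity plays the role of the confidence level that is later compared with the selected
confidence threshold value.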
When a single fixed confidence threshold value is used to judge whether the identification content is
accurate, setting the confidence threshold value high may make the probability of obtaining no
identification content (recognition failure) too large, while setting it low may cause more of the
identification content in recognition results to be inaccurate.
The threshold setting unit 630 sets at least two confidence threshold values in advance, so that a
different confidence threshold value can later be chosen for the judgement in different situations. As an
example, the threshold setting unit 630 may set the at least two confidence threshold values in advance
according to at least one of the identification content that the recognition unit can identify and its
network condition. The threshold setting unit 630 may set suitable confidence threshold values as needed,
and may also use other confidence threshold value setting steps in other scenes.
The content that the speech recognition equipment can identify includes multiple order words, and the
threshold setting unit 630 may set different confidence threshold values for different order words. For
example, a first confidence threshold value is set for a first order word among the multiple order words,
and a second confidence threshold value is set for a second order word among the multiple order words,
the second order word being different from the first order word. In addition, the threshold setting unit
630 may set other confidence threshold values for a third order word. For example, if the speech
recognition equipment identifies Chinese speech with high accuracy, a higher confidence threshold value
may be set for Chinese order words; if the speech recognition equipment identifies English speech with
low accuracy, a lower confidence threshold value may be set for English order words. Each of the first
order word and the second order word may be one specific order word or a class of order words that
includes multiple order words.
The threshold setting unit 630 may also set different confidence threshold values according to whether
the speech recognition equipment has a network connection. For example, the threshold setting unit 630
may set a third confidence threshold value for the case where the speech recognition equipment has a
network connection and a fourth confidence threshold value for the case where it does not, and the third
confidence threshold value may be higher than the fourth confidence threshold value. When the speech
recognition equipment has a network connection and recognition fails under the third confidence threshold
value, another speech recognition equipment connected over the network can be asked to perform speech
recognition on the phonetic entry, and the identification content obtained by that other speech
recognition equipment is used as the final identification content, so that higher recognition accuracy is
ensured while a high recognition rate is maintained. If, however, the speech recognition equipment has no
network connection, the confidence threshold value is suitably lowered, so as to ensure the recognition
rate, which matters more to the user in that case.
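The presetting performed by the threshold setting unit can be sketched as a small validated table. The
function name, the key names and the numeric values are assumptions; the checks reflect only the stated
requirements that at least two confidence threshold values are set and that they differ from each other.

```python
def set_thresholds(**named: float) -> dict:
    """Preset named confidence threshold values after basic validation."""
    if len(named) < 2:
        raise ValueError("at least two confidence threshold values are required")
    if len(set(named.values())) != len(named):
        raise ValueError("confidence threshold values must differ from each other")
    return dict(named)


# Illustrative values only: the third (network available) is set higher
# than the fourth (no network), as the description allows.
thresholds = set_thresholds(first=0.6, second=0.8, third=0.7, fourth=0.5)
```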
The threshold value acquiring unit 640 may select one confidence threshold value from the at least two
confidence threshold values according to the current scene of the speech recognition equipment. For
example, it may select the confidence threshold value according to the identification content
corresponding to the phonetic entry and the network connection state of the speech recognition equipment.
The basis for the selection may be adjusted as required in practice.
For example, the threshold value acquiring unit 640 may include: a determination part, for determining
whether the identification content in the recognition result corresponds to the second order word and,
when the identification content does not correspond to the second order word, determining whether the
speech recognition equipment has a network connection; and a selection part, for selecting the second
confidence threshold value when the determination part determines that the identification content
corresponds to the second order word, selecting the third confidence threshold value when the
determination part determines that the speech recognition equipment has a network connection, and
selecting the fourth confidence threshold value when the determination part determines that the speech
recognition equipment does not have a network connection.
In addition, the threshold value acquiring unit 640 may select the confidence threshold value according
to identification content alone. When the determination part determines that the identification content
does not correspond to the second order word, the selection part may select a default confidence
threshold value, or the determination part may further determine whether the identification content
corresponds to the first order word, the third order word and so on, so that another confidence threshold
value is selected. In short, the threshold value acquiring unit 640 selects the confidence threshold
value by considering both the current speech recognition scene and the basis on which each confidence
threshold value was set.
The judging unit 650 judges whether the identification content is accurate based on the confidence level
in the recognition result and the selected confidence threshold value. As an example, the judging unit
650 may compare the confidence level in the recognition result with the selected confidence threshold
value to obtain a comparative result, and judge whether the identification content is accurate according
to the comparative result. When the confidence level in the recognition result is greater than or equal
to the selected confidence threshold value, the judging unit 650 judges that the identification content
in the recognition result is accurate, and the identification content in the recognition result is used
as the final identification content; when the confidence level in the recognition result is less than the
selected confidence threshold value, the judging unit 650 judges that the identification content in the
recognition result is inaccurate, and recognition fails.
Optionally, the speech recognition equipment may also include a transmitting unit 660 and a receiving
unit 670, as shown by the dashed boxes in Fig. 6. For example, when the judging unit 650 judges that the
identification content is inaccurate, the transmitting unit 660 may send the audio signal to another
speech recognition equipment connected to the speech recognition equipment over a network, and this other
speech recognition equipment can perform recognition processing on the audio signal and obtain second
identification content. The receiving unit 670 may receive the second identification content from the
other speech recognition equipment and use it as the final identification content. In the example of
Fig. 5, when the identification content in the recognition result is judged to be inaccurate in s250, the
audio signal is sent to another speech recognition equipment connected to the electronic equipment over a
network.
In addition, the transmitting unit 660 may send the audio signal to the other speech recognition
equipment connected to the electronic equipment over a network immediately after the audio input unit 610
obtains the audio signal, so that the receiving unit 670 can receive the second identification content
from the other speech recognition equipment as early as possible when the judging unit 650 judges that
the identification content is inaccurate.
If the network is congested or interrupted, the receiving unit 670 may be unable to receive the second
identification content, and an overly long wait would greatly degrade the user experience. A waiting time
(for example, a preset time period) may therefore be set, so that if the receiving unit 670 does not
receive the second identification content within this preset time period, the speech recognition
equipment stops waiting for it. In that case, the threshold value acquiring unit 640 may obtain a low
confidence threshold that is less than the selected confidence threshold value, and the judging unit 650
judges whether the identification content is accurate based on this low confidence threshold.
When the receiving unit 670 does not receive the second identification content from the other speech
recognition equipment, the recognition result of the speech recognition equipment may be examined again in
order to provide the user with identification content and to improve the recognition rate. The threshold
value acquiring unit 640 therefore obtains a low confidence threshold. It may do so by subtracting a
predetermined value from the currently selected confidence threshold value, or by reselecting among the
confidence threshold values that have been set. The judging unit 650 then judges whether the
identification content is accurate based on this low confidence threshold.
Thus, when the network recognition result produced by the other speech recognition equipment cannot be
obtained in time because of a network timeout, a busy server or similar reasons, the confidence threshold
value is lowered and the local result of the speech recognition equipment is reused in place of feedback
such as "server busy" or "network timeout". The user can then still obtain a recognition result when
network or server conditions are poor, which improves the user experience. If the threshold value
acquiring unit 640 selected the low confidence threshold directly, a large number of less reliable
recognition results produced locally by the speech recognition equipment would be used even when network
conditions are good. Setting the confidence threshold value twice avoids this problem, because the
confidence threshold value is lowered only when the network result is not obtained in time.
In the technical scheme of the above speech recognition equipment according to an embodiment of the
present invention, the confidence threshold value used to judge the identification content can vary, and
the advantages of each speech recognition equipment are fully used to balance the recognition rate and
the robustness of speech recognition, thus improving the user experience.
Fig. 7 is a block diagram schematically illustrating speech recognition equipment 700 according to another
embodiment of the present invention. This speech recognition equipment 700 can be communicatively coupled
with other speech recognition equipment and includes: a memory 710, for storing program code; and a
processor 720, for executing the program code to implement the method described with reference to
Figs. 2-5.
The memory 710 may include at least one of read-only memory and random access memory, and provides
instructions and data to the processor 720. A part of the memory 710 may also include non-volatile
random access memory (NVRAM).
The processor 720 may be a general-purpose processor, a digital signal processor (DSP), an
application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another
programmable logic device, discrete gate or transistor logic, or discrete hardware components. The
general-purpose processor may be a microprocessor or any conventional processor.
The steps of the method disclosed in the embodiments of the present invention may be carried out directly
by the processor, or by a combination of hardware in the processor and software modules. The software
modules may reside in a storage medium that is mature in the art, such as random access memory, flash
memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or
registers. The storage medium is located in the memory 710, and the processor 720 reads the information in
the memory 710 and completes the steps of the above method in combination with its hardware.
The speech recognition equipment according to embodiments of the present invention having been disclosed
above in conjunction with Figs. 6-7, any electronic equipment that includes this speech recognition
equipment also falls within the scope disclosed by the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples
described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware
or in a combination of computer software and electronic hardware. Whether these functions are executed
in hardware or in software depends on the specific application and the design constraints of the
technical scheme. Skilled practitioners may use different methods to implement the described functions
for each specific application, but such implementations should not be considered to go beyond the scope
of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the
specific working processes of the devices and units described above may refer to the corresponding
processes in the foregoing method embodiments and are not repeated here.
It should be understood that, in the several embodiments provided herein, the disclosed equipment and
methods may be implemented in other ways. For example, the device embodiments described above are merely
schematic. The division into units is only a division by logical function, and other divisions are
possible in an actual implementation; for example, multiple units or components may be combined or
integrated into another device, or some features may be omitted or not executed.
The units described as separate components may or may not be physically separate, and the components
shown as units may or may not be physical units. Some or all of the units may be selected according to
actual needs to achieve the purpose of the scheme of the embodiment.
If the functions are implemented in the form of software functional units and sold or used as an
independent product, they may be stored in a computer-readable storage medium. Based on this
understanding, the technical scheme of the present invention in essence, or the part that contributes to
the prior art, or a part of the technical scheme, may be embodied in the form of a software product. The
computer software product is stored in a storage medium and includes instructions that cause a computer
device (which may be a personal computer, a server, a network device or the like) to execute all or part
of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium
includes various media that can store program code, such as a USB flash drive, a portable hard drive,
read-only memory, random access memory, a magnetic disk or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present
invention is not limited to them. Any change or replacement that a person familiar with the technical
field could readily conceive within the technical scope disclosed by the present invention shall be
covered by the protection scope of the present invention. Therefore, the protection scope of the present
invention shall be defined by the scope of the claims.
Claims (21)
1. A method for voice recognition, applied to electronic equipment that includes a first speech
recognition equipment, the method including:
receiving a phonetic entry, and obtaining an audio signal corresponding to the phonetic entry;
performing recognition processing on the audio signal using the first speech recognition equipment to
obtain a recognition result, the recognition result including identification content and a confidence
level, the confidence level being used to determine the degree of reliability of the identification content;
setting in advance at least two confidence threshold values, the confidence threshold values being
different from each other;
selecting one confidence threshold value from the at least two confidence threshold values;
judging whether the identification content is accurate based on the confidence level in the recognition
result and the selected confidence threshold value.
2. The method according to claim 1, wherein setting in advance at least two confidence threshold values
includes: setting the at least two confidence threshold values in advance according to at least one of
the identification content that the first speech recognition equipment can identify and its network
condition.
3. The method according to claim 2, wherein the content that the first speech recognition equipment can
identify includes multiple order words, and setting the at least two confidence threshold values in
advance according to at least one of the identification content that the first speech recognition
equipment can identify and its network condition includes:
setting a first confidence threshold value for a first order word among the multiple order words;
setting a second confidence threshold value for a second order word among the multiple order words, the
second order word being different from the first order word.
4. The method according to claim 3, wherein setting the at least two confidence threshold values in
advance according to at least one of the identification content that the first speech recognition
equipment can identify and its network condition includes:
setting a third confidence threshold value for the case where the first speech recognition equipment has a network connection;
setting a fourth confidence threshold value for the case where the first speech recognition equipment does not have a network connection.
5. The method according to claim 4, wherein selecting one confidence threshold value from the at least
two confidence threshold values includes:
determining whether the identification content in the recognition result corresponds to the second order word;
when the identification content corresponds to the second order word, selecting the second confidence threshold value;
when the identification content does not correspond to the second order word, determining whether the
first speech recognition equipment has a network connection;
when the first speech recognition equipment has a network connection, selecting the third confidence threshold value;
when the first speech recognition equipment does not have a network connection, selecting the fourth confidence threshold value.
6. The method according to claim 5, wherein judging whether the identification content is accurate based
on the confidence level in the recognition result and the selected confidence threshold value includes:
comparing the confidence level in the recognition result with the selected second confidence threshold
value or the selected third confidence threshold value, and obtaining a comparative result;
judging whether the identification content is accurate according to the comparative result.
7. The method according to claim 1, further including:
when the identification content is judged to be inaccurate, sending the audio signal to a second speech
recognition equipment connected to the electronic equipment over a network, the second speech recognition
equipment being able to perform recognition processing on the audio signal and obtain second
identification content;
receiving the second identification content from the second speech recognition equipment, and using the
second identification content as the final identification content.
8. The method according to claim 1, further including:
sending the audio signal to a second speech recognition equipment connected to the electronic equipment
over a network, the second speech recognition equipment being able to perform recognition processing on
the audio signal and obtain second identification content;
when the identification content is judged to be inaccurate in the judging operation, receiving the second
identification content from the second speech recognition equipment within a preset time period.
9. The method according to claim 8, further comprising:
When the second recognized content is not received within the preset time period, obtaining a low confidence threshold that is lower than the selected confidence threshold; and
Judging whether the recognized content is accurate based on the low confidence threshold.
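The flow of claims 8 and 9 (send the audio to the second device up front, wait a bounded time for its answer only if the local content is judged inaccurate, and fall back to a lower threshold on timeout) might look like the sketch below. The function name `recognize_with_fallback`, the callables `local_recognize` and `remote_recognize`, the returned source labels, and the `>=` comparison are hypothetical choices made for illustration; the claims do not prescribe them.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def recognize_with_fallback(audio, local_recognize, remote_recognize,
                            threshold, low_threshold, timeout_s):
    """Return (content, source); content is None when the result is rejected."""
    pool = ThreadPoolExecutor(max_workers=1)
    # Claim 8: the audio is sent to the second device without waiting
    # for the local judgment.
    remote_future = pool.submit(remote_recognize, audio)
    try:
        content, confidence = local_recognize(audio)
        if confidence >= threshold:  # assumed comparison rule
            return content, "local"
        try:
            # Local content judged inaccurate: wait for the second content,
            # but only for the preset time period.
            return remote_future.result(timeout=timeout_s), "remote"
        except FutureTimeout:
            # Claim 9: nothing arrived in time, so judge again against
            # a threshold lower than the selected one.
            if confidence >= low_threshold:
                return content, "local-low"
            return None, "rejected"
    finally:
        # Do not block on a slow remote call.
        pool.shutdown(wait=False)
```

Submitting the remote request before the local judgment is what distinguishes claim 8 from claim 7, where the audio is sent only after the local content has been judged inaccurate.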
10. The method according to claim 2, wherein presetting the at least two confidence thresholds according to at least one of the content recognizable by the first speech recognition device and its network condition comprises:
Setting a third confidence threshold for the case where the first speech recognition device has a network connection;
Setting a fourth confidence threshold for the case where the first speech recognition device does not have a network connection.
11. A speech recognition device, applied to electronic equipment, the speech recognition device comprising:
An audio input unit, configured to receive a speech input and obtain an audio signal corresponding to the speech input;
A recognition unit, configured to perform recognition processing on the audio signal to obtain a recognition result, the recognition result including recognized content and a confidence, the confidence indicating the reliability of the recognized content;
A threshold setting unit, configured to preset at least two confidence thresholds, the confidence thresholds being different from each other;
A threshold acquiring unit, configured to select one confidence threshold from the at least two confidence thresholds;
A judging unit, configured to judge whether the recognized content is accurate based on the confidence in the recognition result and the selected confidence threshold.
12. The speech recognition device according to claim 11, wherein the threshold setting unit presets the at least two confidence thresholds according to at least one of the content recognizable by the recognition unit and its network condition.
13. The speech recognition device according to claim 12, wherein the content recognizable by the speech recognition device includes a plurality of command words, and the threshold setting unit presets the at least two confidence thresholds as follows:
Setting a first confidence threshold for a first command word among the plurality of command words;
Setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.
14. The speech recognition device according to claim 13, wherein the threshold setting unit presets the at least two confidence thresholds as follows:
Setting a third confidence threshold for the case where the speech recognition device has a network connection;
Setting a fourth confidence threshold for the case where the speech recognition device does not have a network connection.
15. The speech recognition device according to claim 14, wherein the threshold acquiring unit includes:
A determining part, configured to determine whether the recognized content in the recognition result corresponds to the second command word, and, when the recognized content does not correspond to the second command word, to determine whether the speech recognition device has a network connection;
A selecting part, configured to select the second confidence threshold when the determining part determines that the recognized content corresponds to the second command word, to select the third confidence threshold when the determining part determines that the speech recognition device has a network connection, and to select the fourth confidence threshold when the determining part determines that the speech recognition device does not have a network connection.
16. The speech recognition device according to claim 15, wherein the judging unit judges whether the recognized content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognized content is accurate according to the comparison result.
17. The speech recognition device according to claim 11, further comprising:
A transmitting unit, configured to send the audio signal, when the judging unit judges that the recognized content is inaccurate, to another speech recognition device connected to the speech recognition device over a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognized content;
A receiving unit, configured to receive the second recognized content from the other speech recognition device, and to use the second recognized content as the final recognized content.
18. The speech recognition device according to claim 11, further comprising:
A transmitting unit, configured to send the audio signal to another speech recognition device connected to the electronic equipment over a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognized content;
A receiving unit, configured to receive, when the recognized content is judged to be inaccurate in the judging operation, the second recognized content from the other speech recognition device within a preset time period, and to use the second recognized content as the final recognized content.
19. The speech recognition device according to claim 18, wherein, when the receiving unit does not receive the second recognized content within the preset time period,
The threshold acquiring unit obtains a low confidence threshold that is lower than the selected confidence threshold, and
The judging unit judges whether the recognized content is accurate based on the low confidence threshold.
20. The speech recognition device according to claim 12, wherein the threshold setting unit presets the at least two confidence thresholds as follows:
Setting a third confidence threshold for the case where the speech recognition device has a network connection;
Setting a fourth confidence threshold for the case where the speech recognition device does not have a network connection.
21. Electronic equipment, comprising the speech recognition device according to any one of claims 11 to 20.
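As a rough illustration of how the units recited in claim 11 could fit together in software, the sketch below models the threshold setting, threshold acquiring and judging units as methods on one class. The class name `SpeechRecognitionDevice`, the string labels used to key the thresholds, and the `>=` comparison are assumptions introduced here; the audio input unit is left out, and the audio signal is simply passed in as bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class SpeechRecognitionDevice:
    # Recognition unit: maps an audio signal to (recognized content, confidence).
    recognize: Callable[[bytes], Tuple[str, float]]
    # Storage for preset thresholds, keyed by a hypothetical label.
    thresholds: Dict[str, float] = field(default_factory=dict)

    def preset(self, label: str, value: float) -> None:
        """Threshold setting unit: store a preset threshold; values must differ."""
        if value in self.thresholds.values():
            raise ValueError("confidence thresholds must differ from each other")
        self.thresholds[label] = value

    def acquire(self, label: str) -> float:
        """Threshold acquiring unit: select one of the preset thresholds."""
        return self.thresholds[label]

    def judge(self, audio: bytes, label: str) -> Tuple[str, bool]:
        """Judging unit: assumed rule, accurate when confidence >= threshold."""
        content, confidence = self.recognize(audio)
        return content, confidence >= self.acquire(label)
```

A caller would preset at least two thresholds (for example one labeled for the networked case and one for the offline case) and then pass the label that matches the current condition to `judge`.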
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410013478.8A CN103700368B (en) | 2014-01-13 | 2014-01-13 | Speech recognition method, speech recognition device and electronic equipment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410013478.8A CN103700368B (en) | 2014-01-13 | 2014-01-13 | Speech recognition method, speech recognition device and electronic equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN103700368A (en) | 2014-04-02 |
| CN103700368B (en) | 2017-01-18 |
Family
ID=50361874
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410013478.8A (Active) CN103700368B (en) | 2014-01-13 | 2014-01-13 | Speech recognition method, speech recognition device and electronic equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103700368B (en) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102346302B1 (en) | 2015-02-16 | 2022-01-03 | 삼성전자 주식회사 | Electronic apparatus and Method of operating voice recognition in the electronic apparatus |
| KR101736109B1 (en) * | 2015-08-20 | 2017-05-16 | 현대자동차주식회사 | Speech recognition apparatus, vehicle having the same, and method for controlling thereof |
| CN106653008B (en) * | 2015-10-28 | 2021-02-02 | 中兴通讯股份有限公司 | Voice control method, device and system |
| CN106338924A (en) * | 2016-09-23 | 2017-01-18 | 广州视源电子科技股份有限公司 | Method and device for automatically adjusting equipment operation parameter threshold |
| CN107895573B (en) * | 2017-11-15 | 2021-08-24 | 百度在线网络技术(北京)有限公司 | Method and device for identifying information |
| CN108711429B (en) * | 2018-06-08 | 2021-04-02 | Oppo广东移动通信有限公司 | Electronic device and device control method |
| CN109346071A (en) * | 2018-09-26 | 2019-02-15 | 出门问问信息科技有限公司 | Wake up processing method, device and electronic equipment |
| CN109256134B (en) * | 2018-11-22 | 2021-11-02 | 深圳市同行者科技有限公司 | Voice awakening method, storage medium and terminal |
| CN110265018B (en) * | 2019-07-01 | 2022-03-04 | 成都启英泰伦科技有限公司 | Method for recognizing continuously-sent repeated command words |
| CN110880318B (en) * | 2019-11-27 | 2023-04-18 | 云知声智能科技股份有限公司 | Voice recognition method and device |
| CN112802483B (en) * | 2021-04-14 | 2021-06-29 | 南京山猫齐动信息技术有限公司 | Method, device and storage medium for optimizing intention recognition confidence threshold |
| CN115410578A (en) * | 2022-10-27 | 2022-11-29 | 广州小鹏汽车科技有限公司 | Processing method of voice recognition, processing system thereof, vehicle and readable storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6006183A (en) * | 1997-12-16 | 1999-12-21 | International Business Machines Corp. | Speech recognition confidence level display |
| CN1633679A (en) * | 2001-12-29 | 2005-06-29 | 摩托罗拉公司 | Method and device for multi-level distributed speech recognition |
| CN101609672A (en) * | 2009-07-21 | 2009-12-23 | 北京邮电大学 | A method and device for extracting semantic confidence features for speech recognition |
| CN101763855A (en) * | 2009-11-20 | 2010-06-30 | 安徽科大讯飞信息科技股份有限公司 | Method and device for judging confidence of speech recognition |
| CN103177721A (en) * | 2011-12-26 | 2013-06-26 | 中国电信股份有限公司 | Voice recognition method and system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103700368A (en) | 2014-04-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN103700368B (en) | | Speech recognition method, speech recognition device and electronic equipment |
| US11075862B2 (en) | | Evaluating retraining recommendations for an automated conversational service |
| US20210027788A1 (en) | | Conversation interaction method, apparatus and computer readable storage medium |
| US20190311036A1 (en) | | System and method for chatbot conversation construction and management |
| EP3020040B1 (en) | | Method and apparatus for assigning keyword model to voice operated function |
| US9589564B2 (en) | | Multiple speech locale-specific hotword classifiers for selection of a speech locale |
| CN110147726A (en) | | Business quality detecting method and device, storage medium and electronic device |
| CN104299623B (en) | | It is used to automatically confirm that the method and system with disambiguation module in voice application |
| CN108447471A (en) | | Audio recognition method and speech recognition equipment |
| CN109815156A (en) | | Test methods, apparatus, equipment and storage media for the presentation of visual elements in pages |
| CN108694940A (en) | | A kind of audio recognition method, device and electronic equipment |
| CN111128134A (en) | | Acoustic model training method, voice awakening method, device and electronic equipment |
| KR20150031984A (en) | | Speech recognition system and method using incremental device-based model adaptation |
| CN109660533B (en) | | Method and device for identifying abnormal flow in real time, computer equipment and storage medium |
| CN113436614A (en) | | Speech recognition method, apparatus, device, system and storage medium |
| CN103838991A (en) | | Information processing method and electronic device |
| CN113282509B (en) | | Tone recognition, live broadcast room classification method, device, computer equipment and medium |
| US11256609B1 (en) | | Systems and methods to optimize testing using machine learning |
| US12394405B2 (en) | | Systems and methods for reconstructing video data using contextually-aware multi-modal generation during signal loss |
| WO2024093578A1 (en) | | Voice recognition method and apparatus, and electronic device, storage medium and computer program product |
| CN109273004A (en) | | Predictive speech recognition method and device based on big data |
| CN112735381A (en) | | Model updating method and device |
| CN108960836A (en) | | Voice payment method, apparatus and system |
| CN112148864B (en) | | Voice interaction method and device, computer equipment and storage medium |
| CN112667790A (en) | | Intelligent question and answer method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |