
CN106875949B - Correction method and device for voice recognition - Google Patents


Info

Publication number
CN106875949B
CN106875949B (granted publication of application CN201710291330.4A; earlier publication CN106875949A)
Authority
CN
China
Prior art keywords
application scene
voice recognition
corpus
result
current application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710291330.4A
Other languages
Chinese (zh)
Other versions
CN106875949A (en)
Inventor
石日俭
贺磊
刘旭
吕晓霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Szbroad Technology Co ltd
Original Assignee
Szbroad Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Szbroad Technology Co ltd filed Critical Szbroad Technology Co ltd
Priority to CN201710291330.4A priority Critical patent/CN106875949B/en
Publication of CN106875949A publication Critical patent/CN106875949A/en
Application granted granted Critical
Publication of CN106875949B publication Critical patent/CN106875949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/04 - Training, enrolment or model building
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a method and a device for correcting voice recognition, wherein the method comprises the following steps: determining the current application scene where the user is located according to the detection data of a configured detection device; performing voice recognition on the sound detected in the current application scene; performing deep learning on the corpus obtained by voice recognition, based on the deep learning model corresponding to the current application scene, to obtain a learning result; and correcting the voice recognition result according to the learning result. The embodiments can meet the voice recognition requirements of specific application scenes and perform voice recognition targeted to each scene, which greatly improves recognition accuracy, further promotes man-machine interaction, and gives the method a wide range of applications.

Description

Correction method and device for voice recognition
Technical Field
The present invention relates to speech processing technologies, and in particular, to a method and an apparatus for correcting speech recognition.
Background
With the development of science and technology, humanity has entered the era of artificial intelligence, which extends human intelligence and capability, simulates human thinking and intelligent behavior, and enables machines to perform complex work that would normally require human intelligence. Important branches of artificial intelligence include voice recognition, text translation, and speech synthesis. Voice recognition technology lets a machine convert an input speech signal into corresponding text through a process of recognition and understanding, realizing communication between humans and machines; text translation technology turns the words recognized from speech into grammatically correct sentences; Text To Speech (TTS) converts text generated by a machine or supplied from outside into speech resembling human expression and outputs it.
At present, voice recognition technologies developed by companies such as iFLYTEK, Microsoft, and Google are computed on big-data platforms with enormous cloud data-processing capacity. The data are large in volume and broad in coverage, so general man-machine language interaction is basically achievable, but the recognition and translation of domain-specific sentences in specific application scenes are often not accurate enough.
In the prior art, a correction set is obtained by step-by-step filtering with statistical or machine learning methods. However, because this process lacks pertinence, the correction applied to each user's input is essentially the same, and the result is inaccurate. For example, when the speech "lihua" is received from different users, the initial recognition yields the text "li hua", which may then be corrected to "pear flower", "physicochemical", or "fireworks display"; that is, the correction is not chosen according to the different application scenes.
Disclosure of Invention
The embodiment of the invention provides a method and a device for correcting voice recognition, which aim to solve the problem of inaccurate correction of a voice recognition result in the prior art.
In a first aspect, an embodiment of the present invention provides a method for correcting speech recognition, including:
determining the current application scene of the user according to the detection data of the set detection equipment;
performing voice recognition on the detected sound in the current application scene;
performing deep learning on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene to obtain a learning result;
and correcting the voice recognition result according to the learning result.
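The four steps above can be sketched as a small pipeline. This is a minimal, illustrative sketch: all function names, the scene-model layout, and the sample phrases are assumptions, since the patent does not specify an implementation.

```python
# Hypothetical sketch of the four-step correction flow. Function names and
# data shapes are illustrative only, not from the patent.

def determine_scene(detection_data):
    """Step 1: map detector readings to an application-scene label."""
    return detection_data.get("scene", "generic")

def recognize(audio):
    """Step 2: run the base speech engine; here a stand-in lookup."""
    return audio["engine_transcript"]

def deep_learn(corpus_text, scene, scene_models):
    """Step 3: check the transcript against the scene's model."""
    model = scene_models[scene]
    matched = all(tok in model["vocabulary"] for tok in corpus_text.split())
    return {"matched": matched,
            "suggestion": model["corrections"].get(corpus_text, corpus_text)}

def correct(transcript, learning_result):
    """Step 4: replace a mismatched transcript with the scene-specific result."""
    return transcript if learning_result["matched"] else learning_result["suggestion"]

# Invented scene model for a restaurant scene.
scene_models = {
    "restaurant": {
        "vocabulary": {"order", "the", "set", "menu"},
        "corrections": {"order the set manual": "order the set menu"},
    }
}

audio = {"engine_transcript": "order the set manual"}
scene = determine_scene({"scene": "restaurant"})
result = deep_learn(recognize(audio), scene, scene_models)
print(correct(recognize(audio), result))  # -> order the set menu
```

The sketch only fixes the interfaces between the four steps; in the patent, step 3 is a trained deep learning model per scene rather than a vocabulary lookup.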
Further, determining the current application scene where the user is located according to the detection data of the configured detection device includes at least one of the following:
performing voice recognition on the detected sound, and identifying the application scene mapped to the corpus set to which the recognized corpus belongs;
detecting the position of the mobile terminal through a positioning module, and obtaining the current application scene where the user is located;
detecting characteristics of the application scene through a Bluetooth digital signal processing device, and determining the current application scene according to the characteristics.
Further, before determining the current application scenario where the user is located according to the detection data of the setting detection device, the method further includes:
clustering a corpus under each application scene by using a clustering algorithm, and extracting corpus features according to the clustering result;
and training the corpus features, and creating deep learning models corresponding to each application scene.
Further, the correcting the result of the speech recognition according to the learning result includes:
and if the learning result is that the voice recognition result is not matched with the current application scene, correcting the voice recognition result into a corresponding result in the current application scene.
Further, the corpus comprises: stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
In a second aspect, an embodiment of the present invention further provides a device for correcting speech recognition, including:
the scene determining module is used for determining the current application scene of the user according to the detection data of the set detection equipment;
the voice recognition module is used for performing voice recognition on the detected sound in the current application scene;
the deep learning module is used for carrying out deep learning on the linguistic data obtained by voice recognition based on a deep learning model corresponding to the current application scene to obtain a learning result;
and the correction module is used for correcting the voice recognition result according to the learning result.
Further, the scene determination module includes:
the first determining unit is used for performing voice recognition on the detected sound and identifying the application scene mapped to the corpus set to which the recognized corpus belongs;
the second determining unit is used for detecting the position of the mobile terminal through the positioning module and acquiring the current application scene of the user;
and the third determining unit is used for detecting the characteristics of the application scene through the Bluetooth digital signal processing equipment and determining the current application scene according to the characteristics.
Further, the apparatus further comprises:
the characteristic extraction unit is used for grouping the corpus under each application scene by using a clustering algorithm and extracting corpus characteristics according to the grouping result;
and the model creating unit is used for training the corpus features and creating deep learning models corresponding to each application scene.
Further, the correction module includes:
and the correcting unit is used for correcting the voice recognition result into a corresponding result in the current application scene if the learning result is that the voice recognition result is not matched with the current application scene.
Further, the corpus comprises:
stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
Embodiments of the invention provide a method and a device for correcting voice recognition: the current application scene is determined from acquired detection data; the corpus obtained by voice recognition is deeply learned in the deep learning model corresponding to that scene; and a voice recognition result that does not match the current scene is corrected and replaced with the correct text translation. This meets the voice recognition requirements of specific application scenes, performs targeted recognition for each scene, greatly improves recognition accuracy, and further promotes man-machine interaction, so that people and machines can communicate effectively, the user experience is improved, and the method applies to a wide range of uses.
Drawings
FIG. 1 is a flowchart illustrating a method for correcting speech recognition according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for correcting speech recognition according to a second embodiment of the present invention;
FIG. 3a is a flowchart of a method for correcting speech recognition according to a third embodiment of the present invention;
FIG. 3b is a diagram illustrating a method for correcting speech recognition according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a method for correcting speech recognition according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a speech recognition correction apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for correcting speech recognition according to an embodiment of the present invention, where the embodiment is applicable to a case where a result of speech recognition is corrected according to a current application scenario, and the method may be executed by a speech recognition correction apparatus, which may be implemented in a software and/or hardware manner and is generally integrated in a device with a speech recognition function.
The method of the first embodiment of the invention specifically comprises the following steps:
s101, determining the current application scene of the user according to the detection data of the setting detection device.
The Chinese language is rich and subtle: utterances that differ by only a single tone, or even share exactly the same pronunciation, can carry entirely different meanings, which makes Chinese voice recognition difficult. The current application scene where the user is located therefore needs to be detected, and the corpus the user produces should be recognized and interpreted with respect to that specific scene, so that the final voice recognition result is more accurate. A configured detection device can probe the current environment and thereby determine the current application scene where the user is located.
And S102, performing voice recognition on the detected sound in the current application scene.
Specifically, after the current application scene where the user is located is determined, voice recognition is performed on the detected sound, and a voice recognition result, that is, a corpus obtained through the voice recognition, is obtained.
S103, deep learning is carried out on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene, and a learning result is obtained.
Specifically, a deep learning model corresponding to each application scene is created, building a neural network that imitates the analysis and learning of the human brain. Deep learning and analysis of the corpus obtained by voice recognition covers semantics, speech, intonation, context, grammar, and so on, to judge whether the initial recognition result matches the current application scene, that is, whether the recognized corpus is accurate.
And S104, correcting the voice recognition result according to the learning result.
Specifically, if deep learning finds the corpus obtained by voice recognition to be inaccurate, the recognition result is corrected: it is translated into the correct text, which replaces the previous recognition result.
In this embodiment, the current application scene where the user is located is determined first, and the corpus obtained by voice recognition is deeply learned in combination with that scene; if the corpus is inaccurate, the recognition result is corrected for the current scene according to the deep learning result. For example, the user says "the programmer writes code in front of the computer", but because of a non-standard accent, overly fast speech, or similar causes, the big-data speech engine recognizes "the programmer writes capitals in front of the computer". From words such as "programmer" and "computer", the current application scene can be identified as a programmer's work scene, and deep learning of the engine's output in the corresponding deep learning model corrects "writes capitals" to "writes code", yielding the correct voice recognition result.
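The programmer example can be mimicked with a scene-conditioned substitution table: keywords in the raw transcript select a scene, and that scene's confusion table supplies the fix. All table entries and phrases below are invented stand-ins, not the patent's data.

```python
# Illustrative sketch: infer the scene from keywords in the raw transcript,
# then apply that scene's confusion table. All entries are made up.

SCENE_KEYWORDS = {"programmer": "coding", "computer": "coding"}
CONFUSIONS = {"coding": {"writes capitals": "writes code"}}

def correct_transcript(raw):
    # First keyword hit decides the scene; no hit means no scene-specific fix.
    scene = next((SCENE_KEYWORDS[w] for w in raw.split() if w in SCENE_KEYWORDS), None)
    if scene is None:
        return raw
    fixed = raw
    for wrong, right in CONFUSIONS[scene].items():
        fixed = fixed.replace(wrong, right)
    return fixed

print(correct_transcript("the programmer writes capitals in front of the computer"))
# -> the programmer writes code in front of the computer
```

In the patent the substitution is produced by a per-scene deep learning model rather than a fixed table; the table merely makes the data flow concrete.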
The voice recognition correction method provided by the embodiment of the invention can meet the voice recognition requirements of specific application scenes, can perform voice recognition on each application scene in a targeted manner, greatly improves the accuracy of voice recognition, further promotes man-machine interaction, enables people to effectively communicate with machines, improves the user experience, and has a wide application range.
Example two
Fig. 2 is a flowchart of a speech recognition correction method according to a second embodiment of the present invention, where the second embodiment of the present invention is optimized based on the first embodiment, specifically, the operation of determining the current application scenario where the user is located according to the detection data of the detection device is further optimized, as shown in fig. 2, the second embodiment of the present invention specifically includes:
s201, carrying out voice recognition on the detected sound, and judging an application scene corresponding to a corpus to which the corpus belongs by the voice recognition.
Specifically, corpus sets that each map to an application scene are collected and stored; a corpus set is the collection of all corpora gathered for that scene. Voice recognition is performed on the detected sound according to the corpus input by the user, the recognized corpus is compared against the contents of the stored corpus sets, and the application scene mapped to the matching corpus set is taken as the current scene. The mapping between keywords and application scenes can be established by collecting keywords specific to each scene. For example, all corpora of a restaurant scene, such as common phrases and menu names, are collected, and a mapping between these corpora and the restaurant application scene is established.
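S201 can be sketched as matching a recognized utterance against per-scene keyword sets and picking the scene with the most hits. The keyword sets are invented for illustration.

```python
# Sketch of S201: per-scene keyword sets; the scene with the largest
# overlap with the utterance's tokens wins. Keyword sets are invented.

SCENE_CORPORA = {
    "restaurant": {"menu", "order", "dish", "waiter", "bill"},
    "office": {"meeting", "report", "deadline", "email"},
}

def scene_from_corpus(utterance):
    tokens = set(utterance.lower().split())
    best, hits = None, 0
    for scene, keywords in SCENE_CORPORA.items():
        n = len(tokens & keywords)
        if n > hits:
            best, hits = scene, n
    return best  # None when nothing matches

print(scene_from_corpus("could I order the dish on the menu"))  # -> restaurant
```

A real system would match against the full stored corpus sets rather than a handful of keywords, but the mapping logic is the same.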
S202, detecting the position of the mobile terminal through a positioning module, and acquiring the current application scene of the user.
Specifically, the position of the user can be detected through a module with a positioning function in the mobile terminal the user carries, and the current application scene is determined from the detection result. The positioning module may use the Global Positioning System (GPS), Bluetooth positioning technology, or positioning through map software over a mobile data or wireless local area network connection.
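A positioning fix can be turned into a scene label with a nearest-venue lookup, as in this sketch. The venue coordinates, the 100 m threshold, and the flat-earth distance approximation are all assumptions for illustration.

```python
import math

# Sketch of S202: map a GPS fix to the nearest known venue within a radius.
# Venue coordinates and the 100 m threshold are illustrative.

VENUES = [
    ("restaurant", 39.9087, 116.3975),
    ("office", 39.9150, 116.4040),
]

def scene_from_position(lat, lon, max_m=100.0):
    def meters(lat1, lon1, lat2, lon2):
        # Small-distance approximation: 1 degree of latitude is ~111.32 km.
        dy = (lat2 - lat1) * 111_320.0
        dx = (lon2 - lon1) * 111_320.0 * math.cos(math.radians(lat1))
        return math.hypot(dx, dy)

    best = min(VENUES, key=lambda v: meters(lat, lon, v[1], v[2]))
    return best[0] if meters(lat, lon, best[1], best[2]) <= max_m else None

print(scene_from_position(39.9088, 116.3976))  # -> restaurant
```

For city-scale thresholds this approximation is adequate; a production system would query map software as the text describes.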
S203, detecting the characteristics of the application scene through the Bluetooth digital signal processing equipment, and determining the current application scene according to the characteristics.
Specifically, a sensor in the Bluetooth digital signal processing device collects signals from the current environment, and characteristics of the application scene are detected from the collected signals; for example, a temperature sensor can measure the ambient temperature to decide whether the environment is indoor or outdoor, and thereby determine the current application scene where the user is located.
In this embodiment, the Global Positioning System may be used to locate the user; for example, if the user is located in a restaurant, the current application scene is judged to be a restaurant, and the voice recognition result is interpreted in relation to the restaurant scene.
It should be noted that all three methods above determine the current application scene; any one, any two, or all three may be selected according to the actual application.
And S204, performing voice recognition on the detected sound in the current application scene.
S205, deep learning is carried out on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene, and a learning result is obtained.
And S206, correcting the voice recognition result according to the learning result.
The correction method for voice recognition provided by the embodiment of the invention can accurately acquire the current application scene where the user is located, and performs voice recognition according to the current application scene in a targeted manner, so that the accuracy of voice recognition is improved, and the actual interactive experience between the user and a product is improved.
EXAMPLE III
Fig. 3a is a flowchart of a correction method for speech recognition according to a third embodiment of the present invention, which is optimized and improved based on the above embodiments, and further illustrates an operation before determining a current application scenario where a user is located according to detection data of a set detection device, as shown in fig. 3a, the method according to the third embodiment of the present invention specifically includes:
s301, clustering the corpus under each application scene by using a clustering algorithm, and extracting corpus features according to the clustering result.
Preferably, the corpus comprises: stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
Specifically, the corpus serves as the basic data of the deep learning model. It may be stored corpus entered by users, and/or corpus screened by professional speech technologists according to various topics, and/or corpus obtained by performing speech synthesis on the voice recognition result and then analyzing and correcting the synthesis. The corpus is grouped with a clustering algorithm, such as a partitioning method (for example, k-means) or a hierarchical method, and the features of each group of corpora are extracted.
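S301's grouping and feature extraction can be sketched with bag-of-words vectors and a tiny k-means. The four-sentence corpus, the fixed seed choice, and the "top words per cluster" notion of a feature are all illustrative assumptions.

```python
from collections import Counter

# Sketch of S301: bag-of-words vectors plus a minimal k-means (k=2) to group
# an invented corpus, then take each group's most frequent words as features.

CORPUS = [
    "order the dish from the menu",
    "the menu has a new dish",
    "send the report before the meeting",
    "the meeting needs a report",
]

VOCAB = sorted({w for doc in CORPUS for w in doc.split()})

def vectorize(doc):
    counts = Counter(doc.split())
    return [counts[w] for w in VOCAB]

def kmeans(vectors, seeds, rounds=5):
    # Deterministic seeding so the grouping is reproducible in this sketch.
    centroids = [vectors[i] for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(rounds):
        labels = [min(range(len(centroids)),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(v, centroids[c])))
                  for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

labels = kmeans([vectorize(d) for d in CORPUS], seeds=[0, 2])

def features(cluster, top=2):
    # Most frequent non-stopword tokens of a cluster stand in for its features.
    words = Counter(w for d, l in zip(CORPUS, labels) if l == cluster
                    for w in d.split() if w != "the")
    return [w for w, _ in words.most_common(top)]

print(labels)       # -> [0, 0, 1, 1]
print(features(0))  # -> ['dish', 'menu']
```

The patent's pipeline would feed such extracted features into per-scene model training; libraries like scikit-learn provide production-grade clustering.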
S302, training the corpus features, and creating deep learning models corresponding to the application scenes.
Specifically, the corpus is fed into the model and its features are trained through a neural network that imitates the thinking of the human brain, creating a deep learning model for each application scene. The accuracy of the voice recognition result for each corpus is then judged in combination with the application scene.
And S303, determining the current application scene of the user according to the detection data of the setting detection device.
S304, performing voice recognition on the detected sound in the current application scene.
S305, deep learning is carried out on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene, and a learning result is obtained.
S306, correcting the voice recognition result according to the learning result.
In this embodiment, fig. 3b is a schematic diagram of the correction method for voice recognition according to the third embodiment of the present invention. Referring to fig. 3b, the current application scene where the user is located can be determined through the positioning function of the user's mobile terminal, through the Bluetooth digital signal processing device, and by searching for the application scene that matches the input corpus. The stored user corpus, the classified corpus provided by speech technologists, and the corpus corrected from speech synthesis results are input into the model for training, creating a deep learning model corresponding to each application scene. The voice recognition result of the big-data speech engine is then input into the deep learning model, which corrects the result according to the current application scene, predicts error-prone points, corrects wrong recognition results, and replaces the original wrong translation with the correct one.
According to the correction method for voice recognition provided by the third embodiment of the invention, the current application scene recognition is more accurate by creating the deep learning model, so that the accuracy of the voice recognition result is judged, the inaccurate voice recognition result is corrected, and the accuracy of the voice recognition is improved.
Example four
Fig. 4 is a flowchart of a speech recognition correction method according to a fourth embodiment of the present invention, which is optimized and improved based on the foregoing embodiments, and further describes an operation of correcting a speech recognition result according to the learning result, as shown in fig. 4, the method according to the fourth embodiment of the present invention specifically includes:
s401, determining the current application scene of the user according to the detection data of the setting detection device.
S402, performing voice recognition on the detected sound in the current application scene.
And S403, performing deep learning on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene to obtain a learning result.
S404, if the learning result is that the voice recognition result is not matched with the current application scene, correcting the voice recognition result into a corresponding result in the current application scene.
Specifically, whether the voice recognition result output by the big-data speech engine matches the current application scene is verified; if not, the result is corrected to the one matching the current application scene and translated into the correct text, which replaces the original erroneous result.
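The match check and substitution of S404 can be sketched as comparing the engine transcript against scene-specific candidate phrases under a simple scene score. The vocabulary, candidates, and keyword-overlap scoring are invented stand-ins for the patent's learned model.

```python
# Sketch of S404: score the engine transcript and scene-specific candidate
# phrases by keyword overlap with the scene vocabulary, and substitute the
# candidate when the transcript scores lower. All data is illustrative.

SCENE_VOCAB = {"restaurant": {"menu", "order", "dish", "set"}}
SCENE_CANDIDATES = {"restaurant": ["order the set menu"]}

def scene_score(text, scene):
    return len(set(text.split()) & SCENE_VOCAB[scene])

def verify_and_correct(transcript, scene):
    best = max(SCENE_CANDIDATES[scene], key=lambda c: scene_score(c, scene))
    if scene_score(transcript, scene) >= scene_score(best, scene):
        return transcript   # matches the scene: keep it
    return best             # mismatch: replace with the scene-specific result

print(verify_and_correct("order the set manual", "restaurant"))
# -> order the set menu
```

In the patent the mismatch decision comes from the per-scene deep learning model, not a keyword count; the sketch only fixes the verify-then-replace control flow.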
The voice recognition correction method provided by the fourth embodiment of the invention corrects the voice recognition result which is not matched with the application scene, improves the accuracy of voice recognition and translation in the specific application scene, and optimizes the system logic.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a speech recognition correction apparatus according to a fifth embodiment of the present invention, which is applied to correct a speech recognition result that does not match with an application scenario. As shown in fig. 5, the apparatus includes: a scene determination module 501, a speech recognition module 502, a deep learning module 503, and a correction module 504.
A scene determining module 501, configured to determine a current application scene where a user is located according to detection data of a set detection device;
a speech recognition module 502, configured to perform speech recognition on the detected sound in the current application scenario;
a deep learning module 503, configured to perform deep learning on the corpus obtained by speech recognition based on the deep learning model corresponding to the current application scenario, and obtain a learning result;
and a correcting module 504, configured to correct the result of speech recognition according to the learning result.
The embodiment of the invention determines the current application scene from acquired detection data, deeply learns the corpus obtained by voice recognition in the deep learning model corresponding to that scene, corrects a voice recognition result that does not match the current scene, and replaces it with the correct text translation. This meets the voice recognition requirements of specific application scenes, performs targeted recognition for each scene, greatly improves recognition accuracy, and further promotes man-machine interaction, so that people and machines can communicate effectively, the user experience is improved, and the device applies to a wide range of uses.
On the basis of the foregoing embodiments, the scene determining module 501 may include:
the first determining unit is used for performing voice recognition on the detected sound and identifying the application scene mapped to the corpus set to which the recognized corpus belongs;
the second determining unit is used for detecting the position of the mobile terminal through the positioning module and acquiring the current application scene of the user;
and the third determining unit is used for detecting the characteristics of the application scene through the Bluetooth digital signal processing equipment and determining the current application scene according to the characteristics.
On the basis of the above embodiments, the apparatus may further include:
the characteristic extraction unit is used for grouping the corpus under each application scene by using a clustering algorithm and extracting corpus characteristics according to the grouping result;
and the model creating unit is used for training the corpus features and creating deep learning models corresponding to each application scene.
On the basis of the foregoing embodiments, the correction module 504 may include:
and the correcting unit is used for correcting the voice recognition result into a corresponding result in the current application scene if the learning result is that the voice recognition result is not matched with the current application scene.
On the basis of the foregoing embodiments, the corpus may include:
stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
In this embodiment, the scene determination module determines the current application scene where the user is located by searching for the application scene matching the input corpus (first determining unit), locating the geographic position of the user (second determining unit), and detecting application-scene characteristics (third determining unit); the voice recognition module then recognizes the sound detected in the current scene to obtain a recognition result. Stored corpus input by users, corpus screened by professional speech technologists according to various topics, and/or corpus obtained by performing speech synthesis on recognition results and then analyzing and correcting the synthesis are input into the model as the basic corpus data for training, creating the deep learning model corresponding to each application scene. The deep learning module performs deep learning on the recognized corpus with the model of the current application scene; if the learning result shows that the recognition result does not match the current scene, the correcting unit of the correction module corrects it, translates it into the correct text, and replaces the original translation result.
The voice recognition correction device provided by the fifth embodiment of the invention improves the accuracy of voice recognition, promotes effective human-computer interaction, strengthens the logical consistency of the voice recognition system, and has a wide application range.
The voice recognition correction device provided by the embodiment of the invention can execute the voice recognition correction method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (6)

1. A method for correcting speech recognition, comprising:
determining a current application scene where a user is located according to detection data of a set detection device, wherein the determining the current application scene where the user is located according to the detection data of the set detection device comprises: performing voice recognition on the detected sound, and determining the application scene corresponding to the corpus to which the recognized speech belongs; detecting the position of the mobile terminal through a positioning module to acquire the current application scene of the user; and detecting the characteristics of the application scene through a Bluetooth digital signal processing device, and determining the current application scene according to the characteristics;
performing voice recognition on the detected sound in the current application scene;
performing deep learning on the linguistic data obtained by voice recognition based on the deep learning model corresponding to the current application scene to obtain a learning result;
correcting the result of the voice recognition according to the learning result;
the correcting the result of the voice recognition according to the learning result comprises:
and if the learning result is that the voice recognition result is not matched with the current application scene, correcting the voice recognition result into a corresponding result in the current application scene.
2. The method according to claim 1, wherein before determining the current application scenario where the user is located according to the detection data of the setting detection device, the method further comprises:
clustering a corpus under each application scene by using a clustering algorithm, and extracting corpus features according to the clustering result;
and training on the corpus features, and creating a deep learning model corresponding to each application scene.
3. The method of claim 2, wherein the corpus comprises: stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
4. A correction device for speech recognition, comprising:
a scene determining module, configured to determine a current application scene where a user is located according to detection data of a set detection device, where the scene determining module includes: a first determining unit, configured to perform speech recognition on the detected sound and determine the application scene corresponding to the corpus to which the recognized speech belongs; a second determining unit, configured to detect the position of the mobile terminal through a positioning module and acquire the current application scene of the user; and a third determining unit, configured to detect the characteristics of the application scene through a Bluetooth digital signal processing device and determine the current application scene according to the characteristics;
the voice recognition module is used for performing voice recognition on the detected sound in the current application scene;
the deep learning module is used for carrying out deep learning on the linguistic data obtained by voice recognition based on a deep learning model corresponding to the current application scene to obtain a learning result;
the correction module is used for correcting the result of the voice recognition according to the learning result;
the correction module includes:
and the correcting unit is used for correcting the voice recognition result into a corresponding result in the current application scene if the learning result is that the voice recognition result is not matched with the current application scene.
5. The apparatus of claim 4, further comprising:
the feature extraction unit is used for clustering the corpus under each application scene by using a clustering algorithm, and extracting corpus features according to the clustering result;
and the model creating unit is used for training on the corpus features and creating a deep learning model corresponding to each application scene.
6. The apparatus of claim 5, wherein the corpus comprises:
stored user-entered corpus, screened corpus, and/or corpus obtained by correcting speech recognition results.
CN201710291330.4A 2017-04-28 2017-04-28 Correction method and device for voice recognition Active CN106875949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710291330.4A CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710291330.4A CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Publications (2)

Publication Number Publication Date
CN106875949A CN106875949A (en) 2017-06-20
CN106875949B true CN106875949B (en) 2020-09-22

Family

ID=59161656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710291330.4A Active CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Country Status (1)

Country Link
CN (1) CN106875949B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293296B (en) * 2017-06-28 2020-11-20 百度在线网络技术(北京)有限公司 Voice recognition result correction method, device, equipment and storage medium
CN107680600B (en) * 2017-09-11 2019-03-19 平安科技(深圳)有限公司 Sound-groove model training method, audio recognition method, device, equipment and medium
CN108831505B (en) * 2018-05-30 2020-01-21 百度在线网络技术(北京)有限公司 Method and device for identifying use scenes of application
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109410913B (en) * 2018-12-13 2022-08-05 百度在线网络技术(北京)有限公司 Voice synthesis method, device, equipment and storage medium
CN111368145A (en) * 2018-12-26 2020-07-03 沈阳新松机器人自动化股份有限公司 Knowledge graph creating method and system and terminal equipment
CN111951626A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Language learning apparatus, method, medium, and computing device
CN110544234A (en) * 2019-07-30 2019-12-06 北京达佳互联信息技术有限公司 Image noise detection method, image noise detection device, electronic equipment and storage medium
CN110556127B (en) * 2019-09-24 2021-01-01 北京声智科技有限公司 Method, device, equipment and medium for detecting voice recognition result
CN111104546B (en) * 2019-12-03 2021-08-27 珠海格力电器股份有限公司 Method and device for constructing corpus, computing equipment and storage medium
CN113660501A (en) * 2021-08-11 2021-11-16 云知声(上海)智能科技有限公司 Method and device for matching subtitles
CN114155841B (en) * 2021-11-15 2025-06-10 安徽听见科技有限公司 Speech recognition method, device, equipment and storage medium
CN114842857A (en) * 2022-03-25 2022-08-02 阿里巴巴(中国)有限公司 Speech processing method, apparatus, system, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447019A (en) * 2014-08-20 2016-03-30 北京羽扇智信息科技有限公司 User usage scene based input identification result calibration method and system
CN105448292A (en) * 2014-08-19 2016-03-30 北京羽扇智信息科技有限公司 Scene-based real-time voice recognition system and method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5524169A (en) * 1993-12-30 1996-06-04 International Business Machines Incorporated Method and system for location-specific speech recognition
CN1207664C (en) * 1999-07-27 2005-06-22 国际商业机器公司 Error correcting method for voice identification result and voice identification system
US7200555B1 (en) * 2000-07-05 2007-04-03 International Business Machines Corporation Speech recognition correction for devices having limited or no display
ATE311650T1 (en) * 2001-09-17 2005-12-15 Koninkl Philips Electronics Nv CORRECTION OF A TEXT RECOGNIZED BY A VOICE RECOGNITION BY COMPARING THE PHONE SEQUENCES OF THE RECOGNIZED TEXT WITH A PHONETIC TRANSCRIPTION OF A MANUALLY ENTRED CORRECTION WORD
CN102324233B (en) * 2011-08-03 2014-05-07 中国科学院计算技术研究所 Method for automatically correcting identification error of repeated words in Chinese pronunciation identification
CN103903619B (en) * 2012-12-28 2016-12-28 科大讯飞股份有限公司 A kind of method and system improving speech recognition accuracy
CN103645876B (en) * 2013-12-06 2017-01-18 百度在线网络技术(北京)有限公司 Voice inputting method and device
CN105786880A (en) * 2014-12-24 2016-07-20 中兴通讯股份有限公司 Voice recognition method, client and terminal device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105448292A (en) * 2014-08-19 2016-03-30 北京羽扇智信息科技有限公司 Scene-based real-time voice recognition system and method
CN105447019A (en) * 2014-08-20 2016-03-30 北京羽扇智信息科技有限公司 User usage scene based input identification result calibration method and system

Also Published As

Publication number Publication date
CN106875949A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106875949B (en) Correction method and device for voice recognition
US11322153B2 (en) Conversation interaction method, apparatus and computer readable storage medium
CN107240398B (en) Intelligent voice interaction method and device
CN103956169B (en) A kind of pronunciation inputting method, device and system
CN103065630B (en) User personalized information voice recognition method and user personalized information voice recognition system
CN111341305B (en) Audio data labeling method, device and system
EP3153978B1 (en) Address search method and device
CN105895103B (en) Voice recognition method and device
CN111292751B (en) Semantic analysis method and device, voice interaction method and device, and electronic equipment
CN110797010A (en) Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
CN110415679A (en) Speech error correction method, device, equipment and storage medium
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
JP2005084681A (en) Method and system for semantic language modeling and reliability measurement
WO2021103712A1 (en) Neural network-based voice keyword detection method and device, and system
CN109256125B (en) Off-line voice recognition method and device and storage medium
CN104407834A (en) Message input method and device
CN106570180A (en) Artificial intelligence based voice searching method and device
CN109213856A (en) Semantic recognition method and system
CN111128181B (en) Recitation question evaluating method, recitation question evaluating device and recitation question evaluating equipment
CN103632668B (en) A kind of method and apparatus for training English speech model based on Chinese voice information
CN106649253B (en) Auxiliary control method and system based on rear verifying
CN112069833B (en) Log analysis method, log analysis device and electronic equipment
CN109710949A (en) A kind of interpretation method and translator
KR102017229B1 (en) A text sentence automatic generating system based deep learning for improving infinity of speech pattern
CN111916062A (en) Voice recognition method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant