
WO2018121275A1 - Method and apparatus for error connection of voice recognition in smart hardware device - Google Patents


Info

Publication number
WO2018121275A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
candidate
keywords
score
text information
Prior art date
Application number
PCT/CN2017/116165
Other languages
French (fr)
Chinese (zh)
Inventor
杨英
张倩倩
Original Assignee
北京奇虎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司
Publication of WO2018121275A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/086 Recognition of spelled words
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Definitions

  • the present invention relates to the field of voice recognition technology, and in particular, to a voice recognition error correction method and apparatus in an intelligent hardware device.
  • the process of semantic analysis relies heavily on the accuracy of speech recognition, and that accuracy can hardly reach 100%. For example, when the user is a child, unclear articulation causes a variety of speech recognition errors.
  • the present invention has been made in order to provide a speech recognition error correction method and apparatus in an intelligent hardware device that overcomes the above problems or at least partially solves the above problems.
  • a voice recognition error correction method in an intelligent hardware device including:
  • the error correction processing is performed on the keywords in the text information according to the selected one or more candidate words.
  • a speech recognition error correction apparatus in an intelligent hardware device including:
  • a voice recognition unit configured to convert a voice signal received by the smart hardware device into text information by using a voice recognition technology
  • a keyword extracting unit adapted to extract a keyword from the text information
  • a matching unit configured to match the extracted keyword with a vocabulary related to the intelligent hardware service, and select one or more candidate words matching the keyword from the vocabulary;
  • the error correction unit is adapted to perform error correction processing on the keywords in the text information according to the selected one or more candidate words.
  • a computer program comprising computer readable code which, when run on a computing device, causes said computing device to perform the speech recognition error correction method in an intelligent hardware device as described above.
  • a computer readable medium storing a computer program as described above is provided.
  • the technical solution of the present invention first uses speech recognition technology to recognize the voice signal received by an intelligent hardware device and convert it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the intelligent hardware service to determine one or more candidate words, and finally corrects the keywords using the candidate words obtained.
  • the technical solution fully considers the functional characteristics of the intelligent hardware and uses the preset business-related vocabulary to intelligently correct the keywords parsed from the speech recognition result. This significantly improves the accuracy of speech recognition and consumes few resources, in line with the low energy consumption requirements of intelligent hardware devices.
  • FIG. 1 is a flow chart showing a voice recognition error correction method in an intelligent hardware device according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a speech recognition error correction apparatus in an intelligent hardware device according to an embodiment of the present invention
  • Figure 3 schematically shows a block diagram of a computing device for performing the method according to the invention
  • Fig. 4 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • FIG. 1 is a schematic flowchart of a voice recognition error correction method in an intelligent hardware device according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • Step S110: convert the voice signal received by the smart hardware device into text information by using a voice recognition technology.
  • the smart hardware device can be a smart phone, a smart watch, an intelligent robot, and the like.
  • Speech recognition technology is a technology that is gradually being improved and put into use.
  • the Siri function in Apple's mobile phone can realize the voice recognition of the user's voice to perform functions such as opening the camera and map navigation in the Apple mobile phone.
  • Step S120: extract keywords from the text information.
  • the user wants to use the storytelling function of a children's watch and utters the voice signal "I want to hear the story - Little Red Riding Hood". Here "story" is the keyword corresponding to the function, and "Little Red Riding Hood" is the keyword corresponding to the story category under this function. In other words, the keywords are related to the intelligent hardware business. The method therefore continues with the following steps:
  • Step S130: match the extracted keywords against the vocabulary related to the intelligent hardware service, and select one or more candidate words matching the keywords from the vocabulary.
  • the user's pronunciation may be non-standard, so the text information into which the speech recognition technology converts the voice signal may be "Shaw Red Hat".
  • this word does not match "Little Red Riding Hood" exactly, but the two are highly similar, and a human listener can conclude that the child actually means "Little Red Riding Hood". In practice, the keywords in the text information converted from the voice signal can be fuzzily matched. For example, the story library may also contain a story called "Little Red Cat". For a single keyword, therefore, one or more candidate words may be selected from the vocabulary.
  • Step S140: perform error correction processing on the keywords in the text information according to the selected one or more candidate words.
  • the method shown in FIG. 1 first uses speech recognition technology to recognize the voice signal received by the intelligent hardware device and convert it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the intelligent hardware service to determine one or more candidate words, and finally corrects the keywords using the candidate words obtained.
  • the technical solution fully considers the functional characteristics of the intelligent hardware and uses the preset business-related vocabulary to intelligently correct the keywords parsed from the speech recognition result. This significantly improves the accuracy of speech recognition and consumes few resources, in line with the low energy consumption requirements of intelligent hardware devices.
  • the method shown in FIG. 1 further includes: presetting one or more fixed sentence patterns associated with the business voice interaction of the smart hardware device, and marking the position of the keyword in each fixed sentence pattern. Extracting the keyword from the text information then includes: matching the text information against the one or more fixed sentence patterns, and extracting the keyword from the corresponding position of the text information according to the keyword position marked in the matched fixed sentence pattern.
  • "I want to listen to the story - Little Red Riding Hood" follows a fixed sentence pattern that can be summarized as "I want to hear the story XXX", where "XXX" corresponds to a keyword.
  • the same meaning can also be expressed as "I want to hear the story of Little Red Riding Hood", in which case the fixed sentence pattern is "I want to hear the story of XXX".
  • the user can use the pattern "I want to listen to XXX's XXX", for example, "I want to listen to Andy Lau's forgotten water", and so on.
  • the method further includes: marking type information for each keyword in each fixed sentence pattern, and determining type information for each vocabulary related to the intelligent hardware service. Matching the extracted keyword against the vocabulary related to the intelligent hardware service then includes: determining the type information of the extracted keyword from the type information of the keyword in the matched fixed sentence pattern, and matching the extracted keyword against the vocabulary of the matching type.
  • for a storytelling business function, the business-related vocabulary can be story names; for the "song playing" business function, the business-related vocabulary can be song names, song styles, singer names, and so on.
  • the position of each keyword in the fixed sentence pattern can be determined. Then, using the type information of the keyword at each position, it can be determined which service-related vocabulary should be used to find candidate words.
  • the true meaning of a keyword can be determined from the candidate words, for example whether the story a child wants to hear is "Little Red Cat" or "Little Red Riding Hood".
  • in an embodiment of the present invention, performing error correction processing on the keywords in the text information according to the selected one or more candidate words includes: scoring each selected candidate word that matches a keyword according to the degree of matching between the extracted keyword and the candidate word; if the score of the keyword's highest-scoring candidate word is higher than or equal to a first confidence value, correcting the keyword with that candidate word; if the score is higher than a second confidence value but lower than the first confidence value, conducting a further voice dialogue with the user to confirm whether the keyword should be corrected with that candidate word; and if the score is lower than or equal to the second confidence value, making no correction.
  • the candidate words for "Shaw Red Hat" are "Little Red Cat" and "Little Red Riding Hood".
  • "Shaw Red Hat" and "Little Red Riding Hood" share two characters and differ in only one; "Shaw Red Hat" shares one character with "Little Red Cat", and the other two characters are pronounced differently.
  • the scoring result may be: "Little Red Riding Hood" scores 0.6 and "Little Red Cat" scores 0.5.
  • taking a first confidence value of 0.45 as an example, both candidate words score above 0.45, so the highest-scoring candidate word, "Little Red Riding Hood", is selected to correct the keyword "Shaw Red Hat".
  • scoring the candidate words according to the degree of matching between the extracted keyword and the candidate word includes: dividing the score, from high to low, into three ranges: high, medium and low; if the keyword has the same pinyin as the candidate word but a different tone, scoring in the high range; if the keyword shares part of its initials or finals with the pinyin of the candidate word, scoring in the middle range; and if the keyword shares neither initials nor finals with the pinyin of the candidate word, scoring in the low range.
  • the high score range is [0.45, 1]
  • the middle score range is (0.4, 0.45)
  • the low score range is [0, 0.4). Since "Little Red Cat" and "Little Red Riding Hood" both relate to "Shaw Red Hat" as "same pinyin, different tone", the high-range scoring standard applies. As another example, some users do not distinguish the initials "n" and "l", so "beef" comes out as "flowing meat". The keyword "flowing meat" and the candidate word "beef" then differ in some initials but have the same finals, so the middle-range scoring standard applies.
  • the confidence of the highest-scoring candidate word can be determined as in the previous embodiment. For example, if the score for "beef" is 0.44, which is lower than the first confidence value of 0.45 but higher than the second confidence value of 0.4, the device can ask the user "Did you mean 'beef'?" to confirm whether "flowing meat" should be corrected to "beef". If the score of the keyword's highest-scoring candidate word is lower than or equal to the second confidence value, no correction is made, because a correction at that point might deviate from the user's original intent.
  • the method further includes: if a plurality of keywords are extracted from the text information, multiplying the scores of the highest-scoring candidate words of the keywords to obtain a score for the plurality of keywords; if that score is higher than or equal to a third confidence value, correcting each keyword with its highest-scoring candidate word; if that score is higher than a fourth confidence value but lower than the third confidence value, conducting a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate word; and if that score is lower than or equal to the fourth confidence value, making no correction.
  • an example is given of how to calculate the candidate word score when there are multiple keywords.
  • the score of "Zhang Dehua" is 0.3
  • the user's pronunciation of "forgotten water" is very standard
  • the method further includes: outputting the corresponding business service of the smart hardware device according to the result of the correction processing.
  • without the technical solution of the present invention, the child says to the intelligent companion robot: "Will you tell the story of Xiao Hong Cap?"; because the recognized "Shaw Red Hat" does not match "Little Red Riding Hood", the reply the child gets is "I will not do this."
  • the mother then instructs the intelligent companion robot again to "tell the story of Little Red Riding Hood", and the robot correctly obtains the resources of the story "Little Red Riding Hood" and narrates it by voice.
  • FIG. 2 is a schematic structural diagram of a voice recognition error correction apparatus in an intelligent hardware device according to an embodiment of the present invention.
  • the voice recognition error correction apparatus 200 in the smart hardware device includes:
  • the voice recognition unit 210 is adapted to convert the voice signal received by the smart hardware device into text information by using a voice recognition technology.
  • the keyword extracting unit 220 is adapted to extract keywords from the text information.
  • the matching unit 230 is adapted to match the extracted keywords with the vocabulary related to the intelligent hardware service, and select one or more candidate words that match the keywords from the vocabulary.
  • the error correction unit 240 is adapted to perform error correction processing on the keywords in the text information according to the selected one or more candidate words.
  • the device shown in FIG. 2, through the cooperation of its units, first uses speech recognition technology to recognize the voice signal received by the intelligent hardware device and convert it into text information, then parses the text information to obtain several keywords, matches these keywords against the vocabulary related to the intelligent hardware service to determine one or more candidate words, and finally corrects the keywords using the candidate words obtained.
  • the technical solution fully considers the functional characteristics of the intelligent hardware and uses the preset business-related vocabulary to intelligently correct the keywords parsed from the speech recognition result. This significantly improves the accuracy of speech recognition and consumes few resources, in line with the low energy consumption requirements of intelligent hardware devices.
  • the apparatus further includes: a configuration unit, adapted to preset one or more fixed sentence patterns associated with the business voice interaction of the smart hardware device and to mark the position of the keyword in each fixed sentence pattern; the keyword extracting unit 220 is adapted to match the text information against the one or more fixed sentence patterns and to extract the keyword from the corresponding position of the text information according to the keyword position marked in the matched fixed sentence pattern.
  • the configuration unit is further adapted to mark type information for the keywords in each fixed sentence pattern and to determine type information of the vocabulary related to the intelligent hardware service; the matching unit 230 is adapted to determine the type information of the extracted keyword according to the type information of the keyword in the matched fixed sentence pattern, and to match the extracted keyword against the vocabulary of the matching type.
  • the error correcting unit 240 is adapted to score each selected candidate word that matches a keyword according to the degree of matching between the extracted keyword and the candidate word; if the score of the keyword's highest-scoring candidate word is higher than or equal to the first confidence value, to correct the keyword with that candidate word; if the score is higher than the second confidence value but lower than the first confidence value, to conduct a further voice dialogue with the user to confirm whether the keyword should be corrected with that candidate word; and if the score is lower than or equal to the second confidence value, to make no correction.
  • the error correction unit 240 is adapted to divide the score from high to low into three ranges of high, medium, and low; if the keyword has the same pinyin as the candidate word but a different tone, to score in the high range; if the keyword shares part of its initials or finals with the pinyin of the candidate word, to score in the middle range; and if the keyword shares neither initials nor finals with the pinyin of the candidate word, to score in the low range.
  • the error correction unit 240 is further adapted, when a plurality of keywords are extracted from the text information, to multiply the scores of the highest-scoring candidate words of the keywords to obtain a score for the plurality of keywords; if that score is higher than or equal to the third confidence value, to correct each keyword with its highest-scoring candidate word; if that score is higher than the fourth confidence value but lower than the third confidence value, to conduct a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate word; and if that score is lower than or equal to the fourth confidence value, to make no correction.
  • the apparatus further includes: a business service unit adapted to output the corresponding business service of the smart hardware device according to the result of the correction processing.
  • the technical solution of the present invention first uses speech recognition technology to recognize the voice signal received by an intelligent hardware device and convert it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the intelligent hardware service to determine one or more candidate words, and finally corrects the keywords using the candidate words obtained.
  • the technical solution fully considers the functional characteristics of the intelligent hardware and uses the preset business-related vocabulary to intelligently correct the keywords parsed from the speech recognition result. This significantly improves the accuracy of speech recognition and consumes few resources, in line with the low energy consumption requirements of intelligent hardware devices.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • the features disclosed in this specification (including the accompanying claims, the abstract and the drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the speech recognition error correction device in an intelligent hardware device in accordance with an embodiment of the present invention.
  • the invention can also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 3 shows a block diagram of a computing device for performing the method in accordance with the present invention.
  • the computing device conventionally includes a processor 310 and a computer program product or computer readable medium in the form of a memory 320.
  • the memory 320 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • the memory 320 has a storage space 330 that stores program code 331 for performing any of the method steps described above.
  • the storage space 330 for storing program code may separately store respective program codes 331 for implementing various steps in the above method.
  • the program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such computer program products are typically portable or fixed storage units such as those shown in FIG.
  • the storage unit may have storage segments, storage spaces, and the like that are similarly arranged to memory 320 in the computing device of FIG.
  • the program code can be compressed in an appropriate form.
  • the storage unit stores computer readable program code 331' for performing the steps of the method according to the invention, i.e. program code readable by a processor such as 310, which, when run by the computing device, causes the computing device to perform the various steps of the methods described above.
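
The confidence-value decisions described in the snippets above can be sketched as follows. This is a minimal illustration, not the patented implementation: the thresholds 0.45 and 0.4 come from the worked example in the text, while the names `Action`, `decide`, `best_candidate` and `decide_multi` are hypothetical, and the third and fourth confidence values are left as parameters because the text does not state them.

```python
from enum import Enum
from math import prod

# First and second confidence values from the worked example in the text.
FIRST_CONFIDENCE = 0.45
SECOND_CONFIDENCE = 0.40


class Action(Enum):
    CORRECT = "correct"    # replace the keyword with the best candidate
    ASK_USER = "ask_user"  # confirm with the user in a further voice dialogue
    KEEP = "keep"          # make no correction


def decide(score, upper=FIRST_CONFIDENCE, lower=SECOND_CONFIDENCE):
    """Map a score to an action using an upper and a lower confidence value."""
    if score >= upper:
        return Action.CORRECT
    if score > lower:
        return Action.ASK_USER
    return Action.KEEP


def best_candidate(scores):
    """Return the (candidate, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


def decide_multi(best_scores, third, fourth):
    """Multiply the best score of each keyword, then apply the third and
    fourth confidence values (both supplied by the caller)."""
    return decide(prod(best_scores), upper=third, lower=fourth)
```

With the scores from the text, `best_candidate({"Little Red Riding Hood": 0.6, "Little Red Cat": 0.5})` selects "Little Red Riding Hood", `decide(0.6)` yields a correction, and `decide(0.44)` (the "beef" example) yields a confirmation dialogue.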

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method and an apparatus for error correction of voice recognition in a smart hardware device. The method comprises: converting a voice signal received by a smart hardware device into text information by means of voice recognition technology (S110); extracting key words from the text information (S120); matching the extracted key words with a word list related to a smart hardware service, and selecting from the word list one or a plurality of candidate terms matching the key words (S130); and, on the basis of the selected one or plurality of candidate terms, performing error correction processing of the key words in the text information (S140). The present method takes the functional performance of the smart hardware into full consideration, and uses a preset service-related word list to perform smart error correction of key words parsed from voice recognition results, significantly increasing the accuracy of voice recognition and occupying few resources, and thereby meeting the requirements for low smart hardware device energy consumption.
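
The four steps S110 to S140 can be sketched end to end as below. This is an illustrative sketch under several assumptions: speech recognition is stubbed out (the input is already text), the word list `STORY_NAMES` and all function names are hypothetical, and plain string similarity from Python's `difflib.SequenceMatcher` stands in for whatever matching measure a real device would use.

```python
from difflib import SequenceMatcher

# Hypothetical service-related word list for a storytelling device.
STORY_NAMES = ["Little Red Riding Hood", "Little Red Cat", "Snow White"]


def recognize(voice_signal):
    """S110 stand-in: a real device would call a speech recognizer here."""
    return voice_signal  # the "signal" is already text in this sketch


def extract_keyword(text, prefix="I want to hear the story "):
    """S120: treat whatever follows a fixed prefix as the keyword."""
    return text[len(prefix):] if text.startswith(prefix) else None


def match_candidates(keyword, vocabulary, limit=2):
    """S130: rank word-list entries by string similarity to the keyword."""
    scored = [
        (word, SequenceMatcher(None, keyword.lower(), word.lower()).ratio())
        for word in vocabulary
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def correct(text, keyword, candidates, threshold=0.45):
    """S140: replace the keyword with the best candidate if it scores high enough."""
    if not candidates:
        return text
    best, score = candidates[0]
    return text.replace(keyword, best) if score >= threshold else text


def run_pipeline(voice_signal, vocabulary=STORY_NAMES):
    text = recognize(voice_signal)
    keyword = extract_keyword(text)
    if keyword is None:
        return text  # no service keyword found, nothing to correct
    return correct(text, keyword, match_candidates(keyword, vocabulary))
```

For example, `run_pipeline("I want to hear the story Little Red Ridding Hood")` rewrites the misspelled keyword to the word-list entry, while text with no recognizable keyword is returned unchanged. Restricting the search to a small service-specific word list is what keeps the matching step cheap.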

Description

Speech recognition error correction method and apparatus in an intelligent hardware device
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a speech recognition error correction method and apparatus in an intelligent hardware device.
Background
The development of speech recognition technology has made interaction between users and intelligent hardware devices (such as smart watches, mobile phones, and dashboard cameras) more convenient. The following are several examples of how, in the prior art, a user interacts with an intelligent hardware device by means of speech recognition technology:
1) The user's instruction is converted into text by speech recognition technology;
2) The user's intent is understood by semantic analysis technology;
3) The text resource that is found is converted into speech by speech synthesis technology and fed back to the user.
The semantic analysis process relies heavily on the accuracy of speech recognition, and that accuracy can hardly reach 100%. For example, when the user is a child, unclear articulation causes a variety of speech recognition errors.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a speech recognition error correction method and apparatus in an intelligent hardware device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a speech recognition error correction method in an intelligent hardware device is provided, including:
converting a voice signal received by the intelligent hardware device into text information by means of speech recognition technology;
extracting keywords from the text information;
matching the extracted keywords against a vocabulary related to the intelligent hardware service, and selecting from the vocabulary one or more candidate words that match the keywords;
performing error correction processing on the keywords in the text information according to the selected one or more candidate words.
According to another aspect of the present invention, a speech recognition error correction apparatus in a smart hardware device is provided, comprising:
a speech recognition unit, adapted to convert a speech signal received by the smart hardware device into text information by speech recognition;
a keyword extraction unit, adapted to extract keywords from the text information;
a matching unit, adapted to match the extracted keywords against a vocabulary related to the smart hardware services, and to select from the vocabulary one or more candidate words that match each keyword;
an error correction unit, adapted to perform error correction on the keywords in the text information according to the selected one or more candidate words.
According to a further aspect of the present invention, a computer program is provided, comprising computer-readable code which, when run on a computing device, causes the computing device to perform the speech recognition error correction method in a smart hardware device described above.
According to yet another aspect of the present invention, a computer-readable medium is provided, in which the computer program described above is stored.
As can be seen from the above, the technical solution of the present invention first applies speech recognition to the speech signal received by the smart hardware device and converts it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the smart hardware services to determine one or more candidate words, and finally uses the candidate words to correct the keywords. The solution takes full account of the functional characteristics of smart hardware: it uses a preset service-related vocabulary to correct the keywords parsed from the speech recognition result, which significantly improves recognition accuracy while consuming few resources, in line with the low-power requirements of smart hardware devices.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a schematic flowchart of a speech recognition error correction method in a smart hardware device according to one embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a speech recognition error correction apparatus in a smart hardware device according to one embodiment of the present invention;
FIG. 3 schematically shows a block diagram of a computing device for performing the method according to the present invention; and
FIG. 4 schematically shows a storage unit for holding or carrying program code that implements the method according to the present invention.
Detailed Description of the Embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be conveyed in full to those skilled in the art.
FIG. 1 is a schematic flowchart of a speech recognition error correction method in a smart hardware device according to one embodiment of the present invention. As shown in FIG. 1, the method comprises:
Step S110: convert the speech signal received by the smart hardware device into text information by speech recognition.
The smart hardware device may be a smartphone, a smart watch, a smart robot, or the like. Speech recognition is a technology that is steadily maturing and is already in use. For example, Siri on Apple phones recognizes the user's voice and carries out functions on the phone such as opening the camera or starting map navigation.
Step S120: extract keywords from the text information.
For example, a user who wants to use the storytelling function of a children's watch utters "I want to hear a story: 小红帽" (xiǎo hóng mào, "Little Red Riding Hood"). Here "故事" ("story") is the keyword corresponding to the function, and "小红帽" is the keyword corresponding to a story category under that function. In other words, the keywords are related to the smart hardware services. The method therefore continues with the following steps:
Step S130: match the extracted keywords against a vocabulary related to the smart hardware services, and select from the vocabulary one or more candidate words that match each keyword.
In actual use, the user's pronunciation may not be standard, so the recognized words fail to convey what was meant. For example, a child pronounces "小红帽" (xiǎo hóng mào) as "肖红帽" (xiāo hóng mào), and the text produced by speech recognition may then be "肖红帽". This word clearly does not match "小红帽" exactly, but the two are highly similar, and a human listener can tell that the child actually meant "小红帽". In practice, however, the recognized text may fuzzily match many vocabulary entries; for example, the story library may also contain a story called "小红猫" (xiǎo hóng māo, "Little Red Cat"). For a given keyword, therefore, either one candidate word or several may be selected from the vocabulary.
Step S140: perform error correction on the keywords in the text information according to the selected one or more candidate words.
As can be seen, the method shown in FIG. 1 first applies speech recognition to the speech signal received by the smart hardware device and converts it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the smart hardware services to determine one or more candidate words, and finally uses the candidate words to correct the keywords. The solution takes full account of the functional characteristics of smart hardware: it uses a preset service-related vocabulary to correct the keywords parsed from the speech recognition result, which significantly improves recognition accuracy while consuming few resources, in line with the low-power requirements of smart hardware devices.
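The flow of steps S120 to S140 can be sketched as a small pipeline. This is only an illustrative outline, not the patented implementation: the function name `correct_transcript` and its three callable parameters are hypothetical, and step S110 (speech recognition) is assumed to have already produced the text.

```python
from typing import Callable, List


def correct_transcript(
    text: str,
    extract_keywords: Callable[[str], List[str]],
    match_candidates: Callable[[str], List[str]],
    choose: Callable[[str, List[str]], str],
) -> str:
    """Apply steps S120-S140 to text already produced by step S110."""
    for keyword in extract_keywords(text):        # S120: pull out keywords
        candidates = match_candidates(keyword)    # S130: look up the vocabulary
        if candidates:                            # S140: replace the keyword
            text = text.replace(keyword, choose(keyword, candidates))
    return text


# Toy stand-ins for the three stages (hypothetical, for illustration only).
fixed = correct_transcript(
    "我想听肖红帽的故事",
    extract_keywords=lambda t: ["肖红帽"],
    match_candidates=lambda k: ["小红帽", "小红猫"],
    choose=lambda k, cands: cands[0],
)
print(fixed)
```

Passing the stages in as callables keeps the outline independent of any particular recognizer, vocabulary, or scoring rule.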
In one embodiment of the present invention, the method shown in FIG. 1 further comprises: presetting one or more fixed sentence patterns associated with the service voice interaction of the smart hardware device, and marking the keyword positions in each fixed sentence pattern. Extracting keywords from the text information then comprises: matching the text information against the one or more fixed sentence patterns, and extracting keywords from the corresponding positions of the text information according to the keyword positions marked in the matched fixed sentence pattern.
In the example above, "I want to hear a story: 肖红帽" follows a fixed sentence pattern that can be generalized as "我想听故事XXX" ("I want to hear the story XXX"), where "XXX" corresponds to a keyword. The same meaning can also be expressed as "我想听肖红帽的故事" ("I want to hear the story of 肖红帽"), for which the corresponding fixed sentence pattern is "我想听XXX的故事" ("I want to hear the story of XXX"). When the smart hardware supports song playback, the user can use a pattern such as "我想听XXX的XXX" ("I want to hear XXX by XXX"), for example "我想听刘德华的忘情水" ("I want to hear 忘情水 by 刘德华 (Andy Lau)"), and so on.
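One way to realize fixed sentence patterns with marked keyword positions is a list of regular expressions with named groups. The sketch below is an assumption about how this could be done, not the patent's own implementation; the template list, the group names (`story`, `singer`, `song`), and the function `extract_keywords` are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical templates: each named group marks one keyword slot ("XXX").
# Order matters: the story templates are tried before the more general
# "XXX的XXX" song template, which would otherwise also match story requests.
TEMPLATES = [
    re.compile(r"我想听故事(?P<story>.+)"),
    re.compile(r"我想听(?P<story>.+)的故事"),
    re.compile(r"我想听(?P<singer>.+?)的(?P<song>.+)"),
]


def extract_keywords(text: str) -> Optional[Dict[str, str]]:
    """Return {slot name: keyword} from the first template matching the whole text."""
    for template in TEMPLATES:
        match = template.fullmatch(text)
        if match:
            return match.groupdict()
    return None  # no fixed sentence pattern applies


print(extract_keywords("我想听肖红帽的故事"))
print(extract_keywords("我想听刘德华的忘情水"))
```

Using `fullmatch` rather than `search` means an utterance is only accepted when it fits a pattern from start to end, which mirrors the idea of a fixed sentence pattern.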
Clearly, matching the text "我想听肖红帽的故事" against the fixed sentence pattern "我想听XXX的故事" identifies "肖红帽" as a keyword. It can further be determined that "肖红帽" corresponds to a story name. Accordingly, in one embodiment of the present invention, the above method further comprises: marking type information for the keywords in each fixed sentence pattern, and determining the type information of each vocabulary related to the smart hardware services. Matching the extracted keywords against the vocabulary related to the smart hardware services then comprises: determining the type information of each extracted keyword from the type information of the keywords in the matched fixed sentence pattern, and matching the extracted keyword against the vocabulary of the matching type.
For the "storytelling" service function, the service-related vocabulary may be a list of story names; for the "song playback" function, the service-related vocabularies may be lists of song names, song styles, singer names, and so on. In the previous embodiment, the position of each keyword in the fixed sentence pattern can be determined, so the type information associated with the keyword at each position indicates which service-related vocabulary should be used to match candidate words. For "肖红帽" in the previous embodiment, the story-name vocabulary is used; for "刘德华", the singer-name vocabulary; and for "忘情水", the song-name vocabulary.
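Selecting the vocabulary by keyword type can be sketched as a dictionary lookup followed by a fuzzy match. Everything named here is hypothetical: the `VOCABULARIES` contents, the type labels, and the function `candidates_for`. Note also that the fuzzy match uses character overlap from Python's standard `difflib` as a simple stand-in; the patent itself compares pinyin, which this sketch does not do.

```python
from difflib import get_close_matches
from typing import Dict, List

# Hypothetical vocabularies, one per keyword type.
VOCABULARIES: Dict[str, List[str]] = {
    "story": ["小红帽", "小红猫", "三只小猪"],
    "singer": ["刘德华", "张学友"],
    "song": ["忘情水", "吻别"],
}


def candidates_for(keyword: str, keyword_type: str) -> List[str]:
    """Fuzzy-match a keyword against the vocabulary of its own type only."""
    vocabulary = VOCABULARIES.get(keyword_type, [])
    # Character overlap stands in for phonetic similarity here; the low
    # cutoff lets entries that share a single character through.
    return get_close_matches(keyword, vocabulary, n=3, cutoff=0.3)


print(candidates_for("肖红帽", "story"))
```

Restricting the search to one typed vocabulary keeps the candidate list short, which is in the spirit of the low resource usage the text emphasizes.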
Once a keyword and its candidate words have been obtained, the candidate words can be used to determine what the keyword really means, for example whether the child wants to hear the story "小红猫" or the story "小红帽". In one embodiment of the present invention, in the method shown in FIG. 1, performing error correction on the keywords in the text information according to the selected one or more candidate words comprises: for each selected candidate word that matches a keyword, scoring the candidate word according to the degree of match between the extracted keyword and that candidate word; if the score of the keyword's highest-scoring candidate is higher than or equal to a first confidence value, correcting the keyword with that candidate; if the score of the highest-scoring candidate is higher than a second confidence value but lower than the first confidence value, holding a further voice dialogue with the user to confirm whether the keyword should be corrected with that candidate; and if the score of the highest-scoring candidate is lower than or equal to the second confidence value, making no correction.
For example, the candidate words for "肖红帽" are "小红猫" and "小红帽". "肖红帽" and "小红帽" have two characters in common and differ in the pronunciation of only one character; "肖红帽" and "小红猫" have one character in common and differ in the pronunciation of the other two. The scores might therefore be 0.6 for "小红帽" and 0.5 for "小红猫". Taking a first confidence value of 0.45 as an example, both candidates score above 0.45, so the highest-scoring candidate, "小红帽", is selected to correct the keyword "肖红帽".
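The three-way decision on the highest-scoring candidate can be written as a short function. This is a minimal sketch under stated assumptions: the name `decide` and the action labels are hypothetical, the default first confidence value 0.45 is the example value from the text, and the default second confidence value 0.4 is the example value the specification uses in its later "牛肉" example.

```python
from typing import Dict, Optional, Tuple


def decide(
    scores: Dict[str, float],
    first_confidence: float = 0.45,
    second_confidence: float = 0.40,
) -> Tuple[str, Optional[str]]:
    """Return (action, best_candidate) for one keyword's candidate scores."""
    if not scores:
        return ("keep", None)
    best, score = max(scores.items(), key=lambda item: item[1])
    if score >= first_confidence:
        return ("correct", best)   # replace the keyword directly
    if score > second_confidence:
        return ("confirm", best)   # ask the user before replacing
    return ("keep", best)          # too uncertain: leave the keyword alone


print(decide({"小红帽": 0.6, "小红猫": 0.5}))
```

With the example scores, "小红帽" at 0.6 clears the first confidence value and is applied directly, while a best score of 0.44 would instead trigger a confirmation dialogue.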
Example strategies for scoring candidate words are given below. In one embodiment of the present invention, in the above method, scoring a candidate word according to the degree of match between the extracted keyword and that candidate word comprises: dividing the scores from high to low into three bands, high, middle, and low; if the keyword and the candidate word have the same pinyin and differ only in tone, scoring within the high band; if the initials or the finals in the pinyin of the keyword and the candidate word are partly the same, scoring within the middle band; and if both the initials and the finals in the pinyin of the keyword and the candidate word differ, scoring within the low band.
For example, the high band is [0.45, 1], the middle band is (0.4, 0.45), and the low band is [0, 0.4). Because "小红猫" and "小红帽" each have the same pinyin as "肖红帽" and differ only in tone, they are scored in the high band. Another example: some users do not distinguish the initials "n" and "l" and say "牛肉" (niú ròu, "beef") as "流肉" (liú ròu). The candidate word "牛肉" differs from the keyword "流肉" in one initial while the finals are the same, so it is scored in the middle band. A further example: because of a memory lapse, the user remembers "刘德华" (Liú Déhuá) as "张德华" (Zhāng Déhuá). The initials and the finals of "刘" and "张" both differ, although the other two characters are the same; in this case the low band can be used.
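The band selection can be sketched as a comparison of pinyin syllables split into initial, final, and tone. This is one possible reading of the rule, not the patent's definitive algorithm: the function `band`, the `Syllable` tuple, and the treatment of unequal lengths are assumptions, and the syllables are written out by hand rather than obtained from a pinyin library. The sketch returns only the band; how a score is chosen inside the band is not specified in the text.

```python
from typing import List, Tuple

# (initial, final, tone), e.g. 帽 "mào" -> ("m", "ao", 4). Hand-written here.
Syllable = Tuple[str, str, int]

# Bands as stated in the text: high [0.45, 1], middle (0.4, 0.45), low [0, 0.4).


def band(keyword: List[Syllable], candidate: List[Syllable]) -> str:
    """Pick the scoring band by comparing initials and finals, ignoring tone."""
    if len(keyword) != len(candidate):
        return "low"  # assumption: the text does not cover unequal lengths
    partial = False
    for (k_init, k_final, _), (c_init, c_final, _) in zip(keyword, candidate):
        if k_init != c_init and k_final != c_final:
            return "low"    # some syllable shares neither initial nor final
        if k_init != c_init or k_final != c_final:
            partial = True  # only the initial or only the final matches
    return "mid" if partial else "high"


xiao_hong_mao = [("x", "iao", 1), ("h", "ong", 2), ("m", "ao", 4)]   # 肖红帽
little_red_hood = [("x", "iao", 3), ("h", "ong", 2), ("m", "ao", 4)]  # 小红帽
print(band(xiao_hong_mao, little_red_hood))
```

Under this reading, "流肉" against "牛肉" lands in the middle band and "张德华" against "刘德华" in the low band, matching the three examples in the text.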
Once the candidate scores have been obtained, the confidence of the highest-scoring candidate can be judged by the method of the previous embodiment. For example, "牛肉" scores 0.44, which is lower than the first confidence value of 0.45 but higher than the second confidence value of 0.4, so the device can ask the user, "Did you mean '牛肉'?" to confirm whether "流肉" should be corrected to "牛肉". If the score of the keyword's highest-scoring candidate is lower than or equal to the second confidence value, no correction is made, because a correction at that point might depart from what the user actually meant.
In one embodiment of the present invention, the above method further comprises: if multiple keywords are extracted from the text information, multiplying together the scores of the highest-scoring candidates of the individual keywords to obtain a score for the multiple keywords; if that score is higher than or equal to a third confidence value, correcting each keyword with its highest-scoring candidate; if that score is higher than a fourth confidence value but lower than the third confidence value, holding a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate; and if that score is lower than or equal to the fourth confidence value, making no correction.
This embodiment gives an example of how the candidate score is computed when there are multiple keywords. For example, the candidate for "张德华" scores 0.3, and the user pronounces "忘情水" correctly, so the corresponding candidate "忘情水" scores 1; the score for the two keywords is then 0.3 × 1 = 0.3.
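The multi-keyword rule can be sketched as a product followed by the same three-way decision. The function name `decide_joint` and the action labels are hypothetical. The text gives no example values for the third and fourth confidence values, so they are required parameters here, and the values passed in the usage line are placeholders chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def decide_joint(
    best_scores: Sequence[float],
    third_confidence: float,
    fourth_confidence: float,
) -> Tuple[str, float]:
    """Combine the best score of each keyword by multiplication, then decide."""
    combined = math.prod(best_scores)
    if combined >= third_confidence:
        return ("correct_all", combined)  # correct every keyword
    if combined > fourth_confidence:
        return ("confirm", combined)      # ask the user first
    return ("keep", combined)             # leave all keywords unchanged


# 0.3 for "张德华" and 1.0 for "忘情水"; thresholds are placeholder values.
action, combined = decide_joint([0.3, 1.0], third_confidence=0.45, fourth_confidence=0.4)
print(action, combined)
```

Because the scores are multiplied, a single weak keyword pulls the whole utterance down, so one doubtful keyword is enough to block automatic correction of the others.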
In one embodiment of the present invention, the above method further comprises: outputting the corresponding service of the smart hardware device according to the result of the correction.
For example, without the technical solution of the present invention, a child says to a smart companion robot, "Can you tell the story of 肖红帽?" Because "肖红帽" does not match "小红帽", the child receives the reply "I can't do that one yet." The child's mother then gives the robot the correct instruction, "Tell the story of 小红帽", and the robot correctly retrieves the resource for the story "小红帽" and reads it aloud.
In this embodiment, by contrast, when the child says "Can you tell the story of 肖红帽?", speech recognition error correction correctly changes "肖红帽" to "小红帽", so the resource for the story "小红帽" is retrieved correctly and read aloud.
FIG. 2 is a schematic structural diagram of a speech recognition error correction apparatus in a smart hardware device according to one embodiment of the present invention. As shown in FIG. 2, the speech recognition error correction apparatus 200 in a smart hardware device comprises:
a speech recognition unit 210, adapted to convert a speech signal received by the smart hardware device into text information by speech recognition;
a keyword extraction unit 220, adapted to extract keywords from the text information;
a matching unit 230, adapted to match the extracted keywords against a vocabulary related to the smart hardware services, and to select from the vocabulary one or more candidate words that match each keyword;
an error correction unit 240, adapted to perform error correction on the keywords in the text information according to the selected one or more candidate words.
As can be seen, in the apparatus shown in FIG. 2 the units cooperate as follows: speech recognition is first applied to the speech signal received by the smart hardware device to convert it into text information; the text information is then parsed to obtain several keywords; these keywords are matched against a vocabulary related to the smart hardware services to determine one or more candidate words; and finally the candidate words are used to correct the keywords. The solution takes full account of the functional characteristics of smart hardware: it uses a preset service-related vocabulary to correct the keywords parsed from the speech recognition result, which significantly improves recognition accuracy while consuming few resources, in line with the low-power requirements of smart hardware devices.
In one embodiment of the present invention, the above apparatus further comprises a configuration unit, adapted to preset one or more fixed sentence patterns associated with the service voice interaction of the smart hardware device and to mark the keyword positions in each fixed sentence pattern. The keyword extraction unit 220 is adapted to match the text information against the one or more fixed sentence patterns and to extract keywords from the corresponding positions of the text information according to the keyword positions marked in the matched fixed sentence pattern.
In one embodiment of the present invention, in the above apparatus, the configuration unit is further adapted to mark type information for the keywords in each fixed sentence pattern and to determine the type information of each vocabulary related to the smart hardware services. The matching unit 230 is adapted to determine the type information of each extracted keyword from the type information of the keywords in the matched fixed sentence pattern, and to match the extracted keyword against the vocabulary of the matching type.
In one embodiment of the present invention, in the above apparatus, the error correction unit 240 is adapted to: for each selected candidate word that matches a keyword, score the candidate word according to the degree of match between the extracted keyword and that candidate word; if the score of the keyword's highest-scoring candidate is higher than or equal to a first confidence value, correct the keyword with that candidate; if the score of the highest-scoring candidate is higher than a second confidence value but lower than the first confidence value, hold a further voice dialogue with the user to confirm whether the keyword should be corrected with that candidate; and if the score of the highest-scoring candidate is lower than or equal to the second confidence value, make no correction.
In one embodiment of the present invention, in the above apparatus, the error correction unit 240 is adapted to divide the scores from high to low into three bands, high, middle, and low;
if the keyword and the candidate word have the same pinyin and differ only in tone, to score within the high band;
if the initials or the finals in the pinyin of the keyword and the candidate word are partly the same, to score within the middle band; and if both the initials and the finals in the pinyin of the keyword and the candidate word differ, to score within the low band.
In one embodiment of the present invention, in the above apparatus, the error correction unit 240 is further adapted to: when multiple keywords are extracted from the text information, multiply together the scores of the highest-scoring candidates of the individual keywords to obtain a score for the multiple keywords; if that score is higher than or equal to a third confidence value, correct each keyword with its highest-scoring candidate; if that score is higher than a fourth confidence value but lower than the third confidence value, hold a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate; and if that score is lower than or equal to the fourth confidence value, make no correction.
In one embodiment of the present invention, the above apparatus further comprises a service unit, adapted to output the corresponding service of the smart hardware device according to the result of the correction.
It should be noted that the specific implementations of the above apparatus embodiments are the same as those of the corresponding method embodiments described earlier and are not repeated here.
In summary, the technical solution of the present invention first applies speech recognition to the speech signal received by the smart hardware device and converts it into text information, then parses the text information to obtain several keywords, matches these keywords against a vocabulary related to the smart hardware services to determine one or more candidate words, and finally uses the candidate words to correct the keywords. The solution takes full account of the functional characteristics of smart hardware: it uses a preset service-related vocabulary to correct the keywords parsed from the speech recognition result, which significantly improves recognition accuracy while consuming few resources, in line with the low-power requirements of smart hardware devices.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus, or other device. Various general-purpose apparatuses may also be used with the teachings herein. From the description above, the structure required to construct such an apparatus is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to fall within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the speech recognition error correction apparatus in a smart hardware device according to embodiments of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Figure 3 shows a block diagram of a computing device for performing the method according to the invention. The computing device conventionally includes a processor 310 and a computer program product or computer-readable medium in the form of a memory 320. The memory 320 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 320 has a storage space 330 holding program code 331 for performing any of the method steps described above. For example, the storage space 330 may hold individual pieces of program code 331, each implementing one of the steps of the above method. The program code can be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit such as the one shown in Figure 4. The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 320 in the computing device of Figure 3. The program code may be compressed in a suitable form. In general, the storage unit stores computer-readable program code 331', that is, program code readable by a processor such as 310, for performing the method steps according to the invention; when run by a computing device, this code causes the computing device to perform the steps of the method described above.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
In addition, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive; the scope of the invention is defined by the appended claims.

Claims (16)

  1. A speech recognition error correction method in a smart hardware device, wherein the method comprises:
    converting a voice signal received by the smart hardware device into text information by means of speech recognition technology;
    extracting a keyword from the text information;
    matching the extracted keyword against a vocabulary related to the smart hardware service, and selecting from the vocabulary one or more candidate words that match the keyword; and
    performing error correction processing on the keyword in the text information according to the selected one or more candidate words.
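As an illustration only, the matching and correction steps of claim 1 could be sketched as follows. The text is assumed to have already been produced by a speech recognizer and the keyword to have already been extracted. The function names, the `min_ratio` cutoff, and the use of `difflib` string similarity as the matching measure are all assumptions made for this sketch; the claims do not prescribe them.

```python
from difflib import SequenceMatcher


def select_candidates(keyword, vocabulary, min_ratio=0.6):
    """Return (similarity, word) pairs for vocabulary words close to keyword, best first.

    String similarity stands in for the matching step here (an assumption).
    """
    scored = [(SequenceMatcher(None, keyword, word).ratio(), word) for word in vocabulary]
    return sorted((pair for pair in scored if pair[0] >= min_ratio), reverse=True)


def correct_text(text, keyword, vocabulary):
    """Replace keyword in text with its best-matching vocabulary word, if any."""
    candidates = select_candidates(keyword, vocabulary)
    if not candidates:
        return text  # no candidate matched well enough; leave the text unchanged
    return text.replace(keyword, candidates[0][1])
```

With a hypothetical song vocabulary `["silent night", "jingle bells"]`, calling `correct_text("play jingle bels", "jingle bels", vocabulary)` would replace the misrecognized title with the vocabulary entry.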
  2. The method of claim 1, wherein
    the method further comprises: presetting one or more fixed sentence patterns associated with the service voice interaction of the smart hardware device; and marking the position of a keyword in each fixed sentence pattern;
    said extracting a keyword from the text information comprises: matching the text information against the one or more fixed sentence patterns; and extracting the keyword from the corresponding position in the text information according to the keyword position marked in the matched fixed sentence pattern.
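One possible way to realize fixed sentence patterns with marked keyword positions is a list of regular expressions whose named groups mark the keyword slots. The patterns below and the group names `song` and `artist` are hypothetical examples, not taken from the application.

```python
import re

# Hypothetical fixed sentence patterns; each named group marks a keyword position.
PATTERNS = [
    re.compile(r"^play (?P<song>.+) by (?P<artist>.+)$"),
    re.compile(r"^play (?P<song>.+)$"),
]


def extract_keywords(text):
    """Return the keywords found at the marked positions of the first matching pattern."""
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.groupdict()
    return {}  # the text fits none of the fixed sentence patterns
```

In this sketch the group name could also serve as a type label for the keyword, which is one way to select a type-specific vocabulary for the later matching step.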
  3. The method of claim 1 or 2, wherein
    the method further comprises: marking type information for the keyword in each fixed sentence pattern; and determining type information of each vocabulary related to the smart hardware service;
    said matching the extracted keyword against a vocabulary related to the smart hardware service comprises: determining the type information of the extracted keyword according to the type information of the keyword in the matched fixed sentence pattern, and matching the extracted keyword against the vocabulary whose type matches the type information of the extracted keyword.
  4. The method of any one of claims 1-3, wherein said performing error correction processing on the keyword in the text information according to the selected one or more candidate words comprises:
    for each selected candidate word that matches the keyword, scoring the candidate word according to the degree of match between the extracted keyword and the candidate word;
    if the score of the highest-scoring candidate word for the keyword is higher than or equal to a first confidence value, correcting the keyword with the highest-scoring candidate word;
    if the score of the highest-scoring candidate word for the keyword is higher than a second confidence value but lower than the first confidence value, conducting a further voice dialogue with the user to confirm whether the keyword should be corrected with the highest-scoring candidate word;
    if the score of the highest-scoring candidate word for the keyword is lower than or equal to the second confidence value, performing no correction.
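The three-way decision of claim 4 can be sketched as a small function. The default threshold values 0.8 and 0.5 and the returned labels are placeholders chosen for illustration; the claims name a first and a second confidence value but do not fix their values.

```python
def decide(best_score, first_confidence=0.8, second_confidence=0.5):
    """Map the best candidate's score to an action (threshold values are assumed)."""
    if best_score >= first_confidence:
        return "correct"   # replace the keyword with the best candidate
    if best_score > second_confidence:
        return "confirm"   # ask the user in a follow-up voice dialogue
    return "skip"          # leave the recognized keyword unchanged
```

Note that the boundaries follow the claim wording: a score equal to the first confidence value triggers correction, and a score equal to the second confidence value results in no correction.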
  5. The method of any one of claims 1-4, wherein said scoring the candidate word according to the degree of match between the extracted keyword and the candidate word comprises:
    dividing the scores, from high to low, into three ranges: high, middle, and low;
    if the pinyin of the keyword and of the candidate word are the same and only the tones differ, assigning a score within the high range;
    if the initial or the final in the pinyin of the keyword is the same as that of the candidate word, assigning a score within the middle range;
    if both the initial and the final in the pinyin of the keyword differ from those of the candidate word, assigning a score within the low range.
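A rough sketch of the tiering in claim 5 follows. It assumes that words are already given as lists of tone-numbered pinyin syllables (for example `["zhang1"]`), compares words syllable by syllable, treats an exact match including tone as high, and treats words of different length as low. These conventions, and the function names, are assumptions of the sketch; the claim does not spell them out, and the numeric score within each range is left open here.

```python
# Pinyin initials; two-letter initials come first so "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split e.g. 'zhang1' into (initial, final, tone) = ('zh', 'ang', '1')."""
    tone = syllable[-1] if syllable[-1].isdigit() else ""
    body = syllable[:-1] if tone else syllable
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # syllable with no initial, e.g. 'an4'


def tier(keyword, candidate):
    """Return 'high', 'mid', or 'low' for two lists of pinyin syllables."""
    if len(keyword) != len(candidate):
        return "low"  # assumption: different syllable counts never match well
    pairs = [(split_syllable(k), split_syllable(c)) for k, c in zip(keyword, candidate)]
    if all(k[:2] == c[:2] for k, c in pairs):
        return "high"  # same initials and finals; at most the tones differ
    if any((k[0] and k[0] == c[0]) or k[1] == c[1] for k, c in pairs):
        return "mid"   # some syllable shares an initial or a final
    return "low"       # no syllable shares an initial or a final
```

For instance, `tier(["zhang1"], ["zang1"])` falls in the middle tier under these assumptions, because the finals agree while the initials differ.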
  6. The method of any one of claims 1-5, wherein the method further comprises:
    if a plurality of keywords are extracted from the text information, multiplying together the scores of the highest-scoring candidate words of the respective keywords to obtain a score for the plurality of keywords;
    if the score for the plurality of keywords is higher than or equal to a third confidence value, correcting each keyword with its highest-scoring candidate word;
    if the score for the plurality of keywords is higher than a fourth confidence value but lower than the third confidence value, conducting a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate word;
    if the score for the plurality of keywords is lower than or equal to the fourth confidence value, performing no correction.
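The multi-keyword case of claim 6 multiplies the best scores of the individual keywords and compares the product against two further thresholds. In the sketch below the scores are assumed to lie between 0 and 1, and the default values 0.6 and 0.3 for the third and fourth confidence values are placeholders, not values from the application.

```python
from math import prod


def combined_decision(best_scores, third_confidence=0.6, fourth_confidence=0.3):
    """Combine per-keyword best scores by multiplication and pick an action."""
    combined = prod(best_scores)  # product of each keyword's highest candidate score
    if combined >= third_confidence:
        action = "correct"   # correct every keyword with its best candidate
    elif combined > fourth_confidence:
        action = "confirm"   # ask the user before correcting
    else:
        action = "skip"      # leave all keywords unchanged
    return combined, action
```

Because scores below 1 shrink under multiplication, the joint thresholds would normally be set lower than the single-keyword ones; that choice is a design consideration for this sketch rather than something the claim states.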
  7. The method of any one of claims 1-6, wherein the method further comprises:
    outputting the corresponding service of the smart hardware device according to the result of the correction processing.
  8. A speech recognition error correction apparatus in a smart hardware device, wherein the apparatus comprises:
    a speech recognition unit, adapted to convert a voice signal received by the smart hardware device into text information by means of speech recognition technology;
    a keyword extraction unit, adapted to extract a keyword from the text information;
    a matching unit, adapted to match the extracted keyword against a vocabulary related to the smart hardware service, and to select from the vocabulary one or more candidate words that match the keyword; and
    an error correction unit, adapted to perform error correction processing on the keyword in the text information according to the selected one or more candidate words.
  9. The apparatus of claim 8, wherein the apparatus further comprises: a configuration unit, adapted to preset one or more fixed sentence patterns associated with the service voice interaction of the smart hardware device, and to mark the position of a keyword in each fixed sentence pattern;
    the keyword extraction unit is adapted to match the text information against the one or more fixed sentence patterns, and to extract the keyword from the corresponding position in the text information according to the keyword position marked in the matched fixed sentence pattern.
  10. The apparatus of claim 8 or 9, wherein
    the configuration unit is further adapted to mark type information for the keyword in each fixed sentence pattern, and to determine type information of each vocabulary related to the smart hardware service;
    the matching unit is adapted to determine the type information of the extracted keyword according to the type information of the keyword in the matched fixed sentence pattern, and to match the extracted keyword against the vocabulary whose type matches the type information of the extracted keyword.
  11. The apparatus of any one of claims 8-10, wherein
    the error correction unit is adapted to: for each selected candidate word that matches the keyword, score the candidate word according to the degree of match between the extracted keyword and the candidate word; if the score of the highest-scoring candidate word for the keyword is higher than or equal to a first confidence value, correct the keyword with the highest-scoring candidate word; if the score of the highest-scoring candidate word for the keyword is higher than a second confidence value but lower than the first confidence value, conduct a further voice dialogue with the user to confirm whether the keyword should be corrected with the highest-scoring candidate word; and if the score of the highest-scoring candidate word for the keyword is lower than or equal to the second confidence value, perform no correction.
  12. The apparatus of any one of claims 8-11, wherein
    the error correction unit is adapted to divide the scores, from high to low, into three ranges: high, middle, and low;
    if the pinyin of the keyword and of the candidate word are the same and only the tones differ, to assign a score within the high range; if the initial or the final in the pinyin of the keyword is the same as that of the candidate word, to assign a score within the middle range; and if both the initial and the final in the pinyin of the keyword differ from those of the candidate word, to assign a score within the low range.
  13. The apparatus of any one of claims 8-12, wherein
    the error correction unit is further adapted to: when a plurality of keywords are extracted from the text information, multiply together the scores of the highest-scoring candidate words of the respective keywords to obtain a score for the plurality of keywords; if the score for the plurality of keywords is higher than or equal to a third confidence value, correct each keyword with its highest-scoring candidate word; if the score for the plurality of keywords is higher than a fourth confidence value but lower than the third confidence value, conduct a further voice dialogue with the user to confirm whether each keyword should be corrected with its highest-scoring candidate word; and if the score for the plurality of keywords is lower than or equal to the fourth confidence value, perform no correction.
  14. The apparatus of any one of claims 8-13, wherein the apparatus further comprises:
    a service unit, adapted to output the corresponding service of the smart hardware device according to the result of the correction processing.
  15. A computer program comprising computer-readable code which, when run on a computing device, causes the computing device to perform the speech recognition error correction method in a smart hardware device according to any one of claims 1-7.
  16. A computer-readable medium storing the computer program of claim 15.
PCT/CN2017/116165 2016-12-29 2017-12-14 Method and apparatus for error connection of voice recognition in smart hardware device WO2018121275A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611246867.0A CN106710592B (en) 2016-12-29 2016-12-29 A kind of speech recognition error correction method and device in intelligent hardware device
CN201611246867.0 2016-12-29

Publications (1)

Publication Number Publication Date
WO2018121275A1 true WO2018121275A1 (en) 2018-07-05

Family

ID=58906036

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/116165 WO2018121275A1 (en) 2016-12-29 2017-12-14 Method and apparatus for error connection of voice recognition in smart hardware device

Country Status (2)

Country Link
CN (1) CN106710592B (en)
WO (1) WO2018121275A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977412A (en) * 2019-03-29 2019-07-05 北京林业大学 A kind of field value error correction method, device, readable medium and storage control
CN110232129A (en) * 2019-06-11 2019-09-13 北京百度网讯科技有限公司 Scene error correction method, device, equipment and storage medium
CN110516237A (en) * 2019-08-15 2019-11-29 重庆长安汽车股份有限公司 Short text phrase extracting method, system and storage medium
CN113808577A (en) * 2021-09-18 2021-12-17 平安银行股份有限公司 Intelligent extraction method, device, electronic device and storage medium for speech abstract

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106710592B (en) * 2016-12-29 2021-05-18 北京奇虎科技有限公司 A kind of speech recognition error correction method and device in intelligent hardware device
CN107239547B (en) * 2017-06-05 2019-05-28 北京儒博科技有限公司 Voice error correction method, terminal and storage medium for voice song request
TWI660340B (en) * 2017-11-03 2019-05-21 財團法人資訊工業策進會 Voice controlling method and system
CN108021554A (en) * 2017-11-14 2018-05-11 无锡小天鹅股份有限公司 Audio recognition method, device and washing machine
CN107977356B (en) * 2017-11-21 2019-10-25 新疆科大讯飞信息科技有限责任公司 Method and device for correcting recognized text
CN109947264B (en) * 2017-12-21 2023-03-14 北京搜狗科技发展有限公司 Information display method and device and electronic equipment
CN110120986B (en) * 2018-02-05 2021-10-19 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring electronic equipment information
CN108647346B (en) * 2018-05-15 2021-10-29 苏州东巍网络科技有限公司 Old people voice interaction method and system for wearable electronic equipment
CN108877792B (en) * 2018-05-30 2023-10-24 北京百度网讯科技有限公司 Method, apparatus, electronic device and computer readable storage medium for processing voice conversations
CN109522550B (en) * 2018-11-08 2023-04-07 和美(深圳)信息技术股份有限公司 Text information error correction method and device, computer equipment and storage medium
CN109951354B (en) * 2019-03-12 2021-08-10 北京奇虎科技有限公司 Terminal equipment identification method, system and storage medium
CN110211592A (en) * 2019-05-17 2019-09-06 北京华控创为南京信息技术有限公司 Intelligent sound data processing equipment and method
CN110232921A (en) * 2019-06-21 2019-09-13 深圳市酷开网络科技有限公司 Voice operating method, apparatus, smart television and system based on service for life
CN112668312A (en) * 2019-09-30 2021-04-16 北大方正集团有限公司 Wrongly written character correction method and device, electronic equipment and storage medium
CN110909127A (en) * 2019-11-07 2020-03-24 中铁大桥科学研究院有限公司 A method and system for inputting and querying bridge inspection information
CN110689891A (en) * 2019-11-20 2020-01-14 广东奥园奥买家电子商务有限公司 Voice interaction method and device based on public display device
CN111160013B (en) * 2019-12-30 2023-11-24 北京百度网讯科技有限公司 Text error correction method and device
CN113971952B (en) * 2020-07-24 2025-07-11 阿里巴巴集团控股有限公司 A voice recognition verification method, computing device and storage medium
CN113763944B (en) * 2020-09-29 2024-06-04 浙江思考者科技有限公司 AI video cloud interaction system based on pseudo person logic knowledge base
CN114678027B (en) * 2020-12-24 2024-12-03 深圳Tcl新技术有限公司 Speech recognition result error correction method, device, terminal equipment and storage medium
CN112580324B (en) * 2020-12-24 2023-07-25 北京百度网讯科技有限公司 Text error correction method, device, electronic equipment and storage medium
CN113420547A (en) * 2021-08-25 2021-09-21 深圳市豪华科技有限公司 Wrongly written word error correction method of instant messaging software and related equipment
CN115438995B (en) * 2022-09-21 2023-10-10 青岛酷特智能股份有限公司 Business processing method and equipment for clothing customization enterprise based on knowledge graph

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7058575B2 (en) * 2001-06-27 2006-06-06 Intel Corporation Integrating keyword spotting with graph decoder to improve the robustness of speech recognition
CN102682763A (en) * 2011-03-10 2012-09-19 北京三星通信技术研究有限公司 Method, device and terminal for correcting named entity vocabularies in voice input text
CN103186523A (en) * 2011-12-30 2013-07-03 富泰华工业(深圳)有限公司 Electronic device and natural language analyzing method thereof
CN104269168A (en) * 2014-09-24 2015-01-07 上海伯释信息科技有限公司 Voice recognition method with higher work efficiency
CN106710592A (en) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050049868A1 (en) * 2003-08-25 2005-03-03 Bellsouth Intellectual Property Corporation Speech recognition error identification method and system
CN1979638A (en) * 2005-12-02 2007-06-13 中国科学院自动化研究所 Method for correcting error of voice identification result
CN101154228A (en) * 2006-09-27 2008-04-02 西门子公司 A segmented pattern matching method and device thereof
JP4845118B2 (en) * 2006-11-20 2011-12-28 富士通株式会社 Speech recognition apparatus, speech recognition method, and speech recognition program
CN101655837B (en) * 2009-09-08 2010-10-13 北京邮电大学 Method for detecting and correcting error on text after voice recognition
CN103514882B (en) * 2012-06-30 2017-11-10 北京百度网讯科技有限公司 A kind of audio recognition method and system
JP2014137430A (en) * 2013-01-16 2014-07-28 Sharp Corp Electronic apparatus and cleaner
CN105374356B (en) * 2014-08-29 2019-07-30 株式会社理光 Audio recognition method, speech assessment method, speech recognition system and speech assessment system
CN105632499B (en) * 2014-10-31 2019-12-10 株式会社东芝 Method and apparatus for optimizing speech recognition results
CN105488027B (en) * 2015-11-30 2019-07-12 百度在线网络技术(北京)有限公司 The method for pushing and device of keyword
CN106098060B (en) * 2016-05-19 2020-01-31 北京搜狗科技发展有限公司 Method and device for error correction processing of voice

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977412A (en) * 2019-03-29 2019-07-05 北京林业大学 A kind of field value error correction method, device, readable medium and storage control
CN109977412B (en) * 2019-03-29 2022-12-27 北京林业大学 Method and device for correcting field value of voice recognition text and storage controller
CN110232129A (en) * 2019-06-11 2019-09-13 北京百度网讯科技有限公司 Scene error correction method, device, equipment and storage medium
CN110232129B (en) * 2019-06-11 2020-09-29 北京百度网讯科技有限公司 Scene error correction method, apparatus, device and storage medium
CN110516237A (en) * 2019-08-15 2019-11-29 重庆长安汽车股份有限公司 Short text phrase extracting method, system and storage medium
CN110516237B (en) * 2019-08-15 2022-12-09 重庆长安汽车股份有限公司 Short text phrase extraction method, system and storage medium
CN113808577A (en) * 2021-09-18 2021-12-17 平安银行股份有限公司 Intelligent extraction method, device, electronic device and storage medium for speech abstract

Also Published As

Publication number Publication date
CN106710592B (en) 2021-05-18
CN106710592A (en) 2017-05-24

Similar Documents

Publication Publication Date Title
WO2018121275A1 (en) Method and apparatus for error connection of voice recognition in smart hardware device
US10803869B2 (en) Voice enablement and disablement of speech processing functionality
US11564090B1 (en) Audio verification
US12136417B2 (en) Domain and intent name feature identification and processing
US10339166B1 (en) Systems and methods for providing natural responses to commands
US11237793B1 (en) Latency reduction for content playback
US8972260B2 (en) Speech recognition using multiple language models
CN108228132B (en) Voice enabling device and method executed therein
US10628483B1 (en) Entity resolution with ranking
CN111710333B (en) Method and system for generating speech transcription
KR101670150B1 (en) Systems and methods for name pronunciation
TWI711967B (en) Method, device and equipment for determining broadcast voice
CN110557589A (en) System and method for integrating recorded content
JP2011033874A (en) Device for multilingual voice recognition, multilingual voice recognition dictionary creation method
CN106710585B (en) Method and system for broadcasting polyphonic characters during voice interaction
CN104462071A (en) SPEECH TRANSLATION APPARATUS and SPEECH TRANSLATION METHOD
CN110750996B (en) Method and device for generating multimedia information and readable storage medium
CN114783424A (en) Text corpus screening method, device, equipment and storage medium
CN113076397A (en) Intention recognition method and device, electronic equipment and storage medium
Chen et al. A proof-of-concept study for automatic speech recognition to transcribe AAC speakers’ speech from high-technology AAC systems
CN113744718A (en) Voice text output method and device, storage medium and electronic device
CN114758665B (en) Audio data enhancement method and device, electronic equipment and storage medium
US11632345B1 (en) Message management for communal account
US11563708B1 (en) Message grouping
CN110895938A (en) Voice correction system and voice correction method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17886786; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17886786; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1