CN110310631A - Speech recognition method, device, server and storage medium - Google Patents

Speech recognition method, device, server and storage medium

Info

Publication number
CN110310631A
Authority
CN
China
Prior art keywords
map
information
search
current user
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910578399.4A
Other languages
Chinese (zh)
Inventor
李扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910578399.4A
Publication of CN110310631A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Navigation (AREA)

Abstract

Embodiments of the present invention disclose a speech recognition method, device, server and storage medium. The method includes: performing a map information search on the current user's voice to determine at least one matching piece of candidate information; and disambiguating the at least one piece of candidate information according to the current user's map domain features, so as to determine the map information recognition result for the current user's voice. By disambiguating the candidate information obtained from a search dedicated to map domain information, the embodiments not only remove possible interference from general-domain knowledge with the map search, but also avoid misjudgments caused by ambiguity, accents and the like. The resulting map information recognition result better matches the user's habits and needs, which greatly improves the speech recognition accuracy of map voice search.

Description

Speech recognition method, device, server and storage medium

Technical field

The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method, device, server and storage medium.

Background

Map voice search is currently an important map function. Using voice for input and interaction, instead of manual input, to search and query map information greatly simplifies user input and is especially suited to driving scenarios.

At present, a mature third-party voice input method interface can be called to provide speech recognition support for map voice search. The speech recognition model used is typically trained on large-scale Internet data and is therefore general-purpose. Alternatively, a model dedicated to map speech recognition can be retrained using a map corpus that carries map domain experience.

However, a general-purpose speech recognition model lacks map domain experience and is not suited to map voice search. When such a model is applied to a map scenario, most of the names of map information items are low-frequency, rare or even absent from the model's vocabulary, so using the general-purpose model directly yields very poor accuracy. Retraining a dedicated speech recognition model on a map corpus is costly, and it still struggles with map information recognition errors caused by noise and regional differences, so the accuracy of map voice search remains low.

Summary of the invention

Embodiments of the present invention provide a speech recognition method, device, server and storage medium that can improve the speech recognition accuracy of map voice search.

In a first aspect, an embodiment of the present invention provides a speech recognition method, including:

performing a map information search on the current user's voice to determine at least one matching piece of candidate information;

disambiguating the at least one piece of candidate information according to the current user's map domain features, so as to determine the map information recognition result for the current user's voice.

In a second aspect, an embodiment of the present invention provides a speech recognition device, including:

a candidate information determination module, configured to perform a map information search on the current user's voice and determine at least one matching piece of candidate information;

a speech recognition disambiguation module, configured to disambiguate the at least one piece of candidate information according to the current user's map domain features, so as to determine the map information recognition result for the current user's voice.

In a third aspect, an embodiment of the present invention provides a server, including:

one or more processors;

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method of any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the speech recognition method of any embodiment of the present invention.

The embodiments of the present invention run a dedicated map information search on the user's voice to determine multiple pieces of candidate information that match it, then disambiguate those candidates according to the current user's map domain features, and select the candidate that best matches the user as the map information recognition result. By disambiguating the candidate information obtained from a search dedicated to map domain information, the embodiments not only remove possible interference from general-domain knowledge with the map search, but also avoid misjudgments caused by ambiguity, accents and the like. The resulting map information recognition result better matches the user's habits and needs, which greatly improves the speech recognition accuracy of map voice search.

Brief description of the drawings

FIG. 1 is a flowchart of a speech recognition method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a speech recognition method provided by Embodiment 2 of the present invention;

FIG. 3 is an overall architecture diagram of speech recognition provided by Embodiment 2 of the present invention;

FIG. 4 is a schematic structural diagram of a speech recognition device provided by Embodiment 3 of the present invention;

FIG. 5 is a schematic structural diagram of a server provided by Embodiment 4 of the present invention.

Detailed description

The embodiments of the present invention are described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the embodiments of the present invention and do not limit the invention. For ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.

It should also be noted that, for ease of description, the drawings show only the parts relevant to the present application rather than all of the content. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential, many of them can be performed in parallel, concurrently or simultaneously, and their order can be rearranged. A process may terminate when its operations are complete, but it may also have additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram and so on.

Embodiment 1

FIG. 1 is a flowchart of a speech recognition method provided by Embodiment 1 of the present invention. This embodiment applies to searching for map information based on the user's voice. The method can be executed by a speech recognition device, which can be implemented in software and/or hardware and is preferably deployed on a server. The method includes the following steps:

S110. Perform a map information search on the current user's voice to determine at least one matching piece of candidate information.

In specific embodiments of the present invention, map products provide a map information search function, in particular a voice search function that makes driving navigation easier. The current user's voice is the voice search request that the user inputs to the map product to query map information. It may include information about at least one place to be searched, restrictions on that place, and so on. In response to a voice search request, the map product searches for the relevant POI (Point of Interest) data and returns it to the user. Each POI record in the map may include a name, category, latitude and longitude, importance and other information. Map search generally includes exact search and fuzzy search. In an exact search, the voice search request targets one specific POI data point. A fuzzy search may be a broad search, based on a sug engine, that uses partial information from the voice search request, similarity, or corrected speech.

In this embodiment, a speech recognition model is used to recognize the user's voice; such a model typically includes an acoustic sub-model and a language sub-model. To improve the accuracy of map information search, the model is trained on both a general corpus and a map corpus. To keep training costs down, this embodiment starts from a conventional general-purpose speech recognition model trained on a general corpus. Because a large share of the POI data in a map consists of low-frequency, rare and unfamiliar words, the language sub-model of the general-purpose model can be given a second round of training directly on the map corpus, which strengthens the model's ability to recognize map vocabulary.

This embodiment differs from conventional map search, in which the user's voice is recognized as text and the search is run on that text. Here, speech recognition does not output text directly. Instead it outputs a phoneme representation composed of multiple phonemes as a fuzzy sound, and the map information search is run on the fuzzy sound. A fuzzy sound is pinyin without strict requirements: it resembles pinyin in form but may deviate from pinyin spelling rules, while still preserving the sound-related features of the utterance as a whole. Searching on the fuzzy sound yields at least one piece of candidate information whose pronunciation matches it. The candidate information may be candidate recognition text pronounced the same as or similarly to the user's utterance, or candidate recognition text that is consistent with the user's accent or obtained by accent-related expansion.

Specifically, a speech recognition model usually treats a recognition pass as normal only when the corresponding text has been recognized, and only then can the intermediate information of any stage be extracted from it. This embodiment therefore retrains the language sub-model of the general-purpose speech recognition model to raise the recognition hit rate for map information and to reduce the abnormal cases in which map information cannot be recognized. Accordingly, once the speech recognition model trained on the map corpus has recognized the user's voice and produced recognized text, that is, after one normal recognition pass, the phoneme representation produced by the acoustic sub-model can be obtained and used as the fuzzy sound, with little or no regard for the text result. The search is then run on the fuzzy sound. During the search, the phoneme representation can be corrected and expanded to produce several different phoneme representations, called variant phoneme representations, and the search returns candidate information matching the original phoneme representation and each variant.

For example, suppose the fuzzy sound recognized from the user's voice is "cangshangcun". A map information search on this fuzzy sound may match candidate information including "通州区苍上村" (Cangshang Village, Tongzhou District), "通州区仓上村" (a different place whose name is also romanized as Cangshang Village, Tongzhou District) and "通州区仓场村" (Cangchang Village, Tongzhou District).
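The "cangshangcun" example can be sketched as a lookup over toneless pinyin keys. This is only an illustration under stated assumptions: the patent does not say how the fuzzy-sound index is built or matched, so the `POI_PINYIN` table, the edit-distance matching and the `max_distance` parameter are all hypothetical.

```python
# Hypothetical POI table: name -> hand-written toneless pinyin key.
# Only the village part of each name is romanized, to match the example.
POI_PINYIN = {
    "通州区苍上村": "cangshangcun",
    "通州区仓上村": "cangshangcun",
    "通州区仓场村": "cangchangcun",
    "通州区北苑": "beiyuan",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search_by_fuzzy_sound(poi_pinyin: dict, fuzzy_sound: str, max_distance: int = 1) -> list:
    """Return POI names whose pinyin key is within max_distance edits of the fuzzy sound."""
    return [
        name
        for name, key in poi_pinyin.items()
        if edit_distance(fuzzy_sound, key) <= max_distance
    ]


if __name__ == "__main__":
    for name in search_by_fuzzy_sound(POI_PINYIN, "cangshangcun"):
        print(name)
```

With a tolerance of one edit, the two exact matches and the near match "cangchangcun" are returned, while the unrelated entry is not. A production system would more likely use a phonetic index than a linear scan.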

S120. Disambiguate the at least one piece of candidate information according to the current user's map domain features, so as to determine the map information recognition result for the current user's voice.

In specific embodiments of the present invention, speech recognition can search only on the basis of the user's pronunciation. A voice search may therefore return results that sound the same as the user's utterance but are unrelated in content, as well as wrong results caused by deviations in the user's accent. After obtaining at least one piece of candidate information that matches the user's voice, this embodiment therefore disambiguates the candidates to filter out interfering information and make the final map information recognition result more accurate.

In this embodiment, the current user's map domain features can be used to disambiguate the candidate information. Specifically, the current user's map domain features include at least one of: current map search scene features, current user behavior features, and map information search quality features of the candidate text.

The current map search scene features may include features of the scene the user is in at the time of the search, such as the user's current location. Candidates can then be screened by the spatial relationship between the user's current location and the location each candidate represents, for example spatial containment, spatial adjacency or spatial remoteness. For example, the city and administrative region the user is currently in can be determined from the user's current GPS information, and candidates outside that city or region can be filtered out. The current map search scene features may also include a spatial location description in the user's voice, such as a regional restriction on the POI, and candidates that do not satisfy the restriction can be filtered out. For example, if the user's voice includes the restriction "POI point B in city A" and the candidates include "POI point B in city S", that candidate is filtered out.
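A minimal sketch of scene-based filtering follows. The `Candidate` type, its `city` field and the rule that a spoken restriction takes precedence over the GPS-derived city are illustrative assumptions; the patent does not prescribe a data model or a precedence rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record: a POI name and the city it belongs to."""
    name: str
    city: str


def filter_by_scene(
    candidates: List[Candidate],
    user_city: Optional[str] = None,
    spoken_city: Optional[str] = None,
) -> List[Candidate]:
    """Keep candidates located in the required city.

    Assumption: a city named in the utterance overrides the city derived
    from the user's current GPS position.
    """
    required = spoken_city or user_city
    if required is None:
        return list(candidates)
    return [c for c in candidates if c.city == required]


if __name__ == "__main__":
    found = [Candidate("POI B", "A"), Candidate("POI B", "S")]
    # The user said "POI point B in city A", so the city-S candidate is dropped.
    print(filter_by_scene(found, user_city="S", spoken_city="A"))
```

Exact city equality stands in for the richer spatial relations the text mentions (containment, adjacency, remoteness), which would need real geometry.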

Second, the current user behavior features may include the user's historical map search behavior. Because a user is likely to search for the same POI repeatedly, the repeat-search probability of each candidate can be determined from the user's search history, and candidates with a low repeat-search probability can be filtered out. The current user behavior features may also include the user's accent features. Depending on region, a user may not distinguish front and back nasal finals, such as an and ang, en and eng, in and ing, ian and iang, and uan and uang; may not distinguish flat-tongue and retroflex initials, such as z and zh, c and ch, and s and sh; or, in certain regions, may not distinguish f from h or r from l. The map voice search in this embodiment can therefore use the user's historical search behavior to determine pronunciation habits, such as which sound the user's pronunciation is biased toward, and then retain and expand the phonemes affected by that bias, so that recognition does not fail by relying only on the biased pronunciation. For example, suppose candidates are screened by scoring. If the current user's pronunciation of f and h is determined to be biased toward f, then whenever a candidate containing the f phoneme is detected, the f phoneme is expanded to the h phoneme, and the candidate with the h phoneme is given the same score as the candidate with the f phoneme, so that as many potentially correct candidates as possible are included.
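The f/h scoring example might look like the sketch below. It assumes, purely for illustration, that each candidate pronunciation is a tuple of pinyin syllables with a numeric score; the function name and the one-syllable-at-a-time expansion are not taken from the patent.

```python
from typing import Dict, Tuple

Syllables = Tuple[str, ...]


def expand_for_accent(
    scored: Dict[Syllables, float],
    biased: str = "f",
    alternative: str = "h",
) -> Dict[Syllables, float]:
    """Add a variant for every syllable that starts with the biased initial.

    Each variant replaces one biased initial with the alternative and inherits
    the score of the pronunciation it was derived from. Only one syllable is
    changed per variant; combinations of several changes are not generated.
    """
    expanded = dict(scored)
    for syllables, score in scored.items():
        for i, syl in enumerate(syllables):
            if syl.startswith(biased):
                swapped = alternative + syl[len(biased):]
                variant = syllables[:i] + (swapped,) + syllables[i + 1:]
                # If the variant already exists, keep the higher of the two scores.
                expanded[variant] = max(expanded.get(variant, score), score)
    return expanded


if __name__ == "__main__":
    # A user whose f/h pronunciation is biased toward f said something heard as "fu jian".
    print(expand_for_accent({("fu", "jian"): 0.8}))
```

Giving the variant the same score keeps the h reading in contention, so later disambiguation steps decide between the two rather than the accent.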

In addition, a map corpus or map information can be established in advance as a map information base that stores information about POIs that actually exist. Accordingly, the map information search quality features of the candidate text may include the similarity between a candidate and the map information in the preset map information base. Computing the text similarity between a candidate and real POIs helps determine whether the candidate is a real POI, which prevents words with the same or similar pronunciation in the general corpus from interfering with the map search. If a candidate matches map information in the preset base, or its similarity exceeds a certain threshold, the candidate is a good map search result; candidates with poor map search quality are filtered out. The map information search quality features may also include the distribution of historical search demand, across the broad user base, for the location a candidate represents. For example, demand for map searches is low in sparsely populated areas and high in busy areas. The search demand for each POI can therefore be analyzed in real time or periodically from users' historical search behavior, and candidates with low search demand can be filtered out.
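The two quality checks can be sketched with Python's standard `difflib`. The similarity measure, both thresholds, and the POI names and demand counts below are placeholders chosen for illustration; the patent specifies none of them.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def best_similarity(candidate: str, poi_names: List[str]) -> float:
    """Highest difflib similarity ratio between the candidate and any known POI name."""
    return max(
        (SequenceMatcher(None, candidate, name).ratio() for name in poi_names),
        default=0.0,
    )


def filter_by_quality(
    candidates: List[str],
    poi_names: List[str],
    demand: Dict[str, int],
    min_similarity: float = 0.8,  # hypothetical threshold
    min_demand: int = 1,          # hypothetical threshold
) -> List[str]:
    """Drop candidates that do not resemble a real POI or that are rarely searched."""
    kept = []
    for cand in candidates:
        if best_similarity(cand, poi_names) < min_similarity:
            continue  # probably general-corpus text rather than a real POI
        if demand.get(cand, 0) < min_demand:
            continue  # real POI, but search demand is too low
        kept.append(cand)
    return kept


if __name__ == "__main__":
    base = ["Cangshang Village", "Remote Pass"]
    counts = {"Cangshang Village": 120, "Remote Pass": 0}
    print(filter_by_quality(["Cangshang Village", "Old Mill Inn", "Remote Pass"], base, counts))
```

Here "Old Mill Inn" fails the similarity check and "Remote Pass" fails the demand check, leaving one candidate.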

Note that this embodiment lists three kinds of map domain features, each with at least two specific cases. The map domain features are not limited to these examples: any feature that can reasonably screen candidates to remove ambiguity can be applied in this embodiment, and the cases within each kind of feature are likewise not limited to the examples above.

In this embodiment, the degree of association between each candidate and the current user can be determined from one or more of the map domain features. Based on that degree of association, the candidates that are ambiguous with respect to the current user's voice are identified and filtered out, yielding the map information recognition result for the current user's voice.
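One way to combine several features into a single degree of association is a weighted sum followed by a threshold, sketched below. The weighted-sum form, the feature names, the weights and the threshold are all assumptions made for illustration; the patent leaves the combination method open.

```python
from typing import Dict, List


def association_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; a missing feature counts as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def disambiguate(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Drop candidates scoring below the threshold; return the rest, best first."""
    scored = [(association_score(feats, weights), name) for name, feats in candidates.items()]
    kept = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [name for _, name in kept]


if __name__ == "__main__":
    weights = {"scene": 0.4, "behavior": 0.3, "quality": 0.3}  # hypothetical weights
    candidates = {
        "A": {"scene": 1.0, "behavior": 0.5, "quality": 1.0},
        "B": {"scene": 0.0, "behavior": 0.0, "quality": 1.0},
    }
    print(disambiguate(candidates, weights, threshold=0.5))
```

Candidate "B" scores only on quality and falls below the threshold, so it is treated as ambiguous and removed.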

For example, suppose the candidates are disambiguated by the map information search quality of the candidate text, and the candidates include "王家卫" and "王家味", which are pronounced alike. Checking map information search quality shows that "王家卫" (Wong Kar-wai) is the name of a film director found in the general corpus, whereas "王家味" is the name of a restaurant and an actual map POI. The candidate "王家卫" is therefore ambiguous information and is filtered out so that it does not interfere with the recognition result.

The technical solution of this embodiment runs a dedicated map information search on the user's voice to determine multiple pieces of candidate information that match it, then disambiguates those candidates according to the current user's map domain features, and selects the candidate that best matches the user as the map information recognition result. By disambiguating the candidate information obtained from a search dedicated to map domain information, the embodiment not only removes possible interference from general-domain knowledge with the map search, but also avoids misjudgments caused by ambiguity, accents and the like. The resulting map information recognition result better matches the user's habits and needs, which greatly improves the speech recognition accuracy of map voice search.

Embodiment 2

Building on Embodiment 1, this embodiment provides a preferred implementation of the speech recognition method that performs a fuzzy search for map information based on the phoneme representation of the user's voice. FIG. 2 is a flowchart of a speech recognition method provided by Embodiment 2 of the present invention. As shown in FIG. 2, the method includes the following steps:

S210. Perform acoustic feature recognition on the current user's voice to determine the phoneme representation of the current user's voice.

In specific embodiments of the present invention, a speech recognition model is used to recognize the user's voice; such a model typically includes an acoustic sub-model and a language sub-model. To improve the accuracy of map information search, the model is trained on both a general corpus and a map corpus. That is, starting from a conventional general-purpose speech recognition model trained on a general corpus, and because a large share of the POI data in a map consists of low-frequency, rare and unfamiliar words, the language sub-model of the general-purpose model can be given a second round of training directly on the map corpus, which strengthens the model's ability to recognize map vocabulary.

In this embodiment, the speech recognition model trained on the map corpus recognizes the user's voice and produces recognized text. Once the pass is confirmed to be a normal recognition, the phoneme representation that the acoustic sub-model obtained through acoustic feature recognition can be retrieved and used as the fuzzy sound. A phoneme is the smallest unit of speech, divided according to the natural attributes of speech. Acoustically, a phoneme is the smallest speech unit distinguished by sound quality. Physiologically, one articulatory action forms one phoneme; for example, ma contains the two articulatory actions m and a, and therefore two phonemes. Sounds produced by the same articulatory action are the same phoneme, and sounds produced by different articulatory actions are different phonemes. In ma-mi, for example, the two m sounds are the same phoneme, while a and i are different phonemes. In phonetics, the basic structural unit of speech composed of one or several phonemes is called a syllable. In Chinese, the pronunciation of one character is usually one syllable, and a basic Mandarin syllable is composed of one or more phonemes combined according to certain rules.
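The ma/mi illustration can be approximated in code by splitting a toneless pinyin syllable into its initial and final. This is a simplification under stated assumptions: an initial/final split is not a full phoneme segmentation (a final such as "ang" contains more than one phoneme), and the `INITIALS` list and function name are illustrative.

```python
from typing import Tuple

# Pinyin initials, with two-letter initials first so "zh" is tried before "z".
# "y" and "w" are treated as initials here for convenience.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final).

    A syllable with no initial, such as "an", returns an empty initial.
    """
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


if __name__ == "__main__":
    ma, mi = split_syllable("ma"), split_syllable("mi")
    print(ma, mi)
    print("same initial:", ma[0] == mi[0], "| same final:", ma[1] == mi[1])
```

For ma and mi the split reports a shared initial m and differing finals a and i, matching the description of same and different phonemes.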

In this embodiment, the phoneme representation obtained by acoustic feature recognition through the acoustic sub-model can be used as the fuzzy sound. A fuzzy sound is a loosely constrained pinyin: it resembles pinyin in form but may contain deviations that do not conform to pinyin spelling rules, while still preserving the sound-related features of the utterance as a whole. For example, assuming the user says "苍上村" (Cangshangcun), recognition of the user's actual speech may yield phoneme representations such as "cangshangcun", "canshangcun", "changshangcun" or "cagshangcun".

S220. Search the map information according to the phoneme representation, and determine at least one piece of candidate information whose pronunciation matches the phoneme representation.

In a specific embodiment of the present invention, the phoneme representation covers as many potentially correct pronunciations as possible. Searching the map information on the basis of the phoneme representation can therefore yield candidate recognized texts whose pronunciation is the same as or close to the user's, as well as candidate recognized texts that are consistent with the user's accent or obtained by associated expansion.

Optionally, error correction and phoneme expansion are performed on the phoneme representation to determine at least one variant phoneme representation; at least one piece of candidate information whose pronunciation matches the phoneme representation and the at least one variant phoneme representation is then obtained.

In this embodiment, the phoneme representation recognized by the acoustic sub-model can be taken as the basic phoneme representation, which is then corrected and expanded to obtain at least one variant phoneme representation that differs from it. Error correction of the basic phoneme representation means correcting phonemes in it that do not conform to pinyin rules. For example, in the above embodiment, the phoneme representation "cagshangcun" can be corrected to the variant phoneme representation "cangshangcun". Expansion of the basic phoneme representation means performing associated expansion on phonemes that may carry a pronunciation deviation: each possible deviation of such a phoneme is mapped to a different variant phoneme representation, so that the user's accent does not cause an error at the root of speech recognition. Obtaining variant phoneme representations makes it possible to cover as many potentially correct pronunciations as possible and thereby improves the accuracy of map voice search. The map search is then performed separately on the phoneme representation and on each variant phoneme representation, and at least one piece of candidate information whose pronunciation matches the phoneme representation and the at least one variant phoneme representation is obtained.

For example, a phoneme fuzzy matching table can be set in advance, with matching rules defined in it such as: z=zh, c=ch, s=sh, an=ang, en=eng, in=ing, ian=iang, uan=uang, iong=ing, f=h, r=l and l=n. When the user says "胡建", the basic phoneme representation is "hujian", and phoneme expansion yields at least one variant phoneme representation, "fujian". Expanding the phoneme representation addresses speech recognition failures and recognition errors caused by the user's slurred or inaccurate pronunciation, and further improves the accuracy of map information search in this embodiment.

Map search generally includes exact search and fuzzy search. In exact search, the voice search request submitted by the user targets one specific POI data point; fuzzy search may be a broad search performed by a sug engine on the basis of partial information in the voice search request, similarity, and the like. It is worth noting that the error correction and expansion of the phoneme representation may be a separate process independent of the map information search, or a preprocessing step integrated into the map information search function.

S230. Determine, according to the map domain features, the degree of association between the at least one piece of candidate information and the current user.

In a specific embodiment of the present invention, the map domain features of the current user include at least one of: a current map search scene feature, a current user behavior feature, and a map information search quality feature of the candidate text.

Optionally, the current map search scene feature is determined as follows: the current map search scene feature is determined according to the spatial position relationship between the current user's current location and the location represented by the candidate information; and/or the spatial position description, in the current user's speech, of the location represented by the candidate information is taken as the current map search scene feature.

In this embodiment, candidate information in a map information search generally denotes a specific map POI, so the candidate information indirectly denotes the specific location or location range of that POI. Since a user searching for map information is usually on the way to, or planning to go to, some destination, POIs matching the phoneme representation can be searched for radiating outward from the user's current location, for example by preferentially recalling POIs in the user's own city as candidate information. This embodiment can therefore determine the current map search scene feature according to the spatial position relationship between the current user's current location and the location represented by the candidate information (for example spatial affiliation, spatial adjacency or spatial distance) and use it to screen the candidate information. For example, if the user's current location belongs to city A, the location represented by candidate information 1 belongs to city A, and the location represented by candidate information 2 belongs to city B, then the spatial position relationship between the current user's location and candidate information 1 is determined to be spatially adjacent, and that between the current user's location and candidate information 2 is determined to be spatially distant.

In addition, in a map information search the user's speech may contain a spatial position description of the POI to be searched, which expresses the relationship between that POI and at least one other location or location range. This embodiment can therefore take the spatial position description, in the current user's speech, of the location represented by the candidate information as the current map search scene feature and use it to screen the candidate information. For example, if the user says "restaurant S in city A", the constraint that restaurant S belongs to city A is taken as the current map search scene feature.
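The two scene features described above can be illustrated with a small sketch. It assumes, purely for illustration, that each candidate carries the city it belongs to and that the spoken spatial description has already been reduced to a city name; `Candidate`, `scene_features` and the returned keys are hypothetical names, and the "adjacent"/"distant" labels follow the city A / city B example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    city: str  # city the candidate POI belongs to (hypothetical field)


def scene_features(user_city: str,
                   candidate: Candidate,
                   spoken_city: Optional[str] = None) -> dict:
    """Derive the current map search scene features for one candidate."""
    # Feature 1: spatial relationship between user location and candidate.
    relation = "adjacent" if candidate.city == user_city else "distant"
    # Feature 2: does the candidate satisfy the spoken spatial description?
    # None means the user's speech contained no such description.
    matches = None if spoken_city is None else candidate.city == spoken_city
    return {"relation": relation, "matches_spoken_description": matches}


c1 = Candidate("restaurant S", "A")
c2 = Candidate("restaurant S", "B")
print(scene_features("A", c1))
print(scene_features("A", c2, spoken_city="A"))
```

A production system would likely work with coordinates and administrative hierarchies rather than a single city string; the sketch only shows how the two features are kept separate.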

Optionally, the current user behavior feature is determined as follows: the current user's historical search behavior for the location represented by the candidate information, and the current user's pronunciation habits, are determined; the current user behavior feature is then determined according to the historical search behavior and/or the current user's pronunciation habits.

In this embodiment, for the user currently performing a map information search, the user's historical search behavior can be acquired, the part of it concerning the location represented by the candidate information can be identified, and that part can be taken as the current user behavior feature. For example, the times at which and the number of times the current user has searched for the location represented by the candidate information can be obtained. In addition, pronunciation habits such as the user's pronunciation bias can be determined from the user's historical search behavior and taken as the current user behavior feature. For example, it may be determined that, between f and h, the current user tends to pronounce h.

Optionally, the map information search quality feature of the candidate text is determined as follows: the similarity between the candidate information and the map information in a preset map information library, and the distribution of map search users' historical search demand for the location represented by the candidate information, are determined; the map information search quality feature of the candidate text is then determined according to the similarity and/or the historical search demand distribution.

In this embodiment, the map corpus or other map information can be designated in advance as the map information library, which stores information about POIs that actually exist. If the candidate information matches map information in the preset map information library, the candidate information can be determined to be an actually existing POI rather than a similar-sounding interfering word from the general corpus. This embodiment can therefore take the similarity between the candidate information and the map information in the preset map information library as the map information search quality feature of the candidate text. For example, a similarity threshold can be set in advance; if the similarity between the candidate information and the map information in the preset library meets the threshold, the map information search quality of the candidate information can be determined to be high.
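The threshold check can be sketched as below. The patent does not name a similarity measure, so this example substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the function names, the sample library entries and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def best_similarity(candidate: str, poi_library: list) -> float:
    """Highest similarity between the candidate and any library entry."""
    return max(
        (SequenceMatcher(None, candidate, poi).ratio() for poi in poi_library),
        default=0.0,  # an empty library matches nothing
    )


def is_high_quality(candidate: str, poi_library: list,
                    threshold: float = 0.8) -> bool:
    """True if the candidate is close enough to a real POI in the library."""
    return best_similarity(candidate, poi_library) >= threshold


library = ["苍上村", "长安街"]  # illustrative entries only
print(is_high_quality("苍上村", library))
print(is_high_quality("xyz", library))
```

An exact entry scores 1.0 and passes any threshold; a string sharing no characters with the library scores 0.0 and is treated as a likely interfering word.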

In addition, the historical search behavior of map search users reflects, at a macro level, the search trends of users at large with respect to map information. The historical search behavior of map search users can therefore be acquired in real time or periodically, the distribution of their historical search demand for the location represented by the candidate information can be determined, and that distribution can be taken as the map information search quality feature of the candidate text. For example, map users' search demand for POIs in the downtown area of city A is high, whereas their search demand for the development zones around city A is low. As another example, with the rise of short videos, map users' search demand for a certain Internet-famous POI has recently increased sharply.

In this embodiment, the degree of association between each piece of candidate information and the current user is determined according to at least one of the map domain features. For the current map search scene feature, for example, the region or city to which the user currently belongs can be determined on the basis of spatial affiliation, and candidate information having the same affiliation with that region or city is given a large degree of association with the user. For example, if the current user's current location belongs to city A, the degree of association between candidate information belonging to city A and the user is set to a large value. Furthermore, candidate information that satisfies the spatial position description in the user's speech can be given a large degree of association with the current user; conversely, candidate information that does not satisfy the spatial position description is given a small degree of association, or even zero.

For the current user behavior feature, for example, on the principle that a user is likely to repeat earlier searches, the historical search times and the number of historical searches for the location represented by the candidate information can be taken from the current user's historical search behavior: the more often a piece of candidate information has been searched within a given historical period, the greater its degree of association with the current user. The current user's pronunciation habits can also be taken into account: if the current user is determined to confuse the pronunciation of at least two phonemes, the candidate information corresponding to those phonemes can be determined to have the same degree of association with the current user.
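One way to turn the repeated-search principle into a number is sketched below. The patent only states that more searches within a period imply a higher association; the choice of a relative frequency, the 30-day window, and the `(poi_name, timestamp)` history format are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def behavior_score(history, poi, now, window_days=30):
    """Share of the user's recent searches that targeted the given POI.

    history: iterable of (poi_name, timestamp) pairs (hypothetical format).
    Returns a value in [0, 1]; 0.0 if there are no recent searches.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [name for name, ts in history if ts >= cutoff]
    if not recent:
        return 0.0
    return recent.count(poi) / len(recent)


now = datetime(2019, 6, 28)
history = [
    ("POI A", datetime(2019, 6, 20)),
    ("POI A", datetime(2019, 6, 1)),
    ("POI B", datetime(2019, 6, 10)),
    ("POI A", datetime(2019, 1, 1)),  # outside the 30-day window
]
print(behavior_score(history, "POI A", now))
print(behavior_score(history, "POI B", now))
```

Searches older than the window are ignored, so a place visited frequently long ago does not outweigh a recent habit.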

As another example, for the map information search quality of the candidate text: if the candidate information is determined, from its similarity to the map information in the preset map information library, to be an actually existing map POI, the map search quality of that candidate information is determined to be high and its degree of association with the current user can be set to a large value. The historical search demand distribution can also be used: the higher the historical search demand for the location represented by the candidate information, the greater the degree of association that can be set between the candidate information and the current user.

In this embodiment, the degrees of association between the candidate information and the current user that are determined from the individual map domain features can be combined into an overall degree of association reflecting all of the features. The per-feature degrees of association can be combined by means of big data and a machine learning model to obtain the degree of association between the candidate information and the current user. Alternatively, a scoring approach can be used: a weight is preset for each of the map domain features, and a weighted sum of the per-feature scores yields the degree of association between the candidate information and the current user.
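The scoring variant amounts to a weighted sum, which can be written out as a short sketch. The feature names and weight values below are placeholders chosen for illustration; the patent does not specify them.

```python
def association_degree(scores: dict, weights: dict) -> float:
    """Weighted sum of per-feature scores; missing features count as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


# Hypothetical preset weights for the three map domain features.
WEIGHTS = {"scene": 0.5, "behavior": 0.3, "quality": 0.2}

candidate_scores = {"scene": 1.0, "behavior": 0.5, "quality": 1.0}
print(association_degree(candidate_scores, WEIGHTS))
```

Because only "at least one" feature is required, treating an absent feature score as zero lets the same function handle candidates for which some features could not be computed.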

S240. Determine, according to the degree of association between the at least one piece of candidate information and the current user, the ambiguous information that is ambiguous with respect to the current user's speech.

In a specific embodiment of the present invention, the candidate information can be sorted by its degree of association with the current user, and candidate information with a low degree of association is determined to be ambiguous information that interferes with the search results of the map search. An association threshold or a percentage threshold can be set in advance; candidate information whose degree of association is below the association threshold, or the preset percentage of candidate information with the lowest degrees of association, is determined to be ambiguous information.

S250. Filter the ambiguous information out of the at least one piece of candidate information to determine the map information recognition result for the current user's speech.

In a specific embodiment of the present invention, the ambiguous information is filtered out of the candidate information. The remaining candidate information can be used as the map information recognition result and presented to the user in descending order of association, so that the user first sees the recognition result most closely associated with him or her and can also consult candidate information with a lower degree of association. Alternatively, the candidate information with the highest degree of association can be used directly as the map information recognition result and presented to the user.
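Steps S240 and S250 can be sketched together. The example assumes candidates arrive as `(name, association)` pairs and interprets the percentage threshold as "drop the lowest-ranked fraction"; the function name, both parameters and the sample values are hypothetical.

```python
def filter_ambiguous(candidates, threshold=None, drop_fraction=None):
    """Rank candidates and remove those treated as ambiguous information.

    candidates: list of (name, association) pairs.
    threshold: drop candidates whose association is below this value.
    drop_fraction: drop this share of the lowest-ranked candidates.
    Returns the survivors in descending order of association.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if threshold is not None:
        ranked = [c for c in ranked if c[1] >= threshold]
    if drop_fraction is not None:
        keep = len(ranked) - int(len(ranked) * drop_fraction)
        ranked = ranked[:keep]
    return ranked


candidates = [("POI A", 0.9), ("POI B", 0.2), ("POI C", 0.6)]
results = filter_ambiguous(candidates, threshold=0.5)
print(results)     # full ranked list shown to the user
print(results[0])  # or only the top-ranked candidate
```

Either the whole ranked list or only its first element can be returned as the recognition result, matching the two presentation options described above.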

By way of example, FIG. 3 is an overall architecture diagram of the speech recognition in this embodiment. As shown in FIG. 3, the user inputs a voice search request into the map search client, and the speech recognition model performs speech recognition on the received user speech and determines the phoneme representation as the fuzzy sound. The speech recognition model is obtained by taking a general-purpose speech recognition model trained on a general corpus and performing secondary training on it with the map corpus. This avoids the high cost of retraining the speech recognition model from scratch, preserves general-domain knowledge, and improves the model's recognition accuracy for map information. Next, the fuzzy sound is used for the map search, which includes error correction and expansion of the phoneme representation, so that multiple pieces of candidate information are found. Here the strong matching capability of map search replaces simple pinyin-to-text matching and directly filters out possible interference from general-domain knowledge. Finally, the current user's map domain features are used to disambiguate the candidate information and obtain the top N recognized texts most strongly associated with the current user; these recognized texts, or the one with the highest degree of association, are returned to the user as the exact text.

In the technical solution of this embodiment, a speech recognition model optimized with the map corpus performs acoustic feature recognition on the current user's speech and determines its phoneme representation; the phoneme representation is treated as a fuzzy sound and subjected to error correction and expansion; the map information is searched according to the fuzzy sound to obtain at least one piece of possible candidate information; finally, the degree of association between the candidate information and the current user is determined according to the current user's map domain features, the multiple pieces of candidate information are disambiguated, and the candidate information that best matches the user is selected as the map information recognition result. By recognizing fuzzy sounds, the embodiment of the present invention retains the sound-related features of the user's speech and avoids the problem of choosing the wrong characters. By optimizing the speech recognition model and performing the map search on fuzzy sounds, it replaces simple pinyin-to-text matching, removes possible interference of general-domain knowledge with the map search, avoids misjudgments caused by ambiguity and accents, and greatly improves the speech recognition accuracy of map voice search.

Embodiment 3

FIG. 4 is a schematic structural diagram of a speech recognition apparatus provided by Embodiment 3 of the present invention. This embodiment is applicable to searching map information according to a user's speech. The apparatus can be configured in a server and can implement the speech recognition method described in any embodiment of the present invention. The apparatus specifically includes:

a candidate information determination module 410, configured to perform a map information search on the current user's speech and determine at least one piece of matching candidate information;

a speech recognition disambiguation module 420, configured to disambiguate the at least one piece of candidate information according to the current user's map domain features, so as to determine the map information recognition result for the current user's speech.

Optionally, the speech recognition disambiguation module 420 is specifically configured to:

determine, according to the map domain features, the degree of association between the at least one piece of candidate information and the current user;

determine, according to the degree of association between the at least one piece of candidate information and the current user, the ambiguous information that is ambiguous with respect to the current user's speech;

filter the ambiguous information out of the at least one piece of candidate information to determine the map information recognition result for the current user's speech.

Optionally, the map domain features of the current user include at least one of: a current map search scene feature, a current user behavior feature, and a map information search quality feature of the candidate text.

Optionally, the current map search scene feature is determined as follows:

the current map search scene feature is determined according to the spatial position relationship between the current user's current location and the location represented by the candidate information; and/or

the spatial position description, in the current user's speech, of the location represented by the candidate information is taken as the current map search scene feature.

Optionally, the current user behavior feature is determined as follows:

the current user's historical search behavior for the location represented by the candidate information, and the current user's pronunciation habits, are determined;

the current user behavior feature is determined according to the historical search behavior and/or the current user's pronunciation habits.

Optionally, the map information search quality feature of the candidate text is determined as follows:

the similarity between the candidate information and the map information in a preset map information library, and the distribution of map search users' historical search demand for the location represented by the candidate information, are determined;

the map information search quality feature of the candidate text is determined according to the similarity and/or the historical search demand distribution.

Optionally, the candidate information determination module 410 includes:

a phoneme recognition unit 4101, configured to perform acoustic feature recognition on the current user's speech and determine the phoneme representation of the current user's speech;

a map search unit 4102, configured to search the map information according to the phoneme representation and determine at least one piece of candidate information whose pronunciation matches the phoneme representation.

Optionally, the map search unit 4102 is specifically configured to:

perform error correction and phoneme expansion on the phoneme representation to determine at least one variant phoneme representation;

obtain at least one piece of candidate information whose pronunciation matches the phoneme representation and the at least one variant phoneme representation.
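The division of responsibilities between modules 410 and 420 can be sketched as two small classes. This is only an illustration of the composition, not the apparatus itself: the recognizer, the map search and the scoring function are passed in as plain callables, and all class, method and parameter names are hypothetical.

```python
from typing import Callable, List


class CandidateInfoModule:
    """Rough counterpart of module 410 (units 4101 and 4102)."""

    def __init__(self,
                 recognize_phonemes: Callable[[bytes], str],
                 search_map: Callable[[str], List[str]]):
        self.recognize_phonemes = recognize_phonemes  # unit 4101
        self.search_map = search_map                  # unit 4102

    def run(self, audio: bytes) -> List[str]:
        return self.search_map(self.recognize_phonemes(audio))


class DisambiguationModule:
    """Rough counterpart of module 420."""

    def __init__(self, score: Callable[[str], float], threshold: float):
        self.score = score          # association with the current user
        self.threshold = threshold  # below this, a candidate is ambiguous

    def run(self, candidates: List[str]) -> List[str]:
        scored = [(c, self.score(c)) for c in candidates]
        kept = [(c, s) for c, s in scored if s >= self.threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        return [c for c, _ in kept]


# Stub components standing in for the real recognizer, search and scorer.
m410 = CandidateInfoModule(lambda audio: "hujian",
                           lambda phonemes: ["POI A", "POI B", "POI C"])
m420 = DisambiguationModule({"POI A": 0.4, "POI B": 0.9, "POI C": 0.7}.get,
                            threshold=0.5)
print(m420.run(m410.run(b"")))
```

Keeping the two modules independent means the scoring logic can change (for example from a weighted sum to a learned model) without touching the search side.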

In the technical solution of this embodiment, the functional modules cooperate to implement recognition of the phoneme representation (i.e., the fuzzy sound), correction and expansion of the fuzzy sound, map information search based on the fuzzy sound, determination of the map domain features, disambiguation of the candidate information, and feedback of the exact recognized text. By disambiguating the candidate information obtained from a dedicated search of map-domain information, the embodiment of the present invention removes possible interference of general-domain knowledge with the map search and avoids misjudgments caused by ambiguity and accents, so that the map information recognition result better matches the user's habits and needs and the speech recognition accuracy of map voice search is greatly improved.

Embodiment 4

FIG. 5 is a schematic structural diagram of a server provided by Embodiment 4 of the present invention; it shows a block diagram of an exemplary server suitable for implementing embodiments of the present invention. The server shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 5, the server 12 takes the form of a general-purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processors 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the server 12, including volatile and non-volatile media, and removable and non-removable media.

The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM, or other optical medium) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

The server 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the server 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The server 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example implementing the speech recognition method provided by the embodiments of the present invention.

Embodiment 5

Embodiment 5 of the present invention further provides a computer-readable storage medium on which a computer program (also referred to as computer-executable instructions) is stored. When executed by a processor, the program performs a speech recognition method, the method comprising:

performing a map information search on the current user voice to determine at least one matching piece of candidate information;

disambiguating the at least one piece of candidate information according to the map domain features of the current user, to determine the map information recognition result for the current user voice.
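The disambiguation step above can be sketched as scoring each candidate's degree of association with the current user and filtering out candidates that score too low. This is only an illustration under stated assumptions: the `Candidate` fields, the two features (distance to the user as a scene feature, past search count as a behavior feature), the equal weights, and the threshold are all hypothetical choices, not values given in the embodiments.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Candidate:
    name: str
    lat: float
    lon: float
    past_searches: int  # times the current user searched this place before (assumed field)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def association_degree(candidate, user_lat, user_lon):
    """Combine a scene feature and a behavior feature into one score in [0, 1].

    The feature definitions and the 0.5/0.5 weights are assumptions.
    """
    # Scene feature: the closer the candidate is to the user, the higher the value.
    scene = 1.0 / (1.0 + haversine_km(user_lat, user_lon, candidate.lat, candidate.lon))
    # Behavior feature: saturating function of the user's past searches.
    behavior = candidate.past_searches / (candidate.past_searches + 1.0)
    return 0.5 * scene + 0.5 * behavior


def disambiguate(candidates, user_lat, user_lon, threshold=0.3):
    """Filter out weakly associated (ambiguous) candidates and rank the rest."""
    scored = [(association_degree(c, user_lat, user_lon), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    if not kept and scored:
        # Keep the single best candidate rather than return nothing.
        kept = [max(scored, key=lambda sc: sc[0])]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return [c.name for _, c in kept]


if __name__ == "__main__":
    candidates = [
        Candidate("nearby place", 39.91, 116.41, past_searches=3),
        Candidate("distant homophone", 31.20, 121.50, past_searches=0),
    ]
    print(disambiguate(candidates, user_lat=39.90, user_lon=116.40))
```

A map information search quality feature, such as similarity to entries in a map information base, could be added as a third term in the same weighted sum.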

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.

The program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for carrying out the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the embodiments of the present invention have been described in some detail through the above embodiments, they are not limited to the above embodiments and may include further equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A speech recognition method, characterized by comprising:
performing a map information search on a current user voice to determine at least one matching piece of candidate information;
disambiguating the at least one piece of candidate information according to map domain features of the current user, to determine a map information recognition result for the current user voice.

2. The method according to claim 1, characterized in that disambiguating the at least one piece of candidate information according to the map domain features of the current user, to determine the map information recognition result for the current user voice, comprises:
determining, according to the map domain features, a degree of association between the at least one piece of candidate information and the current user;
determining, according to the degree of association between the at least one piece of candidate information and the current user, ambiguous information that is ambiguous with respect to the current user voice;
filtering the ambiguous information out of the at least one piece of candidate information to determine the map information recognition result for the current user voice.

3. The method according to claim 2, characterized in that the map domain features of the current user include at least one of: a current map search scene feature, a current user behavior feature, and a map information search quality feature of a candidate text.

4. The method according to claim 3, characterized in that the current map search scene feature is determined by:
determining the current map search scene feature according to the spatial position relationship between the current location of the current user and the location represented by the candidate information; and/or
taking the spatial position description, in the current user voice, of the location represented by the candidate information as the current map search scene feature.

5. The method according to claim 3, characterized in that the current user behavior feature is determined by:
determining the current user's historical search behavior for the location represented by the candidate information, and the current user's pronunciation habits;
determining the current user behavior feature according to the historical search behavior and/or the current user's pronunciation habits.

6. The method according to claim 3, characterized in that the map information search quality feature of the candidate text is determined by:
determining the similarity between the candidate information and map information in a preset map information base, and the distribution of historical search demand of map search users for the location represented by the candidate information;
determining the map information search quality feature of the candidate text according to the similarity and/or the historical search demand distribution.

7. The method according to claim 1, characterized in that performing the map information search on the current user voice to determine the at least one matching piece of candidate information comprises:
performing acoustic feature recognition on the current user voice to determine a phoneme representation of the current user voice;
performing the map information search according to the phoneme representation to determine at least one piece of candidate information whose pronunciation matches the phoneme representation.

8. The method according to claim 7, characterized in that performing the map information search according to the phoneme representation to determine the at least one piece of candidate information whose pronunciation matches the phoneme representation comprises:
performing error correction and phoneme expansion on the phoneme representation to determine at least one variant phoneme representation;
obtaining at least one piece of candidate information whose pronunciation matches the phoneme representation and the at least one variant phoneme representation.

9. A speech recognition apparatus, characterized by comprising:
a candidate information determination module, configured to perform a map information search on a current user voice to determine at least one matching piece of candidate information;
a speech recognition disambiguation module, configured to disambiguate the at least one piece of candidate information according to map domain features of the current user, to determine a map information recognition result for the current user voice.

10. A server, characterized by comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method according to any one of claims 1-8.

11. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the speech recognition method according to any one of claims 1-8.
CN201910578399.4A 2019-06-28 2019-06-28 Speech recognition method, device, server and storage medium Pending CN110310631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910578399.4A CN110310631A (en) 2019-06-28 2019-06-28 Speech recognition method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN110310631A true CN110310631A (en) 2019-10-08

Family

ID=68078683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910578399.4A Pending CN110310631A (en) 2019-06-28 2019-06-28 Speech recognition method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN110310631A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination