CN114582333A - Voice recognition method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114582333A (application CN202210155658.4A)
- Authority
- CN
- China
- Prior art keywords
- voice stream
- voice
- information
- determining
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Description
Technical Field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method and apparatus, an electronic device, and a storage medium.
Background
With the development of network and mobile communication technology, electronic products closely tied to daily life, such as smartphones and in-vehicle voice systems, are increasingly common. Both handwriting input and keyboard input impose various limitations when using such products. For convenience, the user is often asked to provide voice input; the device recognizes the speech and then outputs its text content or executes the corresponding operation instruction.
However, the prior art recognizes voice information with limited accuracy, and recognition errors are frequent, which inconveniences users, lowers processing efficiency, and degrades the user experience.
Summary of the Invention
Embodiments of the present invention provide a speech recognition method and apparatus, an electronic device, and a storage medium, so that a corresponding response strategy can be determined on the basis of semantic analysis.
In a first aspect, an embodiment of the present invention provides a speech recognition method, the method comprising:
receiving a first voice stream input by a user for voice interaction, and performing semantic recognition on the first voice stream;
when a pause is detected in the first voice stream, determining an information integrity state of the first voice stream according to the result of the semantic recognition, and determining the pause duration of the first voice stream; and
determining an interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, the apparatus comprising:
a semantic recognition module, configured to receive a first voice stream input by a user for voice interaction, and to perform semantic recognition on the first voice stream;
a state determination module, configured to, when a pause is detected in the first voice stream, determine the information integrity state of the first voice stream according to the result of the semantic recognition, and determine the pause duration of the first voice stream; and
a response determination module, configured to determine an interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration.
In a third aspect, an embodiment of the present invention further provides an electronic device, the electronic device comprising:
one or more processors; and
a storage device, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the speech recognition method provided by any embodiment of the present invention.
In the technical solution of the embodiments of the present invention, voice stream information input by the user is received and semantically recognized. When a pause is detected in that voice stream information, the information integrity state of the voice stream and the pause duration are determined according to the semantic recognition result, and the corresponding interaction response strategy is then determined from the information integrity state and the pause duration. This addresses the prior-art problem that pauses in the user's voice input impair the device's understanding of the user's semantic intent. Because different response strategies can be chosen according to the integrity state of the voice information and the pause duration, the accuracy of semantic understanding and the speed of response to user needs are effectively improved, and the user experience is enhanced.
Brief Description of the Drawings
To illustrate the technical solutions of the exemplary embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings introduced here cover only some of the embodiments to be described, not all of them; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a speech recognition method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of a speech recognition method according to Embodiment 2 of the present invention;
FIG. 3 is a schematic flowchart of a speech recognition method in an application scenario according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of a speech recognition apparatus according to Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to Embodiment 5 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the present invention and do not limit it. For ease of description, the drawings show only the parts related to the present invention rather than the entire structure or content.
Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations may be performed in parallel, concurrently, or simultaneously, and the order of the operations may be rearranged. A process may terminate when its operations are complete, but it may also include additional steps not shown in the drawings. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
Embodiment 1
FIG. 1 is a schematic flowchart of a speech recognition method according to Embodiment 1 of the present invention. This embodiment is applicable where a user interacts with a device by voice. The method may be executed by a speech recognition apparatus, which may be implemented in software and/or hardware and may be configured in a terminal and/or a server to carry out the speech recognition method of the embodiments of the present invention.
As shown in FIG. 1, the method of this embodiment may specifically include:
S110. Receive a first voice stream input by a user for voice interaction, and perform semantic recognition on the first voice stream.
Here, the first voice stream may be understood as a segment of voice information used for voice interaction between the user and a smart terminal device. Typically, the smart terminal may be a mobile phone, a tablet computer, an in-vehicle computer, a personal computer, or the like, which is not limited in this embodiment. By way of example and without limitation, the first voice stream may be "turn on the air conditioner", "turn on", or "turn up the temperature". Semantic recognition may be understood as recognizing the lexical content of the user's speech and converting it into input that a computer can process.
In a specific implementation, a user who wants to interact with the terminal device by voice inputs a segment of voice information. After receiving this first voice stream, the terminal device recognizes it, converting the lexical content of the user's voice stream into a form that the terminal can process.
It should be noted that the timing of semantic recognition is not limited in this embodiment. The terminal device may perform semantic recognition in real time while the user is inputting the first voice stream, end recognition when a pause is detected in the first voice stream, and then proceed to the subsequent operations. Alternatively, the terminal device may begin semantic recognition of the first voice stream received before the pause only once a pause is detected.
S120. When a pause is detected in the first voice stream, determine the information integrity state of the first voice stream according to the result of the semantic recognition, and determine the pause duration of the first voice stream.
Here, the information integrity state may be understood as whether the information in the first voice stream expresses a complete semantic intent. Optionally, the information integrity state may be either "information complete" or "information incomplete". The pause duration may be understood as the time interval in the first voice stream from the start of the pause to its end.
In a specific implementation, the first voice stream input by the user is received and semantically recognized. When a pause is detected in the first voice stream, whether the information it contains is complete is determined from its semantic recognition result, and the duration of the pause is determined.
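The text does not say how a pause is detected or timed. As one possible illustration only, the sketch below assumes that some upstream component (not described in the source) labels fixed-length audio frames as voiced or silent, and measures the pause as the run of silent frames at the end of the stream. The function name `trailing_pause_ms` and the 20 ms frame length are hypothetical choices.

```python
from typing import Sequence


def trailing_pause_ms(voiced_frames: Sequence[bool], frame_ms: int = 20) -> int:
    """Return the length, in milliseconds, of the silent run that ends the stream.

    voiced_frames holds one flag per audio frame (True = speech detected).
    Both the per-frame flags and the 20 ms default are assumptions made
    for this sketch; the source does not specify a detection mechanism.
    """
    silent_frames = 0
    # Walk backwards from the newest frame until speech is found.
    for is_voiced in reversed(voiced_frames):
        if is_voiced:
            break
        silent_frames += 1
    return silent_frames * frame_ms


if __name__ == "__main__":
    frames = [True, True, False, False, False]
    print(trailing_pause_ms(frames))
```

Working in integer milliseconds avoids floating-point rounding when the result is later compared with a threshold.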
Optionally, determining the information integrity state of the first voice stream according to the result of the semantic recognition includes: determining the information integrity state of the first voice stream according to the degree of matching between the recognized semantic information of the first voice stream and each response function in a preset function whitelist.
Here, the preset function whitelist may be understood as a list, stored in advance on the terminal device, of the functions the device already supports. The benefit of a function whitelist is that the user's voice input can be screened quickly and the functions on the whitelist can be answered quickly, which can greatly improve response speed and the user experience. A response function may be understood as a function that the terminal device can execute on the basis of a semantic recognition result, for example "turn on the air conditioner", "play music", or "raise the temperature to 26 degrees". The response functions in the whitelist are not fixed; different response functions may be configured for different terminal devices, which is not limited in this embodiment.
Specifically, the degree of matching between the semantic information of the first voice stream and each response function in the preset function whitelist may be determined by comparing that semantic information with the information the whitelist uses to describe each response function. By way of example and without limitation, the function whitelist may include at least one of vertical (domain category) information, action information, and slot information. Vertical information reflects the category to which a response function belongs, for example air conditioner, music, or navigation system. Action information reflects the action the response function should perform, for example turn on, turn off, or turn up. Slot information reflects the key parameters of the response function; for example, "26 degrees" in "raise the air conditioner to 26 degrees" is slot information.
In a specific implementation, the semantic information in the first voice stream is compared with the descriptive information of each response function in the preset function whitelist to judge how well the user's first voice stream matches each response function, and from this it can be determined whether the information in the first voice stream is complete.
It should be noted that if the information in the first voice stream matches at least one entry in the function whitelist, the first voice stream may be considered to be in the information-complete state; if it matches none of the entries, the first voice stream may be considered to be in the information-incomplete state.
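The whitelist check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Semantics` and `ResponseFunction` types, the field names, and the exact matching rule (vertical and action must be equal, and a slot must be present when the function needs one) are all assumptions introduced here. The text itself only requires that at least one whitelist entry match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Semantics:
    """Hypothetical result of semantic recognition; any field may be missing."""
    vertical: Optional[str] = None  # category, e.g. "air_conditioner"
    action: Optional[str] = None    # e.g. "turn_on"
    slot: Optional[str] = None      # key parameter, e.g. "26_degrees"


@dataclass(frozen=True)
class ResponseFunction:
    """Hypothetical whitelist entry describing one supported function."""
    vertical: str
    action: str
    needs_slot: bool = False


def matches(sem: Semantics, fn: ResponseFunction) -> bool:
    """Assumed matching rule: same vertical and action, slot present if required."""
    if sem.vertical != fn.vertical or sem.action != fn.action:
        return False
    return sem.slot is not None or not fn.needs_slot


def is_information_complete(sem: Semantics, whitelist: Sequence[ResponseFunction]) -> bool:
    """Complete if at least one whitelist entry matches, incomplete otherwise."""
    return any(matches(sem, fn) for fn in whitelist)


WHITELIST = (
    ResponseFunction("air_conditioner", "turn_on"),
    ResponseFunction("air_conditioner", "set_temperature", needs_slot=True),
    ResponseFunction("music", "play"),
)

if __name__ == "__main__":
    print(is_information_complete(Semantics("air_conditioner", "turn_on"), WHITELIST))
    print(is_information_complete(Semantics(action="turn_on"), WHITELIST))
```

Under these assumptions, "turn on the air conditioner" is complete, while a bare "turn on" has no vertical and therefore matches nothing.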
S130. Determine an interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration.
Here, the interaction response strategy may be understood as the feedback strategy that the terminal device determines from the information input by the user. Specifically, the terminal device may apply different interaction response strategies depending on the information integrity state of the first voice stream and the pause duration.
It should be noted that the interaction response strategy may be determined by combining two factors: whether the information integrity state is complete or incomplete, and whether the same pause duration or different pause durations are used for the two states.
Optionally, the same pause duration is used whether the information is complete or incomplete, and the corresponding interaction response strategy is determined from the information integrity state and that single pause duration.
Specifically, when the user inputs the first voice stream, the terminal device performs semantic recognition on it. Regardless of whether the semantic intent of the first voice stream is complete, the terminal device waits for the same pause duration, judges whether a new voice stream from the user is received within that duration, and determines the corresponding interaction response strategy from the result.
Optionally, different pause durations are used depending on the information integrity state. A first pause duration may be set for the case where the information is complete, and a second pause duration for the case where the information is incomplete.
Specifically, when the user inputs the first voice stream, the terminal device performs semantic recognition on it and determines its information integrity state. When the information is complete, the corresponding interaction response strategy is determined according to the first pause duration. By way of example and without limitation, the strategy for the first pause duration may be to execute the response function corresponding to the first voice stream directly, or to merge all voice stream information input by the user, including subsequent input, and execute the response function corresponding to the combined voice stream. When the information is incomplete, the corresponding interaction response strategy is determined according to the second pause duration. By way of example and without limitation, the strategy for the second pause duration may be to send the user a prompt that the input is not yet complete and ask whether to continue, or to infer related or similar response functions from the user's historical behavior data or from the voice stream already input and present them to the user as a list to choose from.
It should be noted that this embodiment does not limit the relative lengths or specific values of the first and second pause durations. For example, where the first pause duration applies when the information of the first voice stream is complete and the second applies when it is incomplete, the first pause duration may be set somewhat shorter than the second. The benefit is that the first voice stream is answered quickly when its semantics are complete, while temporary pauses during voice input are accommodated by waiting longer before deciding that input has ended, which makes the response behavior more intelligent and improves the user experience.
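The two options above (one shared pause duration, or a separate duration per integrity state) can be expressed with a single small configuration object. The sketch below is illustrative only: `PausePolicy` and its field names are hypothetical, and the numeric values are placeholders, since the embodiment explicitly leaves the durations unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PausePolicy:
    """Hypothetical configuration: how long to wait after a pause, in seconds."""
    complete_wait: float    # first pause duration (information complete)
    incomplete_wait: float  # second pause duration (information incomplete)

    def wait_for(self, information_complete: bool) -> float:
        """Pick the waiting time that applies to the current integrity state."""
        return self.complete_wait if information_complete else self.incomplete_wait


# Option 1: the same pause duration regardless of integrity state.
SAME_DURATION = PausePolicy(complete_wait=1.0, incomplete_wait=1.0)

# Option 2: a shorter wait when the utterance is already complete.
SPLIT_DURATION = PausePolicy(complete_wait=0.6, incomplete_wait=3.0)

if __name__ == "__main__":
    print(SPLIT_DURATION.wait_for(True), SPLIT_DURATION.wait_for(False))
```

Modelling the shared-duration option as the special case where both fields are equal keeps the downstream decision logic identical for both options.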
It should also be noted that "first" and "second" in this embodiment do not denote any order, quantity, or importance; they merely distinguish the objects they refer to.
In the technical solution of this embodiment, voice stream information input by the user is received and semantically recognized. When a pause is detected in that voice stream information, the information integrity state of the voice stream and the pause duration are determined according to the semantic recognition result, and the corresponding interaction response strategy is then determined from the information integrity state and the pause duration. This addresses the prior-art problem that pauses in the user's voice input impair the device's understanding of the user's semantic intent. Because different response strategies can be chosen according to the integrity state of the voice information and the pause duration, the accuracy of semantic understanding and the speed of response to user needs are effectively improved, and the user experience is enhanced.
Embodiment 2
FIG. 2 is a schematic flowchart of a speech recognition method according to Embodiment 2 of the present invention. This embodiment further refines the technical solution described above. Building on any optional technical solution of the embodiments of the present invention, optionally, determining the interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration includes: if the semantics of the voice stream are complete, determining the interaction response strategy corresponding to the voice interaction information according to the pause duration and a first duration threshold; if the semantics of the voice stream are incomplete, determining the interaction response strategy corresponding to the voice interaction information according to the pause duration and a second duration threshold; wherein the first duration threshold is less than or equal to the second duration threshold.
Technical terms that are the same as or correspond to those in the above embodiment are not explained again here. As shown in FIG. 2, the method of this embodiment specifically includes the following steps:
S210. Receive a first voice stream input by a user for voice interaction, and perform semantic recognition on the first voice stream.
S220. When a pause is detected in the first voice stream, determine the information integrity state of the first voice stream according to the result of the semantic recognition, and determine the pause duration of the first voice stream.
S230. If the semantics of the voice stream are complete, determine the interaction response strategy corresponding to the first voice stream according to the pause duration and the first duration threshold.
Here, the semantics of the voice stream being complete may be understood as the voice stream expressing a complete semantic intent. The first duration threshold may be preset; for example, it may be 0.4 seconds, 0.6 seconds, or 1 second.
Specifically, when a pause is detected in the first voice stream and the semantic recognition result shows that its semantics are complete, the pause duration is compared with the preset first duration threshold, and the interaction response strategy is chosen according to the result.
Optionally, if the semantics of the voice stream are complete, it is judged whether a newly input second voice stream is received within the first duration threshold. If not, the device responds to the first voice stream and executes the interaction function corresponding to it. If so, semantic recognition continues on the second voice stream.
In a specific implementation, if the semantic recognition result shows that the semantics of the first voice stream are complete, the pause duration is compared with the first duration threshold to judge whether a second voice stream from the user is received within that threshold, and the corresponding interaction response strategy is executed according to the result. If the terminal device receives no further voice information from the user within the first duration threshold, it responds to the semantic recognition result of the first voice stream and executes the interaction response function corresponding to the semantic intent contained in it. Conversely, if the terminal device receives a newly input second voice stream within the first duration threshold, that second voice stream is treated as valid voice information, and semantic recognition continues on it to determine the recognition result.
To illustrate this step, take the terminal device to be an in-vehicle voice interaction system. Suppose the user pauses after saying "turn on the air conditioner". The terminal device performs semantic recognition on the voice information before the pause and finds that the semantic intent of "turn on the air conditioner" is complete. It then waits for the first duration threshold. If the system receives no further voice input from the user within that threshold, then once the threshold has elapsed the device stops receiving voice information and stops semantic recognition, and directly executes the response function corresponding to "turn on the air conditioner"; for example, the system replies "the air conditioner has been turned on".
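The branch for complete semantics can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not the method as implemented: `handle_complete_semantics`, the `wait_for_speech` callback (which stands in for whatever audio front end reports new speech within a timeout), and the returned action labels are all hypothetical names. The default of 0.6 seconds is one of the example values given for the first duration threshold.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback type: blocks for at most `timeout` seconds and returns
# the newly received voice stream text, or None if the user stayed silent.
WaitForSpeech = Callable[[float], Optional[str]]


def handle_complete_semantics(
    first_stream: str,
    wait_for_speech: WaitForSpeech,
    first_threshold_s: float = 0.6,
) -> Tuple[str, str]:
    """Decide what to do after a pause when the first stream is already complete.

    Returns ("execute", first_stream) if no second stream arrives within the
    first duration threshold, otherwise ("keep_recognizing", second_stream).
    """
    second_stream = wait_for_speech(first_threshold_s)
    if second_stream is None:
        # Silence for the whole threshold: act on the first voice stream.
        return ("execute", first_stream)
    # New speech arrived in time: treat it as valid input and keep recognizing.
    return ("keep_recognizing", second_stream)


if __name__ == "__main__":
    stays_silent = lambda timeout: None
    print(handle_complete_semantics("turn on the air conditioner", stays_silent))
```

Passing the waiting mechanism in as a callback keeps the decision logic testable without real audio input.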
S240. If the semantics of the voice stream are incomplete, determine the interaction response strategy corresponding to the first voice stream according to the pause duration and the second duration threshold.
Here, incomplete semantics means that the received voice stream information does not carry a complete semantic intent. Similarly, the second duration threshold may also be preset; for example, it may be 3 seconds, 4 seconds, or 5 seconds.
Specifically, when a pause is detected in the first voice stream and the semantic recognition result indicates that the semantics of the first voice stream are incomplete, the pause duration is compared with the second duration threshold, and the interaction response strategy is chosen according to the comparison result.
Optionally, if the semantics of the voice stream are incomplete, it is determined whether a newly input second voice stream is received within the second duration threshold; if so, semantic recognition continues on the second voice stream; if not, a preset timeout response strategy is executed.
The timeout response strategy is the way the device reacts when the user does not continue the input within a certain time and the information entered so far has no complete semantic intent. For example, the timeout response strategy may send the user a prompt stating that the input voice stream is incomplete and asking whether to continue; it may present a list of possibly related functions for the user to choose from; or it may keep the device in its current state without responding. This embodiment does not limit the strategy. The list of related functions may be derived, without limitation, by semantic inference from the voice information already input, or by finding the functions the user has used most frequently in historical behavior data.
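As a rough illustration of the three timeout responses described above, the following Python sketch ranks candidate functions by how often they appear in a usage history. The names `TimeoutStrategy`, `suggest_functions`, `timeout_response` and `HISTORY`, the prefix-matching rule, and the sample data are all assumptions made for this example; the patent leaves the inference method open.

```python
from collections import Counter
from enum import Enum, auto
from typing import List, Optional


class TimeoutStrategy(Enum):
    PROMPT_USER = auto()        # tell the user the input looks incomplete
    SUGGEST_FUNCTIONS = auto()  # offer a list of likely functions
    KEEP_STATE = auto()         # stay in the current state, no response


def suggest_functions(partial_text: str, history: List[str], limit: int = 3) -> List[str]:
    """Rank past commands that start with the partial text by usage count."""
    counts = Counter(cmd for cmd in history if cmd.startswith(partial_text))
    return [cmd for cmd, _ in counts.most_common(limit)]


def timeout_response(strategy: TimeoutStrategy, partial_text: str,
                     history: List[str]) -> Optional[str]:
    if strategy is TimeoutStrategy.PROMPT_USER:
        return f'Your request "{partial_text}" seems incomplete. Do you want to continue?'
    if strategy is TimeoutStrategy.SUGGEST_FUNCTIONS:
        options = suggest_functions(partial_text, history)
        return "Did you mean: " + ", ".join(options) if options else None
    return None  # KEEP_STATE: nothing is said or done


# Hypothetical usage history for one user.
HISTORY = [
    "turn on the air conditioner", "turn on the radio",
    "turn on the air conditioner", "open the window",
    "turn on the air conditioner", "turn on the radio",
    "turn on the seat heating",
]
print(timeout_response(TimeoutStrategy.SUGGEST_FUNCTIONS, "turn on", HISTORY))
```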
In a specific implementation, if the semantic recognition result indicates that the first voice stream input by the user does not carry a complete semantic intent, the pause duration is compared with the second duration threshold to determine whether a second voice stream from the user is received within the second duration threshold. If a second voice stream newly input by the user is received within the second duration threshold, semantic recognition continues on the second voice stream; if no new voice stream is received, the timeout response strategy is executed.
To describe this step clearly, take a vehicle-mounted voice interaction system as the terminal device. For example, suppose the user pauses between "turn on" and "the air conditioner" while saying "turn on the air conditioner". The system performs semantic recognition on "turn on", the part before the pause, and finds that its semantic intent is incomplete. It then keeps waiting for the second duration threshold (for example, 5 seconds). If the user goes on to say "the air conditioner" within 5 seconds, the system performs semantic recognition on the whole phrase "turn on the air conditioner", determines the correct semantic intent, and turns on the air conditioner. If the user does not continue within 5 seconds, the timeout response strategy is executed after the 5 seconds have elapsed (for example, the system asks "What would you like to turn on?").
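The way the two fragments in this example end up being recognized as one phrase could be sketched as follows. `UtteranceBuffer` is a hypothetical helper introduced only for illustration; the 5-second default mirrors the example value above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UtteranceBuffer:
    """Collects voice-stream fragments that are separated by short pauses."""
    second_threshold_s: float = 5.0
    fragments: List[str] = field(default_factory=list)

    def add(self, text: str, pause_before_s: float = 0.0) -> bool:
        """Append a fragment; return False if it arrived after the threshold."""
        if self.fragments and pause_before_s > self.second_threshold_s:
            return False  # too late: the timeout response would already have fired
        self.fragments.append(text)
        return True

    def text(self) -> str:
        """The full text handed to semantic recognition."""
        return " ".join(self.fragments)


buf = UtteranceBuffer()
buf.add("turn on")                                   # incomplete on its own
buf.add("the air conditioner", pause_before_s=2.0)   # arrives within 5 s
print(buf.text())
```

Semantic recognition then runs on the merged text rather than on the second fragment alone, which is what lets the system recover the intended command.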
It should be noted that, in this embodiment, the first duration threshold may optionally be less than or equal to the second duration threshold.
It should also be noted that "first" and "second" in this embodiment do not indicate any order, quantity, or importance; they only distinguish the objects they refer to.
The technical solution of this embodiment receives the first voice stream input by the user and performs semantic recognition on it. When a pause is detected in the voice stream, the information integrity state of the first voice stream and the pause duration are determined according to the semantic recognition result; different duration thresholds are then applied to the complete and incomplete cases, and the corresponding interaction response strategy is determined. This solves two problems of the prior art: a pause duration threshold that is set too long slows the response and degrades the user experience, while one that is set too short leaves the user no time to continue speaking after an unintentional pause, so that the terminal fails to recognize the user's complete semantic intent and cannot execute the command correctly, which degrades the user experience even more. As a result, when a pause in the user's input is detected, different pause duration thresholds can be applied according to the semantic integrity state before the pause and the corresponding interaction response strategy determined, so that the device responds to the user more quickly and recognizes the user's intent more completely when the input contains pauses.
Embodiment 3
To make the technical solutions of the embodiments of the present invention clearer to those skilled in the art, a specific application scenario is given below, taking a vehicle-mounted voice system as the terminal device.
FIG. 3 is a schematic flowchart of a voice recognition method in an application scenario provided by Embodiment 3 of the present invention. This embodiment is a preferred embodiment of the embodiments described above. Referring to FIG. 3, the method of this embodiment includes the following steps:
Step 1: The user inputs a first voice stream;
Step 2: Perform automatic speech recognition (ASR) on the first voice stream input by the user;
Step 3: Perform natural language understanding (NLU) on the first voice stream input by the user;
Step 4: Detect a pause in the first voice stream input by the user by means of voice activity detection (VAD);
Step 5: When a pause is detected in the first voice stream, determine from the semantic recognition result whether the information state of the first voice stream is complete, and determine the pause duration;
Step 6: If the information integrity state of the first voice stream is complete, keep waiting for a time t1 and determine whether the user inputs a new second voice stream within t1;
Step 7: If the user continues with a second voice stream within t1, continue with speech recognition, semantic understanding, and so on for the second voice stream;
Step 8: If the user inputs no new voice information within t1, execute the response function corresponding to the first voice stream before the pause;
Step 9: If the information integrity state of the first voice stream is incomplete, keep waiting for a time t2 and determine whether the user inputs a new second voice stream within t2;
Step 10: If the user continues with a second voice stream within t2, continue with speech recognition, semantic understanding, and so on for the second voice stream;
Step 11: If the user inputs no new second voice stream within t2, execute the timeout response strategy.
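The eleven steps can be condensed into a small simulation. This is only a sketch under stated assumptions: speech recognition is skipped (segments arrive as text), semantic completeness is approximated by membership in a hypothetical `KNOWN_COMMANDS` set, each segment carries the length of the silence that follows it instead of using real timers, t1 = 1 s is an assumed value (t2 = 5 s follows the earlier example), and a second voice stream is merged with the first before recognition in both branches.

```python
from typing import Callable, Iterable, List, Tuple

KNOWN_COMMANDS = {"turn on the air conditioner"}  # hypothetical stand-in for NLU


def is_complete(text: str) -> bool:
    """Step 5: decide whether the text carries a complete intent."""
    return text in KNOWN_COMMANDS


def run_dialog(segments: Iterable[Tuple[str, float]],
               complete: Callable[[str], bool] = is_complete,
               t1: float = 1.0, t2: float = 5.0) -> List[str]:
    """Process (text, following_pause_seconds) pairs and collect responses."""
    if t1 > t2:
        raise ValueError("t1 is expected to be no larger than t2")
    responses: List[str] = []
    pending = ""
    for text, pause_s in segments:
        pending = f"{pending} {text}".strip()       # steps 1-3, 7 and 10
        done = complete(pending)                     # step 5
        threshold = t1 if done else t2               # steps 6 and 9
        if pause_s < threshold:
            continue                                 # more speech arrived in time
        kind = "execute" if done else "timeout"      # steps 8 and 11
        responses.append(f"{kind}: {pending}")
        pending = ""
    return responses


print(run_dialog([("turn on", 2.0), ("the air conditioner", 1.5)]))
print(run_dialog([("turn on", 6.0)]))
```

In the first call the 2-second gap after "turn on" is tolerated because the text is incomplete and t2 applies; in the second call the 6-second gap exceeds t2, so the timeout branch is taken.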
The technical solution of this embodiment receives the first voice stream input by the user and performs semantic recognition on it. When a pause is detected in the voice stream, the information integrity state of the first voice stream and the pause duration are determined according to the semantic recognition result; different duration thresholds are then applied to the complete and incomplete cases, and the corresponding interaction response strategy is determined. This solves two problems of the prior art: a pause duration threshold that is set too long slows the response and degrades the user experience, while one that is set too short leaves the user no time to continue speaking after an unintentional pause, so that the terminal fails to recognize the user's complete semantic intent and cannot execute the command correctly, which degrades the user experience even more. As a result, when a pause in the user's input is detected, different pause duration thresholds can be applied according to the semantic integrity state before the pause and the corresponding interaction response strategy determined, so that the device responds to the user more quickly and recognizes the user's intent more completely when the input contains pauses.
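Embodiment 4
FIG. 4 is a schematic structural diagram of a voice recognition apparatus provided by Embodiment 4 of the present invention. The voice recognition apparatus provided by this embodiment may be implemented in software and/or hardware, and may be configured in a terminal and/or a server to implement the voice recognition method of the embodiments of the present invention. The apparatus may specifically include a semantic recognition module 410, a state determination module 420, and a response strategy determination module 430.
The semantic recognition module 410 is configured to receive a first voice stream input by the user for voice interaction and to perform semantic recognition on the first voice stream;
The state determination module 420 is configured to determine, when a pause is detected in the first voice stream, the information integrity state of the first voice stream according to the result of semantic recognition, and to determine the duration of the pause in the first voice stream;
The response strategy determination module 430 is configured to determine the interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration.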
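One way the pause duration reported by the state determination module 420 might be obtained is by counting silent frames at the end of a per-frame VAD output. The sketch below assumes the VAD has already produced one boolean per audio frame and that frames are 20 ms long; `trailing_pause_ms` and both of these choices are illustrative, not specified by the patent.

```python
from typing import Sequence


def trailing_pause_ms(speech_flags: Sequence[bool], frame_ms: int = 20) -> int:
    """Length of the silence at the end of the stream, in milliseconds.

    speech_flags[i] is True if frame i was classified as speech.
    """
    silent_frames = 0
    for is_speech in reversed(speech_flags):
        if is_speech:
            break  # reached the last spoken frame
        silent_frames += 1
    return silent_frames * frame_ms


# 10 speech frames followed by 25 silent frames of 20 ms each.
flags = [True] * 10 + [False] * 25
print(trailing_pause_ms(flags))
```

The returned value can then be compared against whichever duration threshold applies to the current information integrity state.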
The technical solution of this embodiment receives the voice stream input by the user and performs semantic recognition on it. When a pause is detected in the voice stream, the information integrity state of the voice stream and the pause duration are determined according to the semantic recognition result, and the corresponding interaction response strategy is then determined from the information integrity state and the pause duration. This solves the prior-art problem that pauses in the user's voice input impair the device's understanding of the user's semantic intent. Because different response strategies are chosen according to the integrity state of the voice information and the pause duration, the accuracy of semantic understanding and the speed of response to the user's needs are both improved, which improves the user experience.
Optionally, the information integrity state is either complete or incomplete.
Optionally, the response strategy determination module 430 includes a first response strategy determination unit and a second response strategy determination unit.
The first response strategy determination unit is configured to determine, if the semantics of the voice stream are complete, the interaction response strategy corresponding to the first voice stream according to the pause duration and the first duration threshold. The second response strategy determination unit is configured to determine, if the semantics of the voice stream are incomplete, the interaction response strategy corresponding to the first voice stream according to the pause duration and the second duration threshold. The first duration threshold is less than or equal to the second duration threshold.
Optionally, the first response strategy determination unit is further configured to determine, if the semantics of the voice stream are complete, whether a newly input second voice stream is received within the first duration threshold; if not, it responds to the first voice stream and executes the interaction function corresponding to the first voice stream.
Optionally, the first response strategy determination unit is further configured to determine, if the semantics of the voice stream are complete, whether a newly input second voice stream is received within the first duration threshold; if so, semantic recognition continues on the second voice stream.
Optionally, the second response strategy determination unit is further configured to determine, if the semantics of the voice stream are incomplete, whether a newly input second voice stream is received within the second duration threshold; if so, semantic recognition continues on the second voice stream.
Optionally, the second response strategy determination unit is further configured to determine, if the semantics of the voice stream are incomplete, whether a newly input second voice stream is received within the second duration threshold; if not, the preset timeout response strategy is executed.
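Optionally, the state determination module 420 is further configured to determine the information integrity state of the first voice stream according to the degree of matching between the recognized semantic information of the first voice stream and each response function in a preset function whitelist.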
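The patent does not say how the degree of matching against the function whitelist is computed. As one possible illustration, the sketch below uses the similarity ratio from Python's standard `difflib.SequenceMatcher` and a cut-off of 0.8; the whitelist entries, the function names, and the cut-off are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical function whitelist for a vehicle-mounted system.
WHITELIST = ["turn on the air conditioner", "open the window"]


def best_match_degree(text: str, whitelist: Iterable[str]) -> float:
    """Highest similarity (0.0 to 1.0) between the text and any whitelisted function."""
    return max(
        (SequenceMatcher(None, text.lower(), fn.lower()).ratio() for fn in whitelist),
        default=0.0,
    )


def is_semantically_complete(text: str, whitelist: Iterable[str],
                             min_degree: float = 0.8) -> bool:
    """Treat the voice stream as complete if it matches a function closely enough."""
    return best_match_degree(text, whitelist) >= min_degree


print(is_semantically_complete("turn on the air conditioner", WHITELIST))
print(is_semantically_complete("turn on", WHITELIST))
```

A production system would more likely compare recognized intents and slots than raw strings; character similarity is used here only to keep the example self-contained.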
The voice recognition apparatus described above can execute the voice recognition method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
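Embodiment 5
FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present invention. FIG. 5 shows a block diagram of an exemplary electronic device 50 suitable for implementing embodiments of the present invention. The electronic device 50 shown in FIG. 5 is only an example and should not limit the functions or scope of use of the embodiments of the present invention in any way.
As shown in FIG. 5, the electronic device 50 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors or processing units 501, a system memory 502, and a bus 503 that connects the different system components (including the system memory 502 and the processing unit 501).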
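The bus 503 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.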
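The electronic device 50 typically includes a variety of computer-system-readable media. These may be any available media that can be accessed by the electronic device 50, including volatile and non-volatile media and removable and non-removable media.
The system memory 502 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 504 and/or cache memory 505. The electronic device 50 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 506 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly called a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 503 through one or more data media interfaces. The memory 502 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 508 having a set of (at least one) program modules 507 may be stored, for example, in the memory 502. Such program modules 507 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 507 generally perform the functions and/or methods of the embodiments described in the present invention.
The electronic device 50 may also communicate with one or more external devices 509 (such as a keyboard, a pointing device, or a display 510), with one or more devices that enable a user to interact with the electronic device 50, and/or with any device (such as a network card or a modem) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 511. The electronic device 50 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 512. As shown, the network adapter 512 communicates with the other modules of the electronic device 50 through the bus 503. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used together with the electronic device 50, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 501 runs programs stored in the system memory 502 to execute various functional applications and data processing, for example to implement the voice recognition method provided by the embodiments of the present invention.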
Embodiment 6
Embodiment 6 of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a voice recognition method, the method including:
receiving a first voice stream input by a user for voice interaction, and performing semantic recognition on the first voice stream;
when a pause is detected in the first voice stream, determining the information integrity state of the first voice stream according to the result of semantic recognition, and determining the duration of the pause in the first voice stream;
determining the interaction response strategy corresponding to the first voice stream according to the information integrity state and the pause duration.
The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.
Computer program code for carrying out the operations of the embodiments of the present invention may be written in one or more programming languages or a combination of them, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210155658.4A CN114582333A (en) | 2022-02-21 | 2022-02-21 | Voice recognition method and device, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114582333A (en) | 2022-06-03 |
Family
ID=81774791
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210155658.4A Pending CN114582333A (en) | 2022-02-21 | 2022-02-21 | Voice recognition method and device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114582333A (en) |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160217124A1 (en) * | 2015-01-23 | 2016-07-28 | Microsoft Technology Licensing, Llc | Methods for understanding incomplete natural language query |
| CN107146602A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of audio recognition method, device and electronic equipment |
| CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
| CN109599130A (en) * | 2018-12-10 | 2019-04-09 | 百度在线网络技术(北京)有限公司 | Reception method, device and storage medium |
| US20190206397A1 (en) * | 2017-12-29 | 2019-07-04 | Microsoft Technology Licensing, Llc | Full duplex communication for conversation between chatbot and human |
| KR20190118995A (en) * | 2019-10-01 | 2019-10-21 | 엘지전자 주식회사 | Speech processing method and apparatus therefor |
| CN111128168A (en) * | 2019-12-30 | 2020-05-08 | 斑马网络技术有限公司 | Voice control method, device and storage medium |
| CN112053687A (en) * | 2020-07-31 | 2020-12-08 | 出门问问信息科技有限公司 | Voice processing method and device, computer readable storage medium and equipment |
| CN112382279A (en) * | 2020-11-24 | 2021-02-19 | 北京百度网讯科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
| CN113035180A (en) * | 2021-03-22 | 2021-06-25 | 建信金融科技有限责任公司 | Voice input integrity judgment method and device, electronic equipment and storage medium |
| EP3876231A1 (en) * | 2020-03-04 | 2021-09-08 | Beijing Baidu Netcom Science and Technology Co., Ltd | Method and apparatus for recognizing speech |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024078419A1 (en) * | 2022-10-14 | 2024-04-18 | 华为技术有限公司 | Voice interaction method, voice interaction apparatus and electronic device |
| CN116168700A (en) * | 2023-02-23 | 2023-05-26 | 广州小鹏汽车科技有限公司 | Voice interaction method, vehicle and computer-readable storage medium |
| CN117457003A (en) * | 2023-12-26 | 2024-01-26 | 四川蜀天信息技术有限公司 | Stream type voice recognition method, device, medium and equipment |
| CN117457003B (en) * | 2023-12-26 | 2024-03-08 | 四川蜀天信息技术有限公司 | Stream type voice recognition method, device, medium and equipment |
| CN119811378A (en) * | 2025-03-12 | 2025-04-11 | 科大讯飞股份有限公司 | Streaming voice interaction method, related device, equipment and storage medium |
| CN119811378B (en) * | 2025-03-12 | 2025-06-17 | 科大讯飞股份有限公司 | Streaming voice interaction method and related device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220603 |