CN114242068B - Voice processing method, device, electronic device and storage medium - Google Patents


Info

Publication number
CN114242068B
CN114242068B
Authority
CN
China
Prior art keywords
information
voice
voice information
user
database
Prior art date
Legal status
Active
Application number
CN202111391231.6A
Other languages
Chinese (zh)
Other versions
CN114242068A (en)
Inventor
袁志伟
Current Assignee
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202111391231.6A
Publication of CN114242068A
Application granted
Publication of CN114242068B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G10L 15/08: Speech classification or search
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1822: Parsing for meaning understanding
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract


Embodiments of the present invention disclose a voice processing method and apparatus, an electronic device, and a storage medium. The method comprises: after guidance information is played, obtaining first voice information input by a user, where the guidance information is generated based on second voice information input by the user; if the first voice information matches the guidance information, obtaining a first voice parsing result according to the first voice information, where the second voice information is associated with the first voice parsing result; and playing first feedback information according to the first voice parsing result. The scheme dynamically guides the user to input voice information, improving the efficiency of voice processing, and associates incomprehensible voice information with the parsing result of comprehensible voice information, realizing online learning, which improves the accuracy of voice processing results and further improves the user experience.

Description

Voice processing method and apparatus, electronic device, and storage medium
Technical Field
Embodiments of the present invention relate to natural language processing technologies, and in particular, to a voice processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of the intelligent question-answering field, the voice assistant is increasingly widely applied to the fields of intelligent home, intelligent vehicle, intelligent customer service and the like.
Existing voice assistants often fail to understand a user instruction, and when this happens they typically reject the instruction with a dull, stiff reply such as "I don't understand". Such assistants are not intelligent enough, which greatly hurts the user experience and the completion rate of user tasks.
Disclosure of Invention
Embodiments of the present invention provide a voice processing method and apparatus, an electronic device, and a storage medium, which can improve the intelligence of voice processing and the user experience.
In a first aspect, an embodiment of the present invention provides a method for processing speech, including:
after guidance information is played, obtaining first voice information input by a user, where the guidance information is generated based on second voice information input by the user;
if the first voice information matches the guidance information, obtaining a first voice parsing result according to the first voice information, where the second voice information is associated with the first voice parsing result; and
playing first feedback information according to the first voice parsing result.
In a second aspect, an embodiment of the present invention provides a speech processing apparatus, including:
an obtaining module, configured to obtain first voice information input by a user after guidance information is played, where the guidance information is generated based on second voice information input by the user;
a parsing module, configured to obtain a first voice parsing result according to the first voice information if the first voice information matches the guidance information, where the second voice information is associated with the first voice parsing result; and
a feedback module, configured to play first feedback information according to the first voice parsing result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the speech processing method according to any one of the embodiments of the present invention when executing the program.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a speech processing method according to any of the embodiments of the present invention.
In the embodiments of the present invention, guidance information generated from the second voice information input by the user is played, and the first voice information input by the user is obtained. If the first voice information matches the guidance information, a first voice parsing result is obtained according to the first voice information, the second voice information is associated with the first voice parsing result, and first feedback information is played according to the first voice parsing result. The scheme can dynamically guide the user to input voice information, improving the efficiency of voice processing, and associates incomprehensible voice information with the parsing result of comprehensible voice information, realizing online learning, which improves the accuracy of voice processing results and further improves the user experience.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should not be regarded as limiting the scope; other related drawings can be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flowchart of a voice processing method according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a voice processing method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a voice processing apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, only the structures related to the present invention, rather than all of them, are shown in the drawings.
Fig. 1 is a schematic flowchart of a voice processing method according to an embodiment of the present invention. The method is applicable to intelligent question-answering scenarios and may be performed by the voice processing apparatus according to an embodiment of the present invention, which may be implemented in software and/or hardware. In a specific embodiment, the apparatus may be integrated into an electronic device such as a computer or a server. The following embodiments are described with the apparatus integrated in an electronic device. Referring to Fig. 1, the method may specifically include the following steps:
Step 101: after guidance information is played, first voice information input by a user is obtained, where the guidance information is generated based on second voice information input by the user.
The second voice information input by the user may be a voice instruction, a question, chit-chat, or other voice information uttered based on the user's needs. After the second voice information is received, it can be converted into text by a speech recognition system to facilitate subsequent processing; of course, the user may also input the second voice information directly in text form through the device. The second voice information is then semantically parsed by a semantic understanding model, and guidance information is generated based on the second voice information input by the user. The guidance information is played for the user, and the corresponding text may be displayed at the same time. After receiving the guidance information, the user can input first voice information accordingly, and the first voice information input by the user is obtained.
For example, the user wants to close the windows through the vehicle-mounted voice assistant, and the second voice information input by the user is "close the window". After receiving the second voice information, the voice assistant semantically parses it and generates guidance information according to the parsing result. Assuming the generated guidance information is "close all windows", the assistant plays guidance for the user such as "You can say 'close all windows'" or "Do you want to close all windows?". The user may then input first voice information based on the guidance, such as "close all windows" or "yes". The first voice information input by the user is thereby obtained.
Step 102: if the first voice information matches the guidance information, a first voice parsing result is obtained according to the first voice information, where the second voice information is associated with the first voice parsing result.
Specifically, after the first voice information input by the user is received, it is matched against the guidance information. If they match, a first voice parsing result is obtained according to the first voice information. The parsing result of the first voice information is the parsing result of the guidance information, and the second voice information is associated with the first voice parsing result. For example, the second voice information input by the user is "close the window", the generated guidance information is "close all windows", and the first voice information input by the user is "close all windows". A semantic matching model determines that the first voice information matches the guidance information, and from the guidance information and the first voice information the first voice parsing result is obtained: all of the user's windows are to be closed. It thus becomes known that when the user inputs "close the window", the parsing result should be that of "close all windows".
Step 103: first feedback information is played according to the first voice parsing result.
Specifically, after the user inputs the second voice information, guidance information is generated for the user according to it, and the system waits for the user to input first voice information. After the user inputs the first voice information, if it matches the guidance information, the first voice parsing result is generated according to the parses of the guidance information and the first voice information, and is played or displayed for the user. The second voice information is then associated with the first voice parsing result.
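Purely as an illustration of this flow, the following Python sketch mirrors steps 101-103. It is a minimal sketch under stated assumptions, not the patented implementation: every identifier (handle_guided_turn, render_feedback, associations, the affirmation list) is hypothetical.

```python
# Hedged sketch of steps 101-103; every identifier here is illustrative.

AFFIRMATIONS = {"yes", "ok", "sure"}  # assumed confirmation words

def handle_guided_turn(second_voice, guidance, guidance_result,
                       first_voice, associations):
    """If the user's follow-up matches the played guidance, adopt the
    guidance parsing result and remember the mapping for the original,
    unparseable utterance (the online-learning association)."""
    if first_voice == guidance or first_voice.lower() in AFFIRMATIONS:
        first_result = guidance_result              # parse of guidance = parse of first voice
        associations[second_voice] = first_result   # associate second voice info with result
        return render_feedback(first_result)        # step 103: play first feedback info
    return None  # no match: fall back to parsing the user's new input

def render_feedback(result):
    # Stand-in for text-to-speech playback of the feedback information.
    return "OK. " + ", ".join(f"{k}: {v}" for k, v in result.items())
```

With guidance "close all windows" and a follow-up of "yes", this sketch would adopt the guidance result and store it as the parse for "close the window".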
According to the technical scheme of this embodiment, after guidance information is played, first voice information input by a user is obtained; if the first voice information matches the guidance information, a first voice parsing result is obtained according to the first voice information, where the second voice information is associated with the first voice parsing result; and first feedback information is played according to the first voice parsing result. The scheme can dynamically guide the user to input voice information, improving the efficiency of voice processing, and associates incomprehensible voice information with the parsing result of comprehensible voice information, realizing online learning, which improves the accuracy of voice processing results and further improves the user experience.
Fig. 2 is another schematic flowchart of a voice processing method according to an embodiment of the present invention, which refines the steps of the method on the basis of the above embodiment. As shown in Fig. 2, the method of this embodiment specifically includes the following steps:
step 201, obtaining second voice information input by a user, and analyzing the second voice information.
Specifically, the second voice information input by the user is received. If it is speech, the voice information is converted into text; of course, the user may also input text directly as the second voice information. The second voice information is then semantically parsed by a semantic understanding model.
Step 202: if parsing of the second voice information fails, it is determined whether third voice information exists in a first database, where the similarity between the third voice information and the second voice information is greater than or equal to a preset threshold. Specifically, the first database may be constructed in real time from the user interaction log and stores guidance information that the semantic understanding system can parse correctly, as shown in Table 1 below:
TABLE 1

Guidance information | Parsing result
Close the sunshade curtain | intent: close the sunshade curtain
Close half of the sunroof | intent: sunroof; action: close; value: 50%
Beijing weather tomorrow | intent: weather; place: Beijing; time: tomorrow
It should be noted that the guidance information and parsing results shown in Table 1 are merely examples and do not constitute a final limitation on the actual guidance information and parsing results; in practical applications, the data may be adjusted according to actual needs, which is not specifically limited here.
Further, it is determined whether third voice information exists in the first database, where the similarity between the third voice information and the second voice information is greater than or equal to a preset threshold, as shown in Table 2 below. The data in Table 2 is likewise only an example and may be adjusted according to actual needs in practical applications.
TABLE 2

Second voice information | Third voice information | Similarity score
Close the screen window | Close the sunshade curtain | 0.9800629019737244
Close the screen window | Close the window well | 0.5422488117218018
For example, when the second voice information is "close the screen window" and the similarity threshold is set to 0.8, the third voice information is any stored utterance whose similarity to the second voice information is 0.8 or greater. Suppose the first database contains "close the sunshade curtain", whose similarity to "close the screen window" is greater than 0.8 (0.98 in Table 2); it can then be determined that third voice information exists in the first database. If multiple candidates exist in the first database, the one with the highest similarity score is selected as the third voice information.
Different thresholds can be set for different types of second voice information, and any stored utterance whose similarity to the second voice information is greater than or equal to the preset threshold is taken as third voice information. When the second voice information input by the user is of a type such as a place query, the similarity threshold can be adaptively raised. For example, if the second voice information is "navigate to Technology Road" and the first database stores a similarly worded instruction for a different road, a threshold set too low would wrongly take that instruction as the third voice information, harming the user experience.
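As a minimal sketch of this retrieval step, the snippet below uses difflib's string ratio as a stand-in for whatever semantic similarity model the system actually uses; the database contents, thresholds, and utterance types are illustrative assumptions, not taken from the patent.

```python
import difflib

# First database: guidance information -> parsing result (cf. Table 1).
FIRST_DB = {
    "close the sunshade curtain": {"intent": "sunshade curtain", "action": "close"},
    "close half of the sunroof":  {"intent": "sunroof", "action": "close", "value": "50%"},
    "beijing weather tomorrow":   {"intent": "weather", "place": "Beijing", "time": "tomorrow"},
}

# Type-dependent thresholds: stricter for place queries, as discussed above.
THRESHOLDS = {"place_query": 0.95, "default": 0.8}

def find_third_voice(second_voice, utterance_type="default"):
    """Return the stored utterance most similar to second_voice, or None if
    no candidate reaches the type-specific preset threshold (step 202)."""
    threshold = THRESHOLDS.get(utterance_type, THRESHOLDS["default"])
    scored = [(difflib.SequenceMatcher(None, second_voice, cand).ratio(), cand)
              for cand in FIRST_DB]
    best_score, best = max(scored)  # keep only the highest-scoring candidate
    return best if best_score >= threshold else None
```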
Step 203: if third voice information exists in the first database, guidance information is generated according to the third voice information.
After it is determined that third voice information exists in the first database, the third voice information with the highest similarity score is read from the first database and parsed by the semantic understanding model, and guidance information is generated according to it. The guidance information is then played to guide the user.
Step 204: after the guidance information is played, first voice information input by the user is obtained.
After receiving the guidance information, the user can input first voice information accordingly, and the first voice information input by the user is obtained.
Step 205: if the first voice information matches the guidance information, a first voice parsing result is obtained according to the first voice information, where the second voice information is associated with the first voice parsing result.
The parsing result of the first voice information is the parsing result of the guidance information, and the second voice information is associated with the first voice parsing result.
Specifically, after the user inputs the second voice information, guidance information is generated for the user according to it, and the system waits for the user to input first voice information. After the user inputs the first voice information, if it matches the guidance information, the first voice parsing result is generated according to the parses of the guidance information and the first voice information, and is played or displayed for the user.
In this embodiment, optionally, the second voice information is associated with the first voice parsing result and stored in a second database.
Specifically, when the first voice information matches the guidance information, after the first voice parsing result is obtained according to the first voice information, the parsing result of the first voice information is adopted as the parsing result of the second voice information. The second voice information and the first voice parsing result may then be stored in the second database, which can serve as a training corpus for subsequent upgrade iterations of the semantic understanding model.
Associating the second voice information with the first voice parsing result and storing them in the second database facilitates subsequent upgrade iterations of the semantic understanding model, improves the accuracy of semantic parsing results, and further improves the user experience.
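A minimal sketch of this optional step, assuming a JSON-lines file as the second database; the file name and field names are illustrative assumptions:

```python
import json

def store_association(second_voice, first_result, path="second_db.jsonl"):
    """Append an (unparseable utterance, adopted parsing result) pair to the
    second database, to be mined later as training corpus for model upgrades."""
    record = {"text": second_voice, "parse": first_result}  # illustrative schema
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```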
If the first voice information does not match the guidance information, the second voice information input by the user is obtained again and semantically parsed by the semantic understanding model (that is, step 201 is executed).
Step 206: first feedback information is played according to the first voice parsing result.
By determining third voice information through a similarity threshold and generating guidance information according to it, the accuracy of the results fed back to the user can be improved, further improving the user experience.
Step 207: if third voice information does not exist in the first database, it is determined whether the second voice information is chit-chat information.
After it is determined that no third voice information exists in the first database, it is further determined whether the second voice information is chit-chat. For example, the second voice information input by the user is "I'm bored", and the first database contains no third voice information whose similarity to "I'm bored" meets the preset threshold. A prompt such as "Are you bored?" may then be played for the user.
Step 208: if the second voice information is chit-chat information, third feedback information is played, where the third feedback information is a chit-chat reply.
If the second voice information input by the user is chit-chat information, a chit-chat reply is played for the user, such as "Then let's chat for a while."
Step 209: if the second voice information is not chit-chat information, fourth feedback information is played, where the fourth feedback information indicates that voice understanding has failed.
If it is determined that the second voice information input by the user is not chit-chat information and no third voice information exists in the first database, information indicating voice-understanding failure, such as "Sorry, I didn't understand", is fed back to the user.
Through the above steps, when the user does not adopt the guidance information, an exit strategy is used so that the user's needs are still fully addressed and a chit-chat service is provided, improving the user experience.
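The exit strategy of steps 207-209 could be sketched as follows; is_chitchat is a hypothetical classifier, reduced here to a keyword heuristic for illustration only:

```python
def fallback_reply(second_voice):
    """Steps 207-209: no third voice information was found, so decide between
    a chit-chat reply and a voice-understanding-failure reply."""
    if is_chitchat(second_voice):                 # step 207
        return "Then let's chat for a while."     # step 208: third feedback information
    return "Sorry, I didn't understand that."     # step 209: fourth feedback information

def is_chitchat(text):
    # Placeholder heuristic; a real system would use a trained intent classifier.
    return any(kw in text.lower() for kw in ("bored", "boring", "chat"))
```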
Step 210: if the second voice information is parsed successfully, a second voice parsing result is obtained, and second feedback information is played according to the second voice parsing result.
For example, the second voice information input by the user is "close all windows", and the semantic understanding model parses it successfully, obtaining the parsing result that all windows are to be closed. The feedback "OK, closing all windows for you" is then played for the user.
Step 211: the second voice information is associated with the second voice parsing result and stored in the first database.
For example, after the second voice information "close all windows" is parsed successfully, it is associated with its second voice parsing result and stored in the first database, so that future similar inputs can be matched against it.
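A sketch of step 211 under the same assumptions as the earlier snippets: successful parses are cached so that future inputs can be matched against them as third voice information.

```python
def update_first_db(first_db, second_voice, parse_result):
    """Step 211: store a successfully parsed utterance and its result in the
    first database, where it can later serve as third voice information."""
    first_db.setdefault(second_voice, parse_result)

# Illustrative usage with the FIRST_DB dictionary sketched earlier:
# update_first_db(FIRST_DB, "close all windows",
#                 {"intent": "window", "action": "close", "scope": "all"})
```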
In this embodiment, optionally, the first database and the second database are constructed in real time based on the user interaction log.
By constructing the first database and the second database in real time from the user interaction log, manual annotation is not needed, which saves human resources and further improves the efficiency of the voice processing method.
In this embodiment, second voice information input by a user is obtained and parsed. If parsing fails, guidance information is generated from the second voice information and played; if parsing succeeds, a second voice parsing result is obtained, second feedback information is played according to it, and the second voice information is associated with the second voice parsing result and stored in the first database. By matching incomprehensible second voice information to comprehensible third voice information, the user is dynamically guided to re-input voice information, the learning efficiency of the voice processing method is improved, iterative learning is realized, and the accuracy of voice processing results is improved.
Fig. 3 is a block diagram of a voice processing apparatus according to an embodiment of the present invention; the apparatus is adapted to perform the voice processing method according to the embodiments of the present invention. As shown in Fig. 3, the apparatus may specifically include:
an obtaining module 301, configured to obtain first voice information input by a user after guidance information is played, where the guidance information is generated based on second voice information input by the user;
a parsing module 302, configured to obtain a first voice parsing result according to the first voice information if the first voice information matches the guidance information, where the second voice information is associated with the first voice parsing result; and
a feedback module 303, configured to play first feedback information according to the first voice parsing result.
Optionally, the obtaining module 301 is further configured to obtain the second voice information input by the user and parse the second voice information;
the parsing module 302 is further configured to generate and play the guidance information according to the second voice information if parsing of the second voice information fails.
Optionally, the parsing module 302 is specifically configured to determine whether third voice information exists in a first database, where the similarity between the third voice information and the second voice information is greater than or equal to a preset threshold, and, if the third voice information exists in the first database, to generate the guidance information according to the third voice information.
Optionally, the parsing module 302 is further configured to, if the second voice information is parsed successfully, obtain a second voice parsing result, play second feedback information according to the second voice parsing result, and associate the second voice information with the second voice parsing result and store them in the first database.
Optionally, the parsing module 302 is further configured to determine whether the second voice information is chit-chat information if the third voice information does not exist in the first database; to play third feedback information if the second voice information is chit-chat information, where the third feedback information is a chit-chat reply; and to play fourth feedback information if the second voice information is not chit-chat information, where the fourth feedback information indicates that voice understanding has failed.
Optionally, the parsing module 302 is further configured to associate the second voice information with the first voice parsing result and store them in a second database.
Optionally, the first database and the second database are constructed in real time based on user interaction logs.
The voice processing apparatus provided by the embodiment of the present invention can perform the voice processing method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects. For details not described in this embodiment, reference is made to the description of any method embodiment of the present invention.
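Only as an illustration of this module layout (none of it prescribed by the patent), the three modules could be wired together as follows, reusing the hypothetical helpers sketched earlier:

```python
class VoiceProcessingApparatus:
    """Toy composition of the obtaining, parsing, and feedback modules."""

    def __init__(self, first_db, associations):
        self.first_db = first_db          # guidance information -> parsing result
        self.associations = associations  # online-learning store (second database)

    def obtain(self, audio_or_text):
        # Obtaining module: speech recognition would go here; text passes through.
        return audio_or_text

    def parse(self, text):
        # Parsing module: exact hit in the first database, else None (parse failure).
        return self.first_db.get(text)

    def feedback(self, result):
        # Feedback module: play or display the parsing result.
        return "OK. " + ", ".join(f"{k}: {v}" for k, v in result.items())
```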
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 4, the electronic device 12 is in the form of a general purpose computing device. The components of the electronic device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. In the electronic device 12 of the present embodiment, the display 24 is not provided as a separate body but is embedded in the mirror surface, and the display surface of the display 24 and the mirror surface are visually integrated when the display surface of the display 24 is not displayed. Also, the electronic device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement a voice processing method provided in an embodiment of the present invention, wherein after playing guidance information, first voice information input by a user is obtained, wherein the guidance information is generated based on second voice information input by the user, if the first voice information matches with the guidance information, a first voice analysis result is obtained according to the first voice information, wherein the second voice information is associated with the first voice analysis result, and first feedback information is played according to the first voice analysis result.
The embodiment of the invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements a voice processing method as provided by all the embodiments of the invention, wherein after playing guide information, first voice information input by a user is obtained, and the guide information is generated based on second voice information input by the user; if the first voice information is matched with the guide information, a first voice analysis result is obtained according to the first voice information, wherein the second voice information is associated with the first voice analysis result, and first feedback information is played according to the first voice analysis result. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (7)

1. A voice processing method, comprising:
after guidance information is played, obtaining first voice information input by a user, wherein the guidance information is generated based on second voice information input by the user;
if the first voice information matches the guidance information, obtaining a first voice parsing result according to the first voice information, wherein the second voice information is associated with the first voice parsing result; and
playing first feedback information according to the first voice parsing result;
the method further comprising:
before the guidance information is played, obtaining the second voice information input by the user and parsing the second voice information; and, if parsing of the second voice information fails, generating and playing the guidance information according to the second voice information;
wherein generating the guidance information comprises:
determining whether third voice information exists in a first database, wherein a similarity between the third voice information and the second voice information is greater than or equal to a preset threshold;
if the third voice information exists in the first database, generating the guidance information according to the third voice information; if the third voice information does not exist in the first database, determining whether the second voice information is chit-chat information; and
if the second voice information is chit-chat information, playing third feedback information, wherein the third feedback information is a chit-chat reply; if the second voice information is not chit-chat information, playing fourth feedback information, wherein the fourth feedback information is used to indicate that voice understanding has failed.

2. The voice processing method according to claim 1, further comprising:
if the second voice information is parsed successfully, obtaining a second voice parsing result, and playing second feedback information according to the second voice parsing result; and
associating the second voice information with the second voice parsing result and storing them in the first database.

3. The voice processing method according to claim 1 or 2, further comprising:
associating the second voice information with the first voice parsing result and storing them in a second database.

4. The voice processing method according to claim 3, wherein the first database and the second database are constructed in real time based on user interaction logs.

5. A voice processing apparatus, comprising:
an obtaining module, configured to obtain first voice information input by a user after guidance information is played, wherein the guidance information is generated based on second voice information input by the user;
a parsing module, configured to obtain a first voice parsing result according to the first voice information if the first voice information matches the guidance information, wherein the second voice information is associated with the first voice parsing result; and
a feedback module, configured to play first feedback information according to the first voice parsing result;
wherein the obtaining module is further configured to obtain the second voice information input by the user before the guidance information is played, and to parse the second voice information;
the parsing module is further configured to generate and play the guidance information according to the second voice information if parsing of the second voice information fails; and
the parsing module is further configured to determine whether third voice information exists in a first database, wherein a similarity between the third voice information and the second voice information is greater than or equal to a preset threshold; if the third voice information exists in the first database, generate the guidance information according to the third voice information; if the third voice information does not exist in the first database, determine whether the second voice information is chit-chat information; if the second voice information is chit-chat information, play third feedback information, wherein the third feedback information is a chit-chat reply; and if the second voice information is not chit-chat information, play fourth feedback information, wherein the fourth feedback information is used to indicate that voice understanding has failed.

6. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the voice processing method according to any one of claims 1 to 4.

7. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the voice processing method according to any one of claims 1 to 4.
CN202111391231.6A 2021-11-23 2021-11-23 Voice processing method, device, electronic device and storage medium Active CN114242068B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111391231.6A | 2021-11-23 | 2021-11-23 | Voice processing method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111391231.6A | 2021-11-23 | 2021-11-23 | Voice processing method, device, electronic device and storage medium

Publications (2)

Publication Number | Publication Date
CN114242068A (en) | 2022-03-25
CN114242068B | 2025-04-18

Family

ID=80750503

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111391231.6A (granted as CN114242068B, Active) | Voice processing method, device, electronic device and storage medium | 2021-11-23 | 2021-11-23

Country Status (1)

Country Link
CN (1) CN114242068B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103020047A (en) * | 2012-12-31 | 2013-04-03 | 威盛电子股份有限公司 | Method for correcting voice response and natural language dialogue system
CN105931644A (en) * | 2016-04-15 | 2016-09-07 | 广东欧珀移动通信有限公司 | Voice recognition method and mobile terminal
CN109830232A (en) * | 2019-01-11 | 2019-05-31 | 北京猎户星空科技有限公司 | Man-machine interaction method, device and storage medium
CN112164401A (en) * | 2020-09-18 | 2021-01-01 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer-readable storage medium
CN113160808A (en) * | 2020-01-22 | 2021-07-23 | 广州汽车集团股份有限公司 | Voice control method and system and voice control equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104253902A (en) * | 2014-07-21 | 2014-12-31 | 宋婉毓 | Method for voice interaction with intelligent voice device
CN107305769B (en) * | 2016-04-20 | 2020-06-23 | 斑马网络技术有限公司 | Voice interaction processing method, device, device and operating system
US10950229B2 (en) * | 2016-08-26 | 2021-03-16 | Harman International Industries, Incorporated | Configurable speech interface for vehicle infotainment systems
US10834079B2 (en) * | 2018-11-28 | 2020-11-10 | International Business Machines Corporation | Negotiative conversation chat bot
CN111415656B (en) * | 2019-01-04 | 2024-04-30 | 上海擎感智能科技有限公司 | Speech semantic recognition method, device and vehicle
CN109901810A (en) * | 2019-02-01 | 2019-06-18 | 广州三星通信技术研究有限公司 | A human-computer interaction method and device for intelligent terminal equipment
CN110136700B (en) * | 2019-03-15 | 2021-04-20 | 湖北亿咖通科技有限公司 | Voice information processing method and device
CN111949775B (en) * | 2020-07-09 | 2024-06-11 | 北京声智科技有限公司 | Method, device, equipment and medium for generating guide dialogue


Also Published As

Publication number | Publication date
CN114242068A (en) | 2022-03-25

Similar Documents

Publication Publication Date Title
CN108520743B (en) Voice control method of intelligent device, intelligent device and computer readable medium
US10522136B2 (en) Method and device for training acoustic model, computer device and storage medium
CN110069608B (en) Voice interaction method, device, equipment and computer storage medium
KR101768509B1 (en) On-line voice translation method and device
CN109599095B (en) Method, device and equipment for marking voice data and computer storage medium
US9805718B2 (en) Clarifying natural language input using targeted questions
CN107291828A (en) Spoken inquiry analytic method, device and storage medium based on artificial intelligence
CN109785846B (en) Role recognition method and device for mono voice data
DE202017105669U1 (en) Modality learning on mobile devices
CN113782029B (en) Training method, device, equipment and storage medium of voice recognition model
CN108564944B (en) Intelligent control method, system, equipment and storage medium
CN110276023A (en) POI change event discovery method, device, computing device and medium
CN109215646B (en) Voice interaction processing method, device, computer equipment and storage medium
WO2021218028A1 (en) Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN113674746B (en) Man-machine interaction method, device, equipment and storage medium
US20110213610A1 (en) Processor Implemented Systems and Methods for Measuring Syntactic Complexity on Spontaneous Non-Native Speech Data by Using Structural Event Detection
CN114399992B (en) Voice instruction response method, device and storage medium
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN112466289A (en) Voice instruction recognition method and device, voice equipment and storage medium
CN109815481B (en) Method, device, equipment and computer storage medium for extracting event from text
CN108305618A (en) Voice acquisition and search method, smart pen, search terminal and storage medium
CN113053390B (en) Text processing method, device, electronic equipment and medium based on speech recognition
CN112185371B (en) Voice interaction method, device, equipment and computer storage medium
CN110704597A (en) Dialogue system reliability verification method, model generation method and device
CN111597800B (en) Method, device, equipment and storage medium for obtaining synonyms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant