CN114692639B - Text error correction method and electronic device - Google Patents
- Publication number
- CN114692639B CN202011565185.2A
- Authority
- CN
- China
- Prior art keywords
- error correction
- text
- confidence
- confidence value
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephone Function (AREA)
- User Interface Of Digital Computer (AREA)
- Machine Translation (AREA)
Abstract
The application provides a text error correction method and an electronic device. The method includes: receiving a first text; identifying the intention and slot of the first text using an intention recognition model; selecting a corresponding error correction model according to the intention to perform error correction, where the error correction model includes a general error correction model and/or a domain error correction model; and outputting the corrected target text. The scheme provided by the application can still perform high-accuracy text error correction on the recognized information when error correction would otherwise be impossible because the recognized text contains extra, missing, or wrong words, or because the range of the recognized attribute information is inaccurate, thereby ensuring the accuracy of the recognized intention and slot.
Description
Technical Field
The present application relates to the field of error correction technologies, and in particular, to a text error correction method and an electronic device.
Background
With the popularization of intelligent devices and the development of natural language processing technology, voice input is becoming an increasingly important means of human-computer interaction thanks to its convenience and speed. However, due to the complexity and diversity of languages and the influence of ambient noise, the speech recognition result often deviates considerably from what the user actually intended to input, so the recognized text needs further error correction processing before it can be applied in a practical system.
One scheme for further correcting the recognized text is to perform search-intention recognition on the text, determine the corresponding attribute information, and then correct errors by computing the similarity between the attribute information and words in a candidate lexicon. The candidate lexicon maintains a mapping table from error-prone words to correction words; the error-prone words in the lexicon are matched in turn using a text similarity function, and the correction word corresponding to the most similar error-prone word is selected as the correction result. However, this scheme cannot execute the error correction logic correctly when a text error leads to an error in intention recognition. Moreover, because it delimits wrong words by means of an error-prone dictionary, it provides no error correction capability when the attribute information extracted by the intention recognition method is inaccurate and the extracted words are not contained in the dictionary.
Another scheme performs dictionary matching on the text after intention recognition, and at the same time recognizes the prefix words of the dictionary-matched words. The possible intention results corresponding to combinations of the prefix words and the dictionary-matched words within their domain are defined as error correction rules, and these rules are executed to judge whether the possible intention recognized through the dictionary matches the intention output by the model. When the results do not match, the intention recognition result is modified and the correct result is output. Likewise, when the recognized text contains wrong, extra, or missing words so that the dictionary cannot be matched, this intention-oriented error correction capability is ineffective.
Disclosure of Invention
The application provides a text error correction method and an electronic device, which can still correct the recognized text with high accuracy when error correction would otherwise be impossible because the recognized text contains extra, missing, or wrong words, or because the range of the recognized attribute information is inaccurate, thereby ensuring the accuracy of the recognized intention and slot.
A first aspect provides a text error correction method applied to an electronic device, including: receiving a first text; identifying the intention and slot of the first text using an intention recognition model; selecting a corresponding error correction model according to the intention to perform error correction, where the error correction model includes a general error correction model and/or a domain error correction model; and outputting the corrected target text.
According to the scheme provided by the application, the electronic device selects the corresponding error correction model based on the intention identified by the intention recognition model, so that the intention and slot can be corrected in a unified manner even when extra, missing, or wrong words in the recognized text, or an inaccurate range of the recognized attribute information, would otherwise make error correction impossible, thereby ensuring the accuracy of the recognized intention and slot.
With reference to the first aspect, in some possible implementations, the selecting a corresponding error correction model for error correction according to the intent includes:
if the intention does not have a corresponding domain error correction model, correcting errors using the general error correction model;
and if the intention has a corresponding domain error correction model, correcting errors using the domain error correction model.
The domain error correction model of the present application may include error correction models for a plurality of domains, and the corresponding domain error correction model may be used to correct errors for each domain. For example, the domains included in the domain error correction model include, but are not limited to, the audio/video domain, the place domain, the person name domain, and the like.
Of course, in some embodiments, the domain error correction model may also be obtained according to category or type division, without limitation.
It should be noted that the above division of domains is not limited. For example, audio and video may be treated as one domain, with the audio/video domain error correction model correcting errors for all intentions belonging to that domain; alternatively, audio and video may be treated as two separate domains, with the audio domain error correction model correcting errors for all intentions belonging to the audio domain, and the video domain error correction model correcting errors for all intentions belonging to the video domain. In addition, the domains included in the domain error correction model may be updated periodically or aperiodically to improve the error correction capability.
According to the scheme provided by the application, when the intention of the first text recognized by the electronic device does not have a corresponding domain error correction model, the general error correction model is used for error correction; when it does, the corresponding domain error correction model is used, so that the accuracy of the recognized intention and slot can be further ensured.
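As an illustration of the selection rule above, a minimal sketch follows. The class names, the `play_video` intent label, and the `correct` interface are assumptions made for the example, not part of the embodiment:

```python
class GenericCorrector:
    """Stand-in for the general error correction model."""
    def correct(self, text: str) -> str:
        return text  # placeholder: a real model would rewrite the text


class VideoCorrector(GenericCorrector):
    """Stand-in for one domain error correction model (audio/video domain)."""


# Intents that have a dedicated domain error correction model.
DOMAIN_MODELS = {
    "play_video": VideoCorrector(),
}
GENERIC_MODEL = GenericCorrector()


def select_corrector(intent: str) -> GenericCorrector:
    # Use the domain model when the intent has one; otherwise fall back
    # to the general error correction model.
    return DOMAIN_MODELS.get(intent, GENERIC_MODEL)
```

An intent with no entry in the mapping simply falls back to the general model, which mirrors the two branches described above.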
With reference to the first aspect, in some possible implementations, the method further includes outputting a response of the target text.
The response of the target text in the application may be information related to the target text. For example, if the target text in the embodiment of the application is "play The Wandering Earth", the response of the target text is videos related to The Wandering Earth, such as video 1, the movie The Wandering Earth; video 2, featurettes of The Wandering Earth; video 3, cast information of The Wandering Earth; video 4, MVs of The Wandering Earth; video 5, documentaries of The Wandering Earth; and the like.
According to the scheme provided by the application, besides the corrected target text, the response of the target text, namely the information related to the target text, can be output, and the user can select the required information from the information, so that the user experience is improved.
With reference to the first aspect, in some possible implementations, the outputting the error corrected target text includes:
and outputting the target text according to the error correction result and the confidence value.
According to the scheme provided by the application, the electronic equipment can output the target text according to the error correction result and the confidence value, and the accuracy of the recognition intention and the slot position can be further ensured.
With reference to the first aspect, in some possible implementations, the identifying, using an intention recognition model, an intention and a slot of the first text includes:
Identifying the intention and the slot position of the first text by using the intention identification model, and obtaining a first confidence value;
the selecting a corresponding error correction model for error correction according to the intention comprises the following steps:
Selecting a corresponding error correction model for error correction according to the intention, and obtaining a second confidence value;
the outputting the target text according to the error correction result and the confidence value includes:
identifying the intention and slot of a second text using the intention recognition model, and obtaining a third confidence value, wherein the second text is the text after error correction of the first text, or the text after slot replacement is performed on the corrected first text;
and outputting the target text according to the first confidence value, the second confidence value, and the third confidence value.
According to the scheme provided by the application, the electronic device outputs the target text according to the first confidence value corresponding to the first text recognized by the intention recognition model, the second confidence value corresponding to the error correction of the first text by the error correction model, and the third confidence value corresponding to the second text recognized by the intention recognition model, so that the accuracy of the recognized intention and slot can be further ensured.
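The three confidence values above can be sketched as a small pipeline. The function signatures (an intent model returning intention, slot, and confidence, and a corrector returning corrected text and confidence) are assumptions for illustration only, not the embodiment's actual interfaces:

```python
def run_pipeline(first_text, intent_model, select_corrector):
    """Collect the three confidence values used to choose the target text."""
    intent, slot, first_conf = intent_model(first_text)             # first confidence value
    corrector = select_corrector(intent)
    second_text, second_conf = corrector(first_text, intent, slot)  # second confidence value
    _, _, third_conf = intent_model(second_text)                    # third confidence value
    return first_conf, second_conf, third_conf
```

The first and third values come from the same intent recognition model, applied before and after correction; how they are compared is described in the implementations that follow.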
With reference to the first aspect, in some possible implementations, if error correction using the domain error correction model is selected, outputting the target text according to the first confidence value, the second confidence value, and the third confidence value includes:
If the third confidence value is greater than or equal to the first confidence value, determining a first combined error correction confidence value of the second text according to the first confidence value and the second confidence value, wherein the second text is the text obtained after slot replacement is performed on the corrected first text, and the first combined error correction confidence value includes a plurality of confidence values;
and outputting the target text corresponding to the maximum confidence value among the first combined error correction confidence values.
According to the scheme provided by the application, when the electronic device selects the domain error correction model for error correction, if the third confidence value is greater than or equal to the first confidence value, the target text corresponding to the maximum confidence value among the first combined error correction confidence values is output. Because the third confidence value is greater than or equal to the first confidence value, the error correction result for the first text can be considered correct. Therefore, the first combined error correction confidence value of the second text can be determined using the second confidence value and the first confidence value, and the target text corresponding to the maximum confidence value can be output, ensuring the accuracy of the recognized intention and slot.
With reference to the first aspect, in some possible implementations, the first combined error correction confidence value includes a plurality of confidence values obtained by multiplying the confidence values having the same intention in the first confidence value and the second confidence value.
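A minimal sketch of this per-intention multiplication follows, assuming confidences are represented as dictionaries mapping intention names to scores (a representation chosen for the example, not specified by the embodiment):

```python
def combined_confidence(conf_a, conf_b):
    """Multiply the confidence values that share the same intention."""
    return {intent: conf_a[intent] * conf_b[intent]
            for intent in conf_a.keys() & conf_b.keys()}


first_conf = {"play_video": 0.8, "play_music": 0.2}
second_conf = {"play_video": 0.9, "play_music": 0.5}
first_combined = combined_confidence(first_conf, second_conf)
best_intent = max(first_combined, key=first_combined.get)  # intention of the target text
```

Only intentions present in both confidence sets contribute, and the target text is the candidate whose combined value is largest.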
With reference to the first aspect, in some possible implementations, the method further includes:
if the third confidence value is smaller than the first confidence value, reducing the second confidence value;
determining a second combined error correction confidence value of the second text according to the first confidence value and the reduced second confidence value, wherein the second combined error correction confidence value includes a plurality of confidence values;
and outputting the target text corresponding to the maximum confidence value among the second combined error correction confidence values.
According to the scheme provided by the application, when the electronic device selects the domain error correction model for error correction, if the third confidence value is smaller than the first confidence value, the target text corresponding to the maximum confidence value among the second combined error correction confidence values is output, where the second combined error correction confidence value is determined according to the first confidence value and the reduced second confidence value. Because the third confidence value is smaller than the first confidence value, the error correction result is considered incorrect; therefore, the second combined error correction confidence value of the second text can be determined using the reduced second confidence value and the first confidence value, and the target text corresponding to the maximum confidence value can be output, so that recognition errors caused by error correction can be avoided as much as possible.
With reference to the first aspect, in some possible implementations, the second combined error correction confidence value includes a plurality of confidence values obtained by multiplying the confidence values having the same intention in the first confidence value and the reduced second confidence value.
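The reduced-confidence branch can be sketched with confidences as dictionaries mapping intention names to scores (an illustrative assumption). The embodiment does not fix how much the second confidence value is reduced, so the 0.5 factor below is purely an assumption:

```python
PENALTY = 0.5  # assumed down-weighting factor; not specified by the embodiment


def second_combined_confidence(first_conf, second_conf, penalty=PENALTY):
    """Down-weight the correction confidence, then multiply the values
    that share the same intention (used when the third confidence value
    is smaller than the first)."""
    reduced = {intent: value * penalty for intent, value in second_conf.items()}
    return {intent: first_conf[intent] * reduced[intent]
            for intent in first_conf.keys() & reduced.keys()}
```

Down-weighting makes a suspect correction less likely to beat the original interpretation when the maximum combined value is selected.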
With reference to the first aspect, in some possible implementations, if error correction using the generic error correction model is selected, outputting the target text according to the first confidence value, the second confidence value, and the third confidence value includes:
determining a third combined error correction confidence value of the second text according to the second confidence value and the third confidence value, wherein the second text is the text obtained after error correction of the first text;
and outputting the target text corresponding to the largest confidence value among the first confidence value and the third combined error correction confidence value.
According to the scheme provided by the application, when the electronic device selects the general error correction model for error correction, it determines the third combined error correction confidence value of the second text according to the second confidence value and the third confidence value, and outputs the target text corresponding to the maximum confidence value, so that error correction can be performed even when the recognized intention has no corresponding domain error correction model, further improving the accuracy of the recognized intention and slot.
With reference to the first aspect, in some possible implementations, the third combined error correction confidence value is obtained by multiplying the second confidence value and the third confidence value.
With reference to the first aspect, in some possible implementations, if the second confidence value and the third confidence value include multiple confidence values, the third combined error correction confidence value includes multiple confidence values, and the multiple confidence values included in the third combined error correction confidence value are obtained by multiplying confidence values with the same intention in the second confidence value and the third confidence value.
According to the scheme provided by the application, when the second confidence value and the third confidence value each include a plurality of confidence values, the confidence values included in the third combined error correction confidence value are obtained by multiplying the confidence values having the same intention in the second confidence value and the third confidence value, so that the accuracy of the recognized intention and slot can be further ensured.
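For the general-model branch, the combination of the second and third confidence values and the comparison against the first can be sketched as follows, with confidences as dictionaries mapping intention names to scores (an illustrative assumption):

```python
def choose_generic_output(first_conf, second_conf, third_conf):
    """Multiply correction and re-recognition confidences per intention,
    then output whichever of the original and combined confidences
    carries the larger maximum value."""
    third_combined = {intent: second_conf[intent] * third_conf[intent]
                      for intent in second_conf.keys() & third_conf.keys()}
    best_first = max(first_conf.items(), key=lambda kv: kv[1])
    best_combined = max(third_combined.items(), key=lambda kv: kv[1])
    return best_combined if best_combined[1] >= best_first[1] else best_first
```

If the uncorrected recognition is more confident than the corrected one, the original interpretation is kept, which matches outputting the largest value among the first confidence value and the third combined error correction confidence value.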
In a second aspect, an apparatus is provided, the apparatus being included in an electronic device and having the functionality to implement the above aspect and its possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions described above.
In a third aspect, an electronic device is provided that includes one or more processors, memory, one or more application programs, and one or more computer programs. Wherein one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by the electronic device, cause the electronic device to perform the text correction method in any of the possible implementations of the first aspect described above.
In a fourth aspect, a system on a chip is provided, including at least one processor, where, when program instructions are executed in the at least one processor, the electronic device is caused to implement the functions of the text error correction method in any of the possible implementations of the first aspect.
In a fifth aspect, a computer storage medium is provided, including computer instructions which, when run on an electronic device, cause the electronic device to perform the text error correction method in any of the possible implementations of the first aspect.
In a sixth aspect, a computer program product is provided which, when run on an electronic device, causes the electronic device to perform the text error correction method in any of the possible designs of the first aspect.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Fig. 2 is a schematic software structure of an electronic device according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a set of GUIs provided in an embodiment of the present application.
Fig. 4 is a schematic flow chart of a text error correction method provided by an embodiment of the present application.
Fig. 5 is a schematic flow chart of another text error correction method provided by an embodiment of the present application.
Fig. 6 is a schematic flow chart of yet another text error correction method provided by an embodiment of the present application.
Fig. 7 is a schematic block diagram of another electronic device provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of still another electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, in the description of the embodiments of the present application, "plurality" means two or more.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature.
The application provides a text error correction method in which an electronic device recognizes content input by a user by combining an intention recognition model and an error correction model, and can ensure the accuracy of the intention and slot even when error correction would otherwise be impossible because the recognized text contains extra, missing, or wrong words, or because the range of the recognized attribute information is inaccurate.
The text error correction method provided by the embodiments of the application can be applied to electronic devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) devices, notebook computers, ultra-mobile personal computers (ultra-mobile personal computer, UMPC), netbooks, and personal digital assistants (personal digital assistant, PDA); the embodiments of the application do not limit the specific type of the electronic device.
By way of example, fig. 1 shows a schematic diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors. In the embodiments of the present application, the recognition of the intention and slot of the original text by the intention recognition model, the error correction of the recognized intention and slot by the error correction model, and the processing of the error correction result may all be implemented by the processor 110.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
The processor 110 may also be provided with a memory configured to store instructions and data, for example, the intention recognition model or the error correction models (including the domain error correction model and the general error correction model) in the present application. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or reused. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including wireless local area network (wireless local area networks, WLAN) (e.g., a wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (IR), etc. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency-modulate it, amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is used for displaying images, videos, etc.; in the embodiment of the present application, it can display the text recognized from the voice input by the user and information related to that text. The display 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
The receiver 170B, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the electronic device 100 is answering a telephone call or a voice message, voice may be received by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When transmitting voice information, a user may speak with his or her mouth close to the microphone 170C to input a sound signal (as in the embodiment of the present application, the user may input voice through the microphone 170C). The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording functions, etc.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. A capacitive pressure sensor may comprise at least two parallel plates with conductive material; when a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function controls of the electronic device 100.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 2 is a software structure block diagram of the electronic device 100 according to the embodiment of the present application. The layered architecture divides the software into several layers, each with clear roles and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: an application layer, an application framework layer, the Android runtime and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 2, the application package may include applications such as cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, voice assistants (also referred to as smart voices), etc.
Alternatively, in some embodiments, the voice assistant may reside at the application framework layer and may be invoked through a preset interface.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video (e.g., video of The Wandering Earth in the present application), images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in the status bar. It can be used to communicate notification-type messages that automatically disappear after a short dwell without requiring user interaction. For example, the notification manager is used to notify that a download is complete, to give message alerts, etc. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, or an indicator light blinks.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), 2D graphics engine (e.g., SGL), etc.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver.
In order to facilitate understanding of the aspects of the application, terms involved in the application will be described first.
1. Text error correction
Text error correction mainly aims to detect errors in the input original text (error detection) and to correct them using natural language processing techniques. The original text can be the scanned recognition result of text content in books and periodicals, content from social networks such as Sina Weibo and WeChat Moments, or the user's input voice recognized by an automatic speech recognition (ASR) module. These texts inevitably contain certain errors (or non-canonical terms), which decrease the accuracy of subsequent processing (e.g., text translation, text entity recognition, intent recognition, etc.).
The error correction models can be roughly classified into two main categories, a general error correction model and a domain error correction model, from the object of error correction.
The error correction object of the general error correction model is text in an unlimited field, and error detection and error correction are mainly carried out by introducing pronunciation, font, grammar, knowledge base and language model features.
The error correction object of the domain error correction model is text of a defined domain, and the error correction text is obtained mainly by constructing a domain dictionary or domain library and utilizing a fuzzy matching algorithm.
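As a concrete illustration of the domain-dictionary fuzzy-matching approach described above, the following is a minimal sketch. The video titles, the `domain_correct` helper, and the similarity cutoff are assumptions for illustration, not the model actually used in the application:

```python
import difflib

# Hypothetical video-domain dictionary; the titles are illustrative only.
VIDEO_TITLES = ["The Wandering Earth", "The Wandering Earth 2", "Wolf Warrior"]

def domain_correct(slot, dictionary, cutoff=0.6):
    """Fuzzy-match a recognized slot against a domain dictionary and
    return (best match, similarity in [0, 1]); the slot is returned
    unchanged with similarity 0.0 when nothing clears the cutoff."""
    matches = difflib.get_close_matches(slot, dictionary, n=1, cutoff=cutoff)
    if not matches:
        return slot, 0.0
    best = matches[0]
    return best, difflib.SequenceMatcher(None, slot, best).ratio()

corrected, score = domain_correct("The Streaming Earth", VIDEO_TITLES)
```

A misrecognized slot such as "The Streaming Earth" is close enough in edit distance to be mapped back to the in-domain title, while an out-of-domain string falls through unchanged.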
2. Intent recognition
Intent recognition (broadly, natural language understanding (NLU)) is one of the important basic capabilities of a voice assistant. Its main aim is to understand, from the natural language text input by the user, the operation the user wants to perform. The operation is described with an intent (the corresponding action, or the domain the operation belongs to in the target system) and a slot (the parameters required to complete the operation), converted through a task execution model into an interface call or application action on the corresponding system, and the corresponding execution result is returned, thereby achieving the effect of initiating operations through natural language.
Intent recognition depends on understanding the semantic information in the natural language sentence input by the user. When the text has problems such as wrong words, extra words, or missing words, caused by irregular user expression or ASR recognition errors, the accuracy of intent recognition is affected, and the voice assistant cannot accurately process the operation the user wants to perform.
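To make the intent/slot/confidence representation concrete, the following is a minimal sketch; the `IntentResult` structure and its field names are assumptions that mirror the <intent, slot, confidence value> triples used later in the text:

```python
from dataclasses import dataclass

# Hypothetical container for one recognition hypothesis.
@dataclass
class IntentResult:
    intent: str        # action or domain of the operation, e.g. "video_play"
    slot: str          # parameter needed to complete the operation
    confidence: float  # model confidence in [0, 1]

hypothesis = IntentResult("video_play", "The Wandering Earth", 0.7)
```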
For easy understanding, the following embodiments of the present application will take a mobile phone having a structure shown in fig. 1 and fig. 2 as an example, and the text error correction method provided by the embodiments of the present application will be specifically described with reference to the accompanying drawings.
Fig. 3 shows a set of GUIs of a mobile phone; (a) to (d) in fig. 3 show the error correction method used when the voice assistant in the mobile phone misrecognizes the voice uttered by the user.
See the GUI shown in fig. 3 (a), which is the desktop of the handset. When the handset detects that the user clicks on icon 301 of the voice assistant on the desktop, the voice assistant may be activated to display a GUI, which may be referred to as a voice input interface, as shown in fig. 3 (b).
Referring to the GUI shown in fig. 3 (b), a rectangular frame with a plurality of bars shown in the drawing may be referred to as a "speech recognition frame" for recognizing speech input by a user.
In a practical scenario, the plurality of bars in the "speech recognition frame" may be continuously changing dynamically; the bars shown in the figures are only exemplary and should not be construed as limiting the application in any way.
Also shown in fig. 3 (b) are example texts that the user may or often will say to the voice assistant, such as "wake me at 8 am tomorrow", "turn on the flashlight", "turn the volume to maximum", etc.
After the voice assistant is started, the user can input the voice "play The Wandering Earth". If the user's pronunciation is standard and clear, the voice assistant can accurately recognize the voice input by the user, i.e., the recognized text is "play The Wandering Earth", and the GUI shown in (c) in fig. 3 is displayed. If the user's pronunciation is non-standard or unclear, the voice assistant may erroneously recognize the voice input by the user, e.g., the recognized text is "play the Streaming Earth" or "broadcast the Wandering Earth", and the GUI shown in (d) in fig. 3 is displayed.
Referring to the GUI shown in fig. 3 (c), the right upper corner of the drawing shows text after recognition correctly, at this time, since the text recognized by the voice assistant is correct, the mobile phone interface may display a plurality of videos related to the wandering earth (e.g., video 1: movie of the wandering earth, video 2: flower of the wandering earth, video 3: star information related to the wandering earth, video 4: MV related to the wandering earth, video 5: documentary related to the wandering earth, etc.) found by the user. For a plurality of videos displayed, the user may further select a video to be played.
Referring to the GUI shown in fig. 3 (d), the upper right corner of the drawing shows the erroneously recognized text, such as "play the Streaming Earth" or "broadcast the Wandering Earth". At this time, the voice assistant searches based on the recognized text and does not find the relevant content; the mobile phone interface may display "No relevant content found; let me try searching for play The Wandering Earth", together with a plurality of videos about The Wandering Earth found for the user (e.g., video 1: the movie The Wandering Earth, video 2: bloopers of The Wandering Earth, video 3: star information of The Wandering Earth, video 4: an MV of The Wandering Earth, video 5: a documentary of The Wandering Earth, etc.). From the plurality of displayed videos, the user may further select a video to be played.
The following describes the internal implementation process and judgment logic of the electronic device for recognizing the voice input by the user in the embodiment of the present application with reference to fig. 4. Fig. 4 shows a flowchart of an internal algorithm of an electronic device according to an embodiment of the present application.
S410, receiving the original text input by the user.
The original text in the embodiment of the application can be text recognized by the ASR module from the voice input by the user, text scanned from books and periodicals, or text from social networks (such as Weibo and WeChat Moments), without limitation.
For the content of the user's voice input, during the process of the ASR module recognizing the voice, the content may be recognized correctly, or it may be recognized incorrectly due to the user's pronunciation problems.
For example, assume that the speech input by the user is "play The Wandering Earth": if the user's pronunciation is standard and clear, the text correctly recognized by the ASR module is "play The Wandering Earth"; if the pronunciation is not standard or clear, the ASR module may incorrectly recognize the text as "play the Streaming Earth" or "broadcast the Wandering Earth".
S420, using the intention recognition model to recognize the intention and the slot of the original text.
The intention recognition model in the present application may be a training model or a statistical model or a network model, etc., without limitation.
As described above, the text entered by a user may be described as an operation that the user wishes to perform. The operation may be represented by means of an intent and slots: the intent may be understood as the corresponding action or domain of the operation, and the slots may be understood as the parameters needed to complete the operation.
For example, if the voice input by the user is "play The Wandering Earth" and the ASR module correctly recognizes it, the corresponding intent after recognition by the intent recognition model may be video playing or music playing, and the corresponding slot is The Wandering Earth.
For another example, if the voice input by the user is still "play The Wandering Earth" but the ASR module erroneously recognizes it as "broadcast the Wandering Earth", the corresponding intent after recognition by the intent recognition model may be web search or making a call, and the corresponding slot is the Wandering Earth.
For another example, if the voice input by the user is still "play The Wandering Earth" but the ASR module erroneously recognizes it as "play the Streaming Earth", the corresponding intent after recognition by the intent recognition model is video playing or music playing, and the corresponding slot is the Streaming Earth.
S430, correcting the identified intention and the slot by using an error correction model to obtain an error correction result.
The error correction model in the present application may include a general error correction model and a domain error correction model. As described above, the error correction object of the general error correction model is a text that does not limit a domain, and the object of the domain error correction model is a text that limits a domain.
It should be noted that the field error correction model may include error correction models of multiple fields, and for different fields, the corresponding field error correction model may be used to perform error correction. For example, the plurality of fields included in the field error correction model are, but not limited to, an audio/video field, a place field, a name field, and the like.
Of course, in some embodiments, the domain error correction model may also be obtained according to category or type division, without limitation.
If the intent identified by the intent recognition model belongs to the place field, the error correction model of the place field is used for error correction. Of course, in some embodiments, if the intent recognized by the intent recognition model is a web search, there is no available domain error correction model.
It should be noted that the division of the fields is not limited. The audio and video fields may be grouped into one type, with the audio/video field error correction model correcting errors for all intents belonging to the audio/video fields; or the audio field and the video field may be divided into two separate types, with the audio field error correction model correcting errors for all intents belonging to the audio field and the video field error correction model correcting errors for all intents belonging to the video field. The application is not limited in this respect.
In addition, the fields included in the field error correction model may be updated periodically or aperiodically to improve the error correction capability.
As shown in fig. 5, when the identified intention and slot are error corrected by using the error correction model in step S430, steps S431 to S433 may be included.
S431, it is determined whether the intention of the original text recognized by the intention recognition model has a corresponding field error correction model.
If so, step S432 is executed, that is, the field error correction model is used for error correction; if not, step S433 is executed, that is, the general error correction model is used for error correction.
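The S431-S433 dispatch can be sketched as a simple lookup; the intent names and model registry below are assumptions for illustration:

```python
# Hypothetical registry mapping intents to domain error correction models.
DOMAIN_MODELS = {
    "music_play": "audio_domain_error_correction",
    "video_play": "video_domain_error_correction",
    "place":      "place_domain_error_correction",
}

def pick_error_correction_model(intent):
    """S431: check whether a domain model exists for the intent;
    S432 uses it when present, otherwise S433 falls back to the
    general error correction model."""
    return DOMAIN_MODELS.get(intent, "general_error_correction")
```

An intent such as web search, which has no dedicated domain, falls through to the general model.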
The two possible cases will be described separately below.
Scheme one: a domain error correction model is available, and error correction is performed using the domain error correction model
Assume that the speech input by the user is "play The Wandering Earth" and the ASR module recognizes it as "play the Streaming Earth". The intent recognition model performs intent and slot recognition on the text recognized by the ASR module.
Since the keyword "play" is matched during recognition using the intent recognition model, the intent may be recognized as music playing or video playing, and a corresponding confidence value is given.
For example, if the identified intent is music playing, the recognition result is <intent: music playing, slot: the Streaming Earth, confidence value: 0.7>;
if the identified intent is video playing, the recognition result is <intent: video playing, slot: the Streaming Earth, confidence value: 0.7>.
After the intent recognition is completed, the error correction model corrects the recognition result. Since the recognized intents, music playing and video playing, have available domain error correction models, the domain error correction models can be used for error correction.
For music playing, the corresponding slot is input into the audio field error correction model, which outputs the error correction result "theme song of The Wandering Earth" with a corresponding confidence value of 0.5.
For video playing, the corresponding slot is input into the video field error correction model, which outputs the error correction result "The Wandering Earth" with a corresponding confidence value of 0.9.
Scheme two: no domain error correction model is available, and error correction is performed using the general error correction model
Assume that the speech input by the user is "play The Wandering Earth" and the ASR module recognizes it as "broadcast the Wandering Earth". The intent recognition model performs intent and slot recognition on the text recognized by the ASR module.
Since the keyword "play" is not matched during recognition using the intent recognition model, the intent may be incorrectly recognized as a web search or making a call, and a corresponding confidence value is given.
For example, if the identified intent is web search, the recognition result is <intent: web search, slot: broadcast the Wandering Earth, confidence value: 0.5>;
if the identified intent is making a call, the recognition result is <intent: making a call, slot: the Wandering Earth, confidence value: 0.3>.
After recognition by the intent recognition model is completed, the error correction model corrects the recognition result. At this time, since the recognized intent is web search or making a call and no domain error correction model is available, the general error correction model can be used for error correction.
The general error correction model is implemented with an error-prone dictionary. It corrects the text recognized by the ASR module and gives the corrected text with a corresponding confidence value; for example, the corrected text is "play The Wandering Earth" with a corresponding confidence value of 0.9.
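A minimal sketch of the error-prone dictionary idea follows; the dictionary entries, confidence values, and `general_correct` helper are assumptions for illustration only:

```python
# Hypothetical error-prone (confusion) dictionary mapping frequently
# misrecognized texts to their corrections and a confidence value.
ERROR_PRONE = {
    "broadcast the Wandering Earth": ("play The Wandering Earth", 0.9),
}

def general_correct(text):
    """Look the recognized text up in the confusion dictionary;
    return it unchanged with zero confidence when there is no entry."""
    return ERROR_PRONE.get(text, (text, 0.0))
```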
It should be understood that the values in scheme one and scheme two are merely examples; other values are also possible, and the application should not be limited thereto.
S440, processing the error correction result by using a post-processing module to obtain a judgment result.
S450, reordering the obtained result.
In the embodiment of the present application, the post-processing module processes differently for the error correction performed by the domain error correction model in scheme one and the error correction performed by the general error correction model in scheme two, as detailed below.
Scheme one:
For error correction using the domain error correction model: since the domain error correction model corrects errors in the slots, the slot in the corrected text can be replaced by a random entity in the domain, and the result can be re-input to the intent recognition model to recognize the intent and slot. If the re-recognized confidence value is greater than or equal to the confidence value obtained by correcting the slot of the original text with the domain error correction model, the error correction result can be considered correct; otherwise, it can be considered incorrect.
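The random-entity verification step can be sketched as follows; the entity list and the `recognize` callback are stand-ins for the real intent recognition model, and the "play" template is an assumption for illustration:

```python
import random

# Hypothetical in-domain entities for the music field.
MUSIC_ENTITIES = ["Qilixiang", "Blue and White Porcelain"]

def verify_correction(domain_entities, recognize, correction_conf):
    """Replace the corrected slot with a random in-domain entity, re-run
    intent recognition on the probe utterance, and accept the correction
    when the re-recognized confidence is >= the correction confidence."""
    probe_entity = random.choice(domain_entities)
    re_conf = recognize("play " + probe_entity)
    return re_conf >= correction_conf
```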
As described in step S430 above, for the intent of the music field, the text corrected by the audio field error correction model is "play the theme song of The Wandering Earth". A random entity in the music field (such as Qilixiang) may be used to replace "the theme song of The Wandering Earth", and "play Qilixiang" is input to the intent recognition model for recognition again.
The confidence value obtained by correcting the slot of the original text with the audio field error correction model is 0.5. If the confidence value of "play Qilixiang" re-recognized by the intent recognition model is greater than or equal to 0.5 (e.g., 0.6), correcting "play the Streaming Earth" to "play the theme song of The Wandering Earth" is considered correct.
For the video field, the text corrected by the video field error correction model is "play The Wandering Earth". A random entity in the video field (such as Wolf Warrior) may be used to replace "The Wandering Earth", and "play Wolf Warrior" is input into the intent recognition model for recognition again. If the re-recognized confidence value is greater than or equal to the confidence value obtained by correcting the slot of the original text with the domain error correction model, the error correction result is considered correct; otherwise, it is considered incorrect.
Similarly, the confidence value obtained by correcting the slot of the original text with the video field error correction model is 0.9. If the confidence value of "play Wolf Warrior" re-recognized by the intent recognition model is greater than or equal to 0.9 (e.g., 1.0), correcting "play the Streaming Earth" to "play The Wandering Earth" is considered correct.
In the case where the above process is correct, for the different fields, the confidence value of the original text and the confidence value of the corrected text in each field are processed to obtain a joint error correction result.
The processing in the present application can be understood as processing the confidence value obtained by correcting the slot of the original text with the error correction model together with the confidence value obtained by recognizing the original text with the intent recognition model. The processing may include, without limitation, averaging the two confidence values (arithmetic mean, root mean square, weighted mean, etc.) or multiplying the two confidence values.
Taking multiplication of these two confidence values as an example: for the music playing intent, the confidence value obtained with the error correction model is 0.5, the confidence value obtained by recognizing the original text with the intent recognition model is 0.7, and the jointly corrected confidence value is 0.35.
For video playing intention, the confidence value obtained by using the error correction model is 0.9, the confidence value obtained by using the intention recognition model to recognize the original text is 0.7, and the confidence value after joint error correction is 0.63.
Finally, the obtained results can be ranked, and the intent with the highest confidence value and its slot result are output.
The results here can be understood as the jointly corrected confidence values, including the jointly corrected confidence value 0.35 for the music playing intent and the jointly corrected confidence value 0.63 for the video playing intent. Since the confidence values rank as 0.63 > 0.35, the text and related videos whose intent is video playing and whose slot is The Wandering Earth can be output.
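The joint-confidence computation above can be sketched with the numbers from the text; the intent names are assumptions for illustration:

```python
# Joint confidence for scheme one: intent-recognition confidence on the
# original text multiplied by the domain error-correction confidence.
joint = {
    "music_play": 0.7 * 0.5,  # 0.35
    "video_play": 0.7 * 0.9,  # 0.63
}
# The highest-ranked intent (video playing, slot "The Wandering Earth") is output.
best = max(joint, key=joint.get)
```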
The above embodiments illustrate the case where the error correction result is correct, and in some embodiments it is possible that the error correction result is erroneous. For the case that the error correction result is wrong, the weight of the confidence value after error correction can be adjusted according to a preset rule, and then the sorting processing can be performed.
For example, for the intent of the music field, the text corrected by the audio field error correction model is "play the theme song of The Wandering Earth". A random entity in the music field (such as Qilixiang) may be used to replace "the theme song of The Wandering Earth", and "play Qilixiang" is input to the intent recognition model for recognition again.
The confidence value obtained by correcting the slot of the original text with the audio field error correction model is 0.5. If the confidence value of "play Qilixiang" re-recognized by the intent recognition model is less than 0.5 (e.g., 0.4), correcting "play the Streaming Earth" to "play the theme song of The Wandering Earth" is considered incorrect.
Thus, the weight of the confidence value 0.5 obtained by correcting the original text with the audio field error correction model may be reduced, for example, to 0.8, and the jointly corrected result is then (1.0 × 0.7) × (0.8 × 0.5) = 0.28.
Similarly, for the intent of the video field, the text corrected by the video field error correction model is "play The Wandering Earth". A random entity in the video field (such as Wolf Warrior) may be used to replace "The Wandering Earth", and "play Wolf Warrior" is input into the intent recognition model for recognition again.
The confidence value obtained by correcting the original text with the video field error correction model is 0.9. If the confidence value of "play Wolf Warrior" re-recognized by the intent recognition model is less than 0.9 (e.g., 0.8), correcting "play the Streaming Earth" to "play The Wandering Earth" is considered incorrect.
Thus, the weight of the confidence value 0.9 obtained by correcting the original text with the video field error correction model may be reduced, for example, to 0.8, and the jointly corrected result is then (1.0 × 0.7) × (0.8 × 0.9) = 0.504.
Finally, the obtained results can be ranked, and the intention and slot result with the highest confidence value are output. Since 0.504>0.28, the text whose intention is video play and whose slot is "The Wandering Earth", together with the related videos, is output.
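The weight-adjusted joint error correction above can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the function and variable names are assumptions, while the penalty weight 0.8 and all confidence figures are taken from the worked example.

```python
# Illustrative sketch of "scheme one" joint error correction: if
# re-recognizing the entity-replaced text scores below the domain
# correction confidence, the correction confidence is down-weighted.

def joint_confidence(intent_conf, correction_conf, recheck_conf, penalty=0.8):
    """Combine confidences for one candidate domain.

    intent_conf     -- confidence of the intention recognition model
    correction_conf -- confidence of the domain error correction model
    recheck_conf    -- confidence of re-recognizing the entity-replaced text
    """
    weight = penalty if recheck_conf < correction_conf else 1.0
    return (1.0 * intent_conf) * (weight * correction_conf)

# Figures from the example: audio field vs. video field.
music = joint_confidence(0.7, 0.5, 0.4)  # (1.0*0.7)*(0.8*0.5) = 0.28
video = joint_confidence(0.7, 0.9, 0.8)  # (1.0*0.7)*(0.8*0.9) = 0.504

# Rank and output the intention with the highest joint confidence.
best = max({"music playback": music, "video play": video}.items(),
           key=lambda kv: kv[1])
```

Since 0.504 exceeds 0.28, the video-field candidate wins, matching the ranking in the text.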
Scheme II:
for error correction using the general error correction model, since the general error correction model may also correct the intention, the corrected text needs to be re-input into the intention recognition model for intention and slot recognition.
As described above, the text corrected by the general error correction model is "play The Wandering Earth". The corrected text is input into the intention recognition model for recognition to obtain a recognition result, for example <intention: video play, slot: The Wandering Earth, confidence value: 0.9>.
After the recognition result (i.e., confidence value) of the corrected text is obtained, this result and the result of the general error correction model can be processed together to obtain a joint error correction result.
The processing in the embodiment of the application can be understood as combining the confidence value 0.9 obtained by recognizing the corrected text with the intention recognition model and the confidence value 0.9 obtained by the general error correction model. Without limitation, the processing may include averaging the two confidence values (arithmetic mean, root mean square, weighted mean, etc.), multiplying the two confidence values, and the like.
Illustratively, taking the example of multiplying the two confidence values, the result after joint error correction is 0.81.
Finally, the obtained results can be ranked, and the intention and slot result with the highest confidence value are output.
The results here can be understood as all confidence values in the recognition process, including 0.5 for web search, 0.3 for call, and the joint error correction value 0.81 for video play. Since 0.81>0.5>0.3, the text whose intention is video play and whose slot is "The Wandering Earth", together with the related videos, can be output.
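The ranking in scheme two can be sketched as follows; the names are illustrative, and the figures are the ones used in the example above.

```python
# Sketch of "scheme two": the general error correction model's confidence
# is multiplied by the confidence of re-recognizing the corrected text,
# and the joint value is ranked against the other recognized intentions.

general_conf = 0.9   # general error correction model
recheck_conf = 0.9   # intention model re-run on the corrected text
joint = general_conf * recheck_conf  # 0.81

results = {"web search": 0.5, "call": 0.3, "video play": joint}
ranking = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
# the top-ranked entry is the "video play" intention
```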
The above shows the error correction processing performed when the text recognized by the ASR module contains wrong words. The embodiment of the application can also be applied when the recognized text contains extra words or missing words relative to the voice input by the user; the specific error correction process is similar to the above and is not repeated here.
According to the text error correction method provided by the application, the electronic device selects the corresponding error correction model based on the intention recognized by the intention recognition model: the general error correction model is selected when the recognized intention has no corresponding domain error correction model, and the domain error correction model and the general error correction model are selected when it does. In this way, the intention and the slot can be corrected uniformly when extra words, missing words, or wrong words in the recognized text would otherwise cause error correction to fail, or when the recognized attribute information range is inaccurate, thereby ensuring the accuracy of the recognized intention and slot.
The following describes a flow of a text error correction method provided by the application.
Referring to fig. 6, fig. 6 shows a schematic flowchart of a text error correction method 600. The method may be performed by the electronic device shown in fig. 1.
As shown in fig. 6, the method 600 may include:
s610, receiving a first text.
The first text in the embodiment of the present application may be the original text in step S410. As described in step S410, the original text (i.e., the first text) may be, without limitation, the text obtained by the ASR module from the voice input by the user, text scanned from books and newspapers, or text from social networks (such as microblog and WeChat Moments).
S620, using an intention recognition model to recognize the intention and the slot position of the first text.
In the embodiment of the present application, the intention and the slot position of the first text are identified by using the intention recognition model, and reference may be made to the description of step S420, which is not repeated herein for brevity.
S630, selecting a corresponding error correction model to correct errors according to the intention, wherein the error correction model comprises a general error correction model and/or a domain error correction model.
Optionally, in some embodiments, if the intention has a corresponding domain error correction model, the domain error correction model is used for error correction; if the intention does not have a corresponding domain error correction model, the general error correction model is used.
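The selection rule above amounts to a lookup with a fallback. A minimal sketch, assuming a hypothetical registry of model names (the registry contents and names are not from the patent):

```python
# Minimal sketch of the dispatch in S630: use the domain error correction
# model when one exists for the recognized intention, otherwise fall back
# to the general model.

DOMAIN_MODELS = {
    "music playback": "audio_domain_error_correction_model",
    "video play": "video_domain_error_correction_model",
}

def select_model(intent):
    """Return the error correction model name for a recognized intention."""
    return DOMAIN_MODELS.get(intent, "general_error_correction_model")
```

For example, `select_model("video play")` yields the video-domain model, while an intention such as "web search", which has no domain model in this registry, falls back to the general model.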
If the domain error correction model is used for error correction, the specific process may refer to scheme one in step S431; if the general error correction model is used, the specific process may refer to scheme two in step S431, which is not repeated here.
In addition, after the error correction is completed, the error correction result can be processed by a post-processing module, and the processing differs between error correction models.
Scheme one:
If the error correction is performed by using the domain error correction model, outputting the target text according to the first confidence value, the second confidence value and the third confidence value includes:
if the third confidence value is greater than or equal to the first confidence value, a first joint error correction confidence value of the second text is determined according to the first confidence value and the second confidence value, wherein the second text is the text obtained by performing slot replacement on the corrected first text, and the first joint error correction confidence value comprises a plurality of confidence values; the target text corresponding to the largest confidence value in the first joint error correction confidence values is output.
Optionally, in some embodiments, the first joint error correction confidence value comprises a plurality of confidence values, each obtained by multiplying the confidence values having the same intention in the first confidence value and the second confidence value.
Optionally, in some embodiments, the method further comprises:
If the third confidence value is smaller than the first confidence value, the second confidence value is reduced, and a second joint error correction confidence value of the second text is determined according to the first confidence value and the reduced second confidence value, wherein the second joint error correction confidence value comprises a plurality of confidence values; the target text corresponding to the largest confidence value in the second joint error correction confidence values is output.
Optionally, in some embodiments, the second joint error correction confidence value comprises a plurality of confidence values, each obtained by multiplying the confidence values having the same intention in the first confidence value and the reduced second confidence value.
In the embodiment of the present application, for error correction performed by using the domain error correction model, the output target text may be determined based on the comparison between the third confidence value and the first confidence value and on the joint error correction confidence value (i.e., the first or the second joint error correction confidence value); the specific process may refer to scheme one in steps S440 and S450 and is not repeated here.
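The decision described in scheme one can be sketched as follows. The per-intention dictionaries, the comparison against the first confidence value, and the penalty factor 0.8 are illustrative assumptions consistent with the earlier worked example, not the patent's normative procedure.

```python
def scheme_one_joint(first, second, third, penalty=0.8):
    """Compute the joint error correction confidence values.

    first  -- intention -> first confidence value (original recognition)
    second -- intention -> second confidence value (domain error correction)
    third  -- intention -> third confidence value (re-recognition of the
              slot-replaced second text)
    The second confidence value is down-weighted when the third confidence
    value is smaller than the first.
    """
    joint = {}
    for intent in first.keys() & second.keys():
        weight = 1.0 if third.get(intent, 0.0) >= first[intent] else penalty
        joint[intent] = first[intent] * weight * second[intent]
    return joint

def output_target(joint):
    """Output the intention whose joint confidence value is largest."""
    return max(joint, key=joint.get)

scores = scheme_one_joint({"video play": 0.7},
                          {"video play": 0.9},
                          {"video play": 0.8})
# third (0.8) >= first (0.7), so no penalty: 0.7 * 0.9 = 0.63
```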
Scheme II:
If the general error correction model is selected to correct the error, outputting the target text according to the first confidence value, the second confidence value and the third confidence value, including:
A third joint error correction confidence value of the second text is determined according to the second confidence value and the third confidence value, wherein the second text is the text obtained by correcting the first text; the target text corresponding to the largest confidence value among the first confidence value and the third joint error correction confidence value is output.
Optionally, in some embodiments, the third joint error correction confidence value is obtained by multiplying the second confidence value by the third confidence value.
Optionally, in some embodiments, if the second confidence value and the third confidence value each comprise a plurality of confidence values, the third joint error correction confidence value comprises a plurality of confidence values, each obtained by multiplying the confidence values having the same intention in the second confidence value and the third confidence value.
In the embodiment of the present application, for error correction performed by using the general error correction model, the output target text may be determined based on the third joint error correction confidence value and the first confidence value; the specific process may refer to scheme two in steps S440 and S450 and is not repeated here.
S640, outputting the target text after error correction.
The corrected target text is the text corresponding to the maximum confidence value, for example, the target text "play The Wandering Earth" with a confidence value of 0.63 obtained by the domain error correction model in scheme one, or the target text "play The Wandering Earth" with a confidence value of 0.81 obtained by the general error correction model in scheme two.
S650, outputting the response of the target text.
The response of the target text may be information related to the target text. For example, the target text in (c) or (d) of fig. 3 is "play The Wandering Earth", and the response is videos related to The Wandering Earth, such as video 1: the movie The Wandering Earth; video 2: behind-the-scenes footage of The Wandering Earth; video 3: star information about The Wandering Earth; video 4: the MV of The Wandering Earth; video 5: a documentary about The Wandering Earth; and the like, as shown in (c) or (d) of fig. 3.
According to the scheme provided by the application, the electronic device selects the corresponding error correction model based on the intention recognized by the intention recognition model, so that the intention and the slot can be corrected uniformly when extra words, missing words, or wrong words in the recognized text would otherwise make error correction impossible, or when the recognized attribute information range is inaccurate, thereby ensuring the accuracy of the recognized intention and slot.
It will be appreciated that the electronic device, in order to implement the above functions, includes corresponding hardware and/or software modules for performing each function. In combination with the example algorithm steps described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented as hardware or computer-software-driven hardware depends on the particular application and the design constraints of the solution. Those skilled in the art may implement the described functionality in different ways for each particular application in conjunction with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
This embodiment may divide the electronic device into functional modules according to the above method example; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware. It should be noted that the division of modules in this embodiment is schematic and is merely a logical function division; other division manners may be used in actual implementation.
In the case of dividing the respective functional modules with the respective functions, fig. 7 shows a schematic diagram of one possible composition of the electronic device 700 involved in the above-described embodiment, and as shown in fig. 7, the electronic device 700 may include a receiving unit 710, an identifying unit 720, a selecting unit 730, and an output unit 740.
Wherein the receiving unit 710 may be configured to support the electronic device 700 to perform the above-described step S610, etc., and/or other processes for the techniques described herein.
The identification unit 720 may be used to support the electronic device 700 to perform step S620, etc. described above, and/or other processes for the techniques described herein.
The selection unit 730 may be used to support the electronic device 700 to perform step S630, etc., described above, and/or other processes for the techniques described herein.
The output unit 740 may be used to support the electronic device 700 to perform steps S640, S650, etc. described above, and/or other processes for the techniques described herein.
It should be noted that for all relevant contents of the steps involved in the above method embodiment, reference may be made to the functional descriptions of the corresponding functional modules, which are not repeated here.
The electronic device provided in this embodiment is configured to execute the text error correction method, so that the same effects as those of the implementation method can be achieved.
In the case where an integrated unit is employed, the electronic device may comprise a processing module, a storage module, and a communication module. The processing module may be configured to control and manage actions of the electronic device, for example, to support the electronic device in performing the steps performed by the foregoing units. The storage module may be used to support the electronic device in storing program code, data, and the like. The communication module may be used to support communication between the electronic device and other devices.
The processing module may be a processor or a controller, which may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination that performs computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor. The storage module may be a memory. The communication module may be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or another device that interacts with other electronic equipment.
In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device according to this embodiment may be a device having the structure shown in fig. 1.
Fig. 8 shows another possible composition schematic of an electronic device 800 according to the above embodiment, where the electronic device 800 may include a communication unit 810, an input unit 820, a processing unit 830, an output unit 840, a peripheral interface 850, a storage unit 860, and a power supply 870, as shown in fig. 8.
The communication unit 810 is configured to establish a communication channel through which the electronic device 800 connects to a remote server and downloads media data. The communication unit 810 may include communication modules such as a WLAN module, a Bluetooth module, an NFC module, and a baseband module, as well as the radio frequency (RF) circuits corresponding to these modules, for performing wireless local area network communication, Bluetooth communication, NFC communication, infrared communication, and/or cellular communication, for example, wideband code division multiple access (W-CDMA) and/or high speed downlink packet access (HSDPA). The communication unit 810 is used to control communication among components in the electronic device and may support direct memory access.
The input unit 820 may be used to implement user interaction with the electronic device and/or input of information into the electronic device. In a specific embodiment of the present invention, the input unit may be a touch panel, another human-machine interaction interface such as physical input keys or a microphone, or another external information capturing device such as a camera.
The processing unit 830 is a control center of the electronic device, and may connect various parts of the entire electronic device using various interfaces and lines, by running or executing software programs and/or modules stored in the storage unit, and invoking data stored in the storage unit to perform various functions of the electronic device and/or process data.
The output unit 840 includes, but is not limited to, an image output unit and a sound output unit. The image output unit is used for outputting text, pictures, and/or videos. In an embodiment of the invention, the touch panel used in the input unit 820 may also serve as the display panel of the output unit 840. For example, when the touch panel detects a touch or approach gesture on it, the gesture is transmitted to the processing unit to determine the type of touch event, and the processing unit then provides a corresponding visual output on the display panel according to that type. Although in fig. 8 the input unit 820 and the output unit 840 implement the input and output functions of the electronic device as two independent components, in some embodiments the touch panel may be integrated with the display panel to implement both functions. For example, the image output unit may display various graphical user interfaces as virtual control components, including but not limited to windows, scroll bars, icons, and clipboards, for the user to operate by touch.
The outputting of the corrected target text in step S640 and the outputting of the response of the target text in step S650 in the above-described embodiments may be achieved by the output unit 840.
The storage unit 860 may be used to store software programs and modules, and the processing unit executes the software programs and modules stored in the storage unit, thereby performing various functional applications of the electronic device and realizing data processing.
The present embodiment also provides a computer storage medium having stored therein computer instructions which, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the text error correction method in the above-described embodiments.
The present embodiment also provides a computer program product which, when run on a computer, causes the computer to perform the above-described related steps to implement the text error correction method in the above-described embodiments.
In addition, the embodiment of the application further provides an apparatus, which may be a chip, a component, or a module. The apparatus may include a connected processor and memory, where the memory stores computer-executable instructions; when the apparatus runs, the processor executes the instructions stored in the memory, so that the chip performs the text error correction method in each method embodiment.
The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are used to execute the corresponding methods provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding methods provided above, and will not be described herein.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium includes various media capable of storing program codes such as a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk or an optical disk.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (13)
1. A method for text error correction, the method being applied to an electronic device and comprising:
receiving a first text;
identifying the intention and the slot position of the first text by using an intention identification model;
selecting a corresponding error correction model to correct errors according to the intention, wherein the error correction model comprises a general error correction model and/or a domain error correction model;
Outputting the corrected target text;
the outputting the corrected target text includes:
outputting the target text according to the error correction result and the confidence value;
the identifying the intent and the slot of the first text using an intent identification model includes:
Identifying the intention and the slot position of the first text by using the intention identification model, and obtaining a first confidence value;
the selecting a corresponding error correction model for error correction according to the intention comprises the following steps:
Selecting a corresponding error correction model for error correction according to the intention, and obtaining a second confidence value;
The outputting the target text according to the error correction result and the confidence value comprises the following steps:
identifying the intention and the slot position of a second text by using an intention identification model, and obtaining a third confidence value, wherein the second text is a text subjected to error correction on the first text or a text subjected to slot position replacement on the first text subjected to error correction;
outputting the target text according to the first confidence value, the second confidence value and the third confidence value;
If the error correction is performed by using the domain error correction model, outputting the target text according to the first confidence value, the second confidence value and the third confidence value includes:
If the third confidence coefficient value is greater than or equal to the first confidence coefficient value, determining a first combined correction confidence coefficient value of the second text according to the first confidence coefficient value and the second confidence coefficient value, wherein the second text is a text after the first text subjected to the error correction is subjected to the slot position replacement, and the first combined correction confidence coefficient value comprises a plurality of confidence coefficient values;
And outputting a target text corresponding to the maximum confidence coefficient value in the first combined error correction confidence coefficient values.
2. The method of claim 1, wherein selecting a corresponding error correction model for error correction based on the intent comprises:
if the intention does not have a corresponding field error correction model, correcting errors by using the general error correction model;
and if the intention has a corresponding field error correction model, correcting errors by using the field error correction model.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and outputting the response of the target text.
4. The method according to claim 1 or 2, wherein the first joint error correction confidence value comprises a plurality of confidence values multiplied by confidence values of the first confidence value and the second confidence value having the same intention.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
if the third confidence value is smaller than the first confidence value, the second confidence value is reduced;
determining a second combined correction confidence value of the second text according to the first confidence value and the reduced second confidence value, wherein the second combined correction confidence value comprises a plurality of confidence values;
And outputting a target text corresponding to the maximum confidence coefficient value in the second combined error correction confidence coefficient values.
6. The method of claim 5, wherein the second joint error correction confidence value comprises a plurality of confidence values multiplied by confidence values for which the first confidence value and the reduced second confidence value have the same intent.
7. The method according to claim 1 or 2, wherein if error correction using the generic error correction model is selected, the outputting the target text according to the first confidence value, the second confidence value, and the third confidence value comprises:
determining a third combined correction confidence value of the second text according to the second confidence value and the third confidence value, wherein the second text is a text after correction of the first text;
And outputting a target text corresponding to the largest confidence coefficient value in the first confidence coefficient value and the third combined error correction confidence coefficient value.
8. The method of claim 7, wherein the third combined error correction confidence value is multiplied by the second confidence value and the third confidence value.
9. The method of claim 7, wherein if the second confidence value and the third confidence value each comprise a plurality of confidence values, the third combined error correction confidence value comprises a plurality of confidence values, and the plurality of confidence values comprised by the third combined error correction confidence value are multiplied by confidence values having the same intent in the second confidence value and the third confidence value.
10. An electronic device, comprising:
One or more processors;
One or more memories;
The one or more memories store one or more computer programs comprising instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-9.
11. A chip system comprising at least one processor, wherein program instructions, when executed in the at least one processor, cause the functions of the method of any one of claims 1 to 9 to be carried out on the electronic device.
12. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1 to 9.
13. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method of any of claims 1 to 9.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011565185.2A CN114692639B (en) | 2020-12-25 | 2020-12-25 | Text error correction method and electronic device |
| PCT/CN2021/137440 WO2022135206A1 (en) | 2020-12-25 | 2021-12-13 | Text error correction method and electronic device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011565185.2A CN114692639B (en) | 2020-12-25 | 2020-12-25 | Text error correction method and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114692639A CN114692639A (en) | 2022-07-01 |
| CN114692639B true CN114692639B (en) | 2025-03-14 |
Family
ID=82129105
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011565185.2A Active CN114692639B (en) | 2020-12-25 | 2020-12-25 | Text error correction method and electronic device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN114692639B (en) |
| WO (1) | WO2022135206A1 (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115273848A (en) * | 2022-08-01 | 2022-11-01 | Vidaa International Holdings (Netherlands) Co. | Display device and control method for the display device |
| CN116189664B (en) * | 2022-12-12 | 2023-07-28 | Beijing Shumei Times Technology Co., Ltd. | Method, system and electronic device for constructing an ASR text error correction training sample set |
| CN116129906B (en) * | 2023-02-14 | 2024-09-20 | Xinsheng Technology (Shenzhen) Co., Ltd. | Speech recognition text revision method, apparatus, computer device and storage medium |
| CN116432693B (en) * | 2023-03-15 | 2024-02-09 | Beijing Qingdun Information Technology Co., Ltd. | Method, apparatus, storage medium and electronic device for constructing a large-scale pre-trained language model |
| CN116136957B (en) * | 2023-04-18 | 2023-07-07 | Zhejiang Lab | Text error correction method, apparatus and medium based on intention consistency |
| CN118398180B (en) * | 2024-04-08 | 2025-09-26 | Ping An Technology (Shenzhen) Co., Ltd. | Intention recognition method, apparatus, device and medium for medical robots |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107220235A (en) * | 2017-05-23 | 2017-09-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Speech recognition error correction method, apparatus and storage medium based on artificial intelligence |
| CN109800407A (en) * | 2017-11-15 | 2019-05-24 | Tencent Technology (Shenzhen) Co., Ltd. | Intention recognition method, apparatus, computer device and storage medium |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA1183609A (en) * | 1982-03-29 | 1985-03-05 | Bruce S. Allen | Man machine interface |
| CN106489148A (en) * | 2016-06-29 | 2017-03-08 | Shenzhen Gowild Intelligent Technology Co., Ltd. | Intention scene recognition method and system based on user portrait |
| CN107045496B (en) * | 2017-04-19 | 2021-01-05 | Chanjet Information Technology Co., Ltd. | Error correction method and error correction apparatus for text after speech recognition |
| CN107807915B (en) * | 2017-09-27 | 2021-03-09 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Error correction model establishing method, apparatus, device and medium based on an error correction platform |
| CN110555096A (en) * | 2018-06-01 | 2019-12-10 | Shenzhen Gowild Intelligent Technology Co., Ltd. | User intention identification method, system, terminal and medium |
| CN112002311A (en) * | 2019-05-10 | 2020-11-27 | TCL Corporation | Text error correction method, apparatus, computer-readable storage medium and terminal device |
| US11302330B2 (en) * | 2019-06-03 | 2022-04-12 | Microsoft Technology Licensing, Llc | Clarifying questions for rewriting ambiguous user utterance |
| CN110232129B (en) * | 2019-06-11 | 2020-09-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Scene error correction method, apparatus, device and storage medium |
- 2020
  - 2020-12-25: CN application CN202011565185.2A, granted as CN114692639B (active)
- 2021
  - 2021-12-13: WO application PCT/CN2021/137440, published as WO2022135206A1 (not active, ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| CN114692639A (en) | 2022-07-01 |
| WO2022135206A1 (en) | 2022-06-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN114692639B (en) | Text error correction method and electronic device | |
| US11900924B2 (en) | Semantic parsing method and server | |
| JP7252327B2 (en) | Human-computer interaction methods and electronic devices | |
| RU2689203C2 (en) | Flexible circuit for adjusting language model | |
| CN103702297B (en) | Short message enhancement, apparatus and system | |
| US20230125288A1 (en) | Font adjustment method and apparatus, storage medium, and electronic device | |
| CN111177180A (en) | Data query method and device and electronic equipment | |
| TWI597964B (en) | Message storing method and device, and communication terminal | |
| US20210405767A1 (en) | Input Method Candidate Content Recommendation Method and Electronic Device | |
| WO2020207326A1 (en) | Dialogue message sending method and electronic device | |
| WO2022100221A1 (en) | Retrieval processing method and apparatus, and storage medium | |
| KR20150090966A (en) | Method For Providing Search Result And Electronic Device Using The Same | |
| CN113806473A (en) | Intent recognition method and electronic device | |
| US12050633B2 (en) | Data processing method and apparatus | |
| CN112232059B (en) | Text error correction method and device, computer equipment and storage medium | |
| CN108616448A (en) | A kind of the path recommendation method and mobile terminal of Information Sharing | |
| US10015234B2 (en) | Method and system for providing information via an intelligent user interface | |
| CN113241097A (en) | Recording method, recording device, electronic equipment and readable storage medium | |
| CN111868668B (en) | A method, terminal and server for searching candidate words of Chinese input method | |
| CN112740148B (en) | Method for inputting information into input box and electronic device | |
| CN110502126A (en) | Input method and electronic equipment | |
| CN108848240B (en) | Information security protection method, terminal and computer readable storage medium | |
| CN117808015B (en) | Translation method, electronic device and computer storage medium | |
| US12379839B2 (en) | Method for providing clipboard function, and electronic device supporting same | |
| CN119271787A (en) | Question and answer method, electronic device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||