CN112802458B - Wake-up method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN112802458B (CN201911099800.2A)
- Authority
- CN
- China
- Prior art keywords
- sound signal
- state
- wake
- determining
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G10L2015/088—Word spotting
- G10L15/26—Speech to text systems
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Telephone Function (AREA)
Abstract
Embodiments of the disclosure provide a wake-up method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring a first sound signal in a space where the electronic device is located; determining a first state indicating whether the first sound signal contains a wake-up word; acquiring a reference sound signal played by the electronic device; determining a second state indicating whether the reference sound signal contains the wake-up word; and determining the wake-up state of the electronic device according to the first state and the second state. Embodiments of the disclosure can prevent the electronic device from waking itself up because the reference sound signal it plays contains the wake-up word.
Description
Technical Field
The present disclosure relates to an intelligent wake-up technology, and in particular, to a wake-up method and apparatus, a storage medium, and an electronic device.
Background
With the continuous development of intelligent recognition technology, intelligent wake-up is applied more and more widely, and a growing number of electronic devices, such as smart speakers and smart televisions, can be woken up in this way.
At present, intelligent wake-up applications mainly rely on a wake-up word. Taking a smart speaker as an example, if its wake-up word is "little certain", the smart speaker monitors external sound in real time and wakes up when it recognizes an external sound signal containing "little certain".
However, the external sound signal does not necessarily come from a speaker (that is, a person speaking); it may also include the sound signal played by the electronic device itself, so the current wake-up approach may cause the device to wake itself up.
Disclosure of Invention
The present disclosure addresses the technical problem described above, namely that a device may wake itself up under the current wake-up approach. Embodiments of the disclosure provide a wake-up method and apparatus, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a wake-up method, including:
acquiring a first sound signal in a space where the electronic device is located;
determining a first state indicating whether the first sound signal contains a wake-up word;
acquiring a reference sound signal played by the electronic device;
determining a second state indicating whether the reference sound signal contains the wake-up word;
and determining the wake-up state of the electronic device according to the first state and the second state.
According to another aspect of the embodiments of the present disclosure, there is provided a wake-up apparatus including:
a first acquisition module, configured to acquire a first sound signal in a space where the electronic device is located;
a first determining module, configured to determine a first state indicating whether the first sound signal acquired by the first acquisition module contains a wake-up word;
a second acquisition module, configured to acquire a reference sound signal played by the electronic device;
a second determining module, configured to determine a second state indicating whether the reference sound signal acquired by the second acquisition module contains the wake-up word;
and a third determining module, configured to determine the wake-up state of the electronic device according to the first state determined by the first determining module and the second state determined by the second determining module.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the wake-up method according to any of the above embodiments.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the wake-up method according to any of the above embodiments.
Based on the wake-up method provided by the embodiments of the present disclosure, a first sound signal in the space where the electronic device is located and a reference sound signal played by the electronic device are acquired; a first state indicating whether the first sound signal contains a wake-up word and a second state indicating whether the reference sound signal contains the wake-up word are determined; and the wake-up state of the electronic device is determined according to the first state and the second state. This can prevent the electronic device from waking itself up because the reference sound signal it plays contains the wake-up word.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and embodiments.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is an exemplary scenario diagram of the wake-up method proposed by the present disclosure in an application.
Fig. 2 is a flowchart illustrating a wake-up method according to an exemplary embodiment of the disclosure.
Fig. 3 is a flowchart illustrating a wake-up method according to another exemplary embodiment of the disclosure.
Fig. 4 is a flowchart illustrating a wake-up method according to another exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a wake-up method according to still another exemplary embodiment of the disclosure.
Fig. 6 is a flowchart illustrating a wake-up method according to still another exemplary embodiment of the disclosure.
Fig. 7 is a flowchart illustrating a wake-up method according to still another exemplary embodiment of the disclosure.
Fig. 8 is a schematic structural diagram of a wake-up apparatus according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a wake-up apparatus according to another exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a wake-up apparatus according to still another exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a wake-up apparatus according to still another exemplary embodiment of the present disclosure.
Fig. 12 is a schematic structural diagram of a wake-up apparatus according to still another exemplary embodiment of the present disclosure.
Fig. 13 is a schematic structural diagram of a wake-up apparatus according to still another exemplary embodiment of the disclosure.
Fig. 14 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and imply neither any particular technical meaning nor any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two, and "at least one" may refer to one, two or more than two.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In carrying out the present disclosure, the inventors found that electronic devices currently monitor external sound in real time and wake up as soon as they recognize an external sound signal containing the wake-up word. In practical applications, however, the external sound signal does not necessarily come from the speaker; it may also include a sound signal played by the electronic device itself. Waking up solely because an external sound signal contains the wake-up word can therefore cause a self-wake-up problem.
Exemplary System
Fig. 1 is an exemplary scenario diagram of the wake-up method proposed by the present disclosure in an application.
As shown in fig. 1, the exemplary scenario includes a server 101, an electronic device 102, a speaker 103, and a network 104. The network 104 is used to provide a medium for communication links between the server 101 and the electronic device 102, and the communication links may include various connection types, such as wireless links, wired links, or fiber optic cables, and the like, which are not limited by the present disclosure.
The electronic device 102 may be a variety of smart devices including, but not limited to, a smart speaker, a smart phone, a smart television, and the like.
The server 101 may be a server that provides various services, such as providing multimedia resources for the electronic device 102 to play.
In application, the speaker 103 may wake up the electronic device 102 by the wake-up word and issue an instruction to the electronic device 102 by a sound signal, for example, instruct the electronic device 102 to play a specific song, or instruct the electronic device 102 to adjust the playing volume, etc. Taking the example that the speaker 103 issues an instruction to the electronic device 102 through the sound signal to instruct the electronic device 102 to play a specific song, the electronic device 102 may know the requirement of the speaker 103 by recognizing the sound signal, and request the server 101 to acquire the multimedia resource of the specific song through the network 104. The server 101 may then issue the multimedia resource to the electronic device 102 through the network 104 for the electronic device 102 to play.
In the process of playing the multimedia resource, the electronic device 102 may control its wake-up state by applying the wake-up method provided by the present disclosure. How the electronic device 102 performs this method is described below and is not detailed here.
Exemplary method
Fig. 2 is a flowchart illustrating a wake-up method according to an exemplary embodiment of the disclosure. The present embodiment may be applied to an electronic device, such as the electronic device 102 illustrated in fig. 1. As shown in fig. 2, the method comprises the following steps:
In step 201, a first sound signal in a space where the electronic device is located is acquired.
The first sound signal refers to a sound signal that can be heard by the electronic device in the space where the electronic device is located.
In one embodiment, the first sound signal may come from a speaker in the space in which the electronic device is located, such as the speaker 103 illustrated in fig. 1.
In an embodiment, the first sound signal may also come from the electronic device, i.e. a sound signal played by the electronic device itself.
In step 202, a first state indicating whether the first sound signal contains a wake-up word is determined.
The first state indicates whether the first sound signal contains the wake-up word; that is, it is either "the first sound signal contains the wake-up word" or "the first sound signal does not contain the wake-up word".
How the electronic device determines the first state is illustrated by example later and is not described in detail here.
In step 203, a reference sound signal played by the electronic device is acquired.
In one embodiment, the electronic device does not determine its wake-up state directly from the first state determined in step 202, but further acquires the reference sound signal played by the electronic device.
The reference sound signal refers to the sound signal that the electronic device is playing when it acquires the first sound signal.
In step 204, a second state indicating whether the reference sound signal contains the wake-up word is determined.
The second state indicates whether the reference sound signal contains the wake-up word; that is, it is either "the reference sound signal contains the wake-up word" or "the reference sound signal does not contain the wake-up word".
How the electronic device determines the second state is illustrated by example later and is not described in detail here.
In step 205, the wake-up state of the electronic device is determined according to the first state and the second state.
The wake-up state of the electronic device may include waking up and not waking up.
In an embodiment, the electronic device does not determine its wake-up state directly from the second state determined in step 204, but determines it according to both the first state determined in step 202 and the second state determined in step 204.
As an example, if the first state indicates that the first sound signal contains the wake-up word and the second state indicates that the reference sound signal contains the wake-up word, the wake-up word may be considered to possibly come from the electronic device itself, and therefore the electronic device may not be woken up.
Accordingly, if the first state indicates that the first sound signal contains the wake-up word and the second state indicates that the reference sound signal does not contain the wake-up word, it may be determined that the wake-up word comes from the speaker, and therefore the electronic device may be woken up as the speaker intends.
Based on the embodiment shown in fig. 2, a first sound signal in the space where the electronic device is located and a reference sound signal played by the electronic device are acquired; a first state indicating whether the first sound signal contains the wake-up word and a second state indicating whether the reference sound signal contains the wake-up word are determined; and the wake-up state of the electronic device is determined according to the first state and the second state. This can prevent the electronic device from waking itself up because the reference sound signal it plays contains the wake-up word.
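The decision described for the embodiment of fig. 2 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `decide_wake` and its boolean parameters are hypothetical, and the case where the first sound signal does not contain the wake-up word is not spelled out in the text, so treating it as "do not wake up" is an assumption.

```python
def decide_wake(first_has_wake_word: bool, reference_has_wake_word: bool) -> bool:
    """Return True if the device should wake up (hypothetical helper).

    first_has_wake_word:     the "first state" of the first sound signal.
    reference_has_wake_word: the "second state" of the reference sound signal.
    """
    if not first_has_wake_word:
        # Assumption: with no wake-up word heard, there is nothing to react to.
        return False
    # The wake-up word was heard. If the device's own playback also contains
    # it, the word may have come from the device itself, so do not wake up.
    return not reference_has_wake_word


if __name__ == "__main__":
    print(decide_wake(True, False))  # wake-up word attributed to the speaker
    print(decide_wake(True, True))   # wake-up word may come from own playback
```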
It should be noted that the above example in step 205 is merely an exemplary description of how the electronic device determines the wake-up state of the electronic device according to the first state and the second state, and in practical applications, other implementation flows, such as the flow shown in fig. 3, may also exist.
In one embodiment, as shown in FIG. 3, step 205 comprises the steps of:
In step 2051, if the first state indicates that the first sound signal contains the wake-up word and the second state indicates that the reference sound signal contains the wake-up word, the energy value of the reference sound signal is determined.
The energy value may represent a volume of the reference sound signal, for example, a higher energy value represents a larger volume of the reference sound signal, and conversely, a lower energy value represents a smaller volume of the reference sound signal.
In an embodiment, if the first state indicates that the first sound signal contains the wake-up word and the second state indicates that the reference sound signal contains the wake-up word, the electronic device does not immediately decide against waking up, but instead determines the energy value of the reference sound signal.
In an embodiment, the electronic device may compare the energy value of the reference sound signal with a first threshold and determine its wake-up state based on the magnitude relationship between the two.
As an example, if the energy value of the reference sound signal is smaller than the first threshold value, it may be considered that the reference sound signal is not enough to wake up the electronic device because the volume of the reference sound signal is small even though the wake-up word is included in the reference sound signal. Based on this, a wake-up operation may be performed.
As another example, if the energy value of the reference sound signal is not less than the first threshold value, it may be considered that the reference sound signal contains a wake-up word, and the volume of the reference sound signal is sufficient to wake up the electronic device. Based on this, the wake-up operation may not be performed.
Based on the embodiment shown in fig. 3, when it is determined that both the first sound signal and the reference sound signal contain the wake-up word, the energy value of the reference sound signal is used to further determine whether the reference sound signal is sufficient to wake up the electronic device. In this way, when the reference sound signal contains the wake-up word but is not sufficient to wake up the electronic device, the electronic device is still woken up according to the first sound signal, which avoids degrading the user experience by failing to wake up in time.
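The energy-based refinement of fig. 3 might look like the sketch below. The text does not say how the energy value is computed, so the mean squared amplitude used in `signal_energy` is an assumption, as are the function names, the sample representation (a sequence of floats), and the threshold value in the demo.

```python
from typing import Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def decide_wake_with_energy(
    first_has_wake_word: bool,
    reference_has_wake_word: bool,
    reference_samples: Sequence[float],
    first_threshold: float,
) -> bool:
    """Return True if the device should wake up (hypothetical helper)."""
    if not first_has_wake_word:
        return False  # assumption: nothing heard, nothing to do
    if not reference_has_wake_word:
        return True   # the wake-up word is attributed to the speaker
    # Both signals contain the wake-up word: wake up only if the playback
    # is too quiet to have triggered the wake-up by itself.
    return signal_energy(reference_samples) < first_threshold


if __name__ == "__main__":
    quiet_playback = [0.01] * 100
    loud_playback = [0.5] * 100
    print(decide_wake_with_energy(True, True, quiet_playback, 0.01))
    print(decide_wake_with_energy(True, True, loud_playback, 0.01))
```

Keeping the energy check behind the two state checks mirrors the order in the text: the energy value is only consulted when both signals contain the wake-up word.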
How the electronic device determines the first state of the first sound signal containing the wake-up word is described as follows:
as shown in fig. 4, based on the embodiment shown in fig. 2, step 202 may include the following steps:
In step 2021, audio preprocessing is performed on the first sound signal to obtain a second sound signal.
In an embodiment, the first sound signal may first be subjected to audio preprocessing, for example to suppress noise and echo. For convenience of description, the first sound signal after audio preprocessing is referred to as the second sound signal.
At step 2022, a first state in which the second sound signal includes the wakeup word is determined.
In an embodiment, the state indicating whether the second sound signal contains the wake-up word may be determined and taken as the first state of the first sound signal.
Based on the embodiment shown in fig. 4, because the first sound signal is first subjected to audio preprocessing, interference such as noise and echo in the first sound signal is removed, which improves the accuracy of the first state subsequently determined from the second sound signal.
How the electronic device determines the first state containing the wake-up word in the second sound signal is described as follows:
as shown in fig. 5, on the basis of the embodiment shown in fig. 4, step 2022 may include the following steps:
In step 20221, the second sound signal is input into a trained voice recognition model to obtain a first probability that the second sound signal contains the wake-up word.
The voice recognition model may be a neural network model, a convolutional neural network model, or the like. In application, a machine learning algorithm can be used to train a neural network model or a convolutional neural network model on training samples to obtain the voice recognition model.
In step 20222, the first state indicating whether the second sound signal contains the wake-up word is determined based on the first probability.
In one embodiment, determining the first state based on the first probability comprises: comparing the first probability with a preset second threshold, and determining the first state based on the magnitude relationship between the first probability and the second threshold.
As an example, if the first probability is smaller than the second threshold, it is determined that the second sound signal does not contain the wake-up word; if the first probability is not smaller than the second threshold, it is determined that the second sound signal contains the wake-up word.
Based on the embodiment shown in fig. 5, the second sound signal is input into the trained voice recognition model to obtain the first probability that the second sound signal contains the wake-up word, and the first state is determined based on the first probability, so that the first state can be determined conveniently and accurately.
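The thresholding step of fig. 5 can be illustrated with a short Python sketch. The trained model is represented here by any callable that maps a signal to a probability; `contains_wake_word`, `stub_model`, and the threshold value 0.5 are hypothetical stand-ins, since the text does not specify a model interface or a threshold value.

```python
from typing import Callable, Sequence

# Assumed interface: a trained model is any callable returning the
# probability (0.0 to 1.0) that the signal contains the wake-up word.
Model = Callable[[Sequence[float]], float]


def contains_wake_word(signal: Sequence[float], model: Model, threshold: float) -> bool:
    """Apply the rule from the text: 'not smaller than the threshold' means contained."""
    probability = model(signal)
    return probability >= threshold


def stub_model(signal: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model; returns a fixed probability."""
    return 0.8


if __name__ == "__main__":
    second_sound_signal = [0.0, 0.1, -0.1]
    print(contains_wake_word(second_sound_signal, stub_model, threshold=0.5))
```

Note that a probability exactly equal to the threshold counts as containing the wake-up word, matching the "not smaller than" wording.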
How the electronic device determines the second state of the reference sound signal containing the wake-up word is described as follows:
as shown in fig. 6, based on the embodiment shown in fig. 2, step 204 may include the following steps:
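In step 2041, the reference sound signal is input into a trained wake-up word recognition model to obtain a second probability that the reference sound signal contains the wake-up word.
The wake-up word recognition model may be a neural network model, a convolutional neural network model, or the like. In application, a machine learning algorithm can be used to train such a model on training samples to obtain the wake-up word recognition model.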
It should be noted that the wakeup word recognition model in step 2041 and the voice recognition model in step 20221 may be the same model or different models, which is not limited in this disclosure.
In an embodiment, the second probability may be compared with a preset third threshold, and a second state of the reference sound signal containing the wake-up word may be determined based on a magnitude relationship between the second probability and the third threshold.
As an example, if the second probability is smaller than the third threshold, it is determined that the reference sound signal does not include the wakeup word, and if the second probability is not smaller than the third threshold, it is determined that the reference sound signal includes the wakeup word.
Based on the embodiment shown in fig. 6, the reference sound signal is input into the trained wake-up word recognition model to obtain the second probability that the reference sound signal contains the wake-up word, and the second state is determined based on the second probability, so that the second state can be determined conveniently and accurately.
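Since the text allows the model used for the reference sound signal and the model used for the preprocessed first sound signal to be the same or different, and compares their outputs against separate thresholds, one possible arrangement is sketched below. The `Detector` class, `toy_scorer`, and both threshold values are hypothetical; `toy_scorer` is only a placeholder and does not recognize anything.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Scorer = Callable[[Sequence[float]], float]


@dataclass(frozen=True)
class Detector:
    """Pairs a scoring model with its own decision threshold (hypothetical)."""
    scorer: Scorer
    threshold: float

    def detect(self, signal: Sequence[float]) -> bool:
        # 'Not smaller than the threshold' means the wake-up word is present.
        return self.scorer(signal) >= self.threshold


def toy_scorer(signal: Sequence[float]) -> float:
    """Placeholder for a trained model: clips the peak amplitude to [0, 1]."""
    peak = max((abs(s) for s in signal), default=0.0)
    return min(1.0, peak)


# One shared model, two thresholds (values chosen only for illustration).
first_detector = Detector(toy_scorer, threshold=0.6)      # "second threshold"
reference_detector = Detector(toy_scorer, threshold=0.4)  # "third threshold"

if __name__ == "__main__":
    signal = [0.1, -0.5, 0.2]
    print(first_detector.detect(signal), reference_detector.detect(signal))
```

Bundling the threshold with the scorer keeps the two decisions independent, so swapping in a different model for the reference signal would only change one constructor call.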
It should be noted that the flow illustrated in fig. 6 is merely an exemplary description of how the electronic device determines the second state in the reference sound signal containing the wake-up word, and in practical applications, other implementation flows, such as the implementation flow illustrated in fig. 7, may also exist.
As shown in fig. 7, based on the embodiment shown in fig. 2, step 204 may include the following steps:
First, text content corresponding to the reference sound signal is obtained.
The text content corresponding to the reference sound signal may refer to lyrics, lines, etc. corresponding to the reference sound signal.
In step 702, an energy value of the reference sound signal is determined.
The energy value may represent the volume of the reference sound signal: a higher energy value represents a greater volume, and a lower energy value represents a smaller volume.
In step 704, a second state of whether the reference sound signal contains the wake-up word is determined based on the text content and the magnitude relationship between the energy value and a fourth threshold.
In an embodiment, the electronic device may compare the energy value of the reference sound signal with a preset fourth threshold, and determine the second state based on the text content corresponding to the reference sound signal and the result of that comparison. As an example, if the text content contains the wake-up word and the energy value of the reference sound signal is greater than the fourth threshold, the reference sound signal may be considered to contain the wake-up word; if the text content contains the wake-up word but the energy value of the reference sound signal is not greater than the fourth threshold, the reference sound signal may be considered not to contain the wake-up word.
Based on the embodiment shown in fig. 7, by obtaining the text content corresponding to the reference sound signal, determining the energy value of the reference sound signal, and determining the second state containing the wakeup word in the reference sound signal based on the text content and the magnitude relationship between the energy value and the fourth threshold, it is possible to accurately determine the second state containing the wakeup word in the reference sound signal.
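The flow of fig. 7 can be sketched in Python as follows. This is a hedged illustration under several assumptions: the energy measure (mean squared amplitude), the case-insensitive substring match, and the names `signal_energy` and `second_state_from_text_and_energy` are not taken from the disclosure. The disclosure also does not say what happens when the text content lacks the wake-up word; the sketch assumes the second state is then "not contained".

```python
from typing import Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude, used here as an assumed energy measure."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def second_state_from_text_and_energy(text_content: str,
                                      samples: Sequence[float],
                                      wake_word: str,
                                      fourth_threshold: float) -> bool:
    """Return True if the reference sound signal is judged to contain the wake-up word."""
    # Assumption: a simple case-insensitive substring match on the text content.
    text_has_wake_word = wake_word.lower() in text_content.lower()
    # The wake-up word counts only when the playback is loud enough,
    # that is, when the energy value is greater than the fourth threshold.
    return text_has_wake_word and signal_energy(samples) > fourth_threshold
```

For example, lyrics that contain the wake-up word but are played back very quietly would yield a second state of "not contained" in this sketch.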
Any of the wake-up methods provided by embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any of the wake-up methods provided by the embodiments of the present disclosure may be executed by a processor, for example, the processor may execute any of the wake-up methods mentioned in the embodiments of the present disclosure by calling a corresponding instruction stored in a memory. This is not described in further detail below.
Exemplary devices
Fig. 8 is a schematic structural diagram of a wake-up apparatus according to an exemplary embodiment of the present disclosure.
As shown in fig. 8, the apparatus includes:
a first acquiring module 81, configured to acquire a first sound signal in a space where the electronic device is located;
a first determining module 82, configured to determine a first state indicating whether the first sound signal acquired by the first acquiring module 81 contains a wake-up word;
a second acquiring module 83, configured to acquire a reference sound signal played by the electronic device;
a second determining module 84, configured to determine a second state indicating whether the reference sound signal acquired by the second acquiring module 83 contains the wake-up word;
a third determining module 85, configured to determine the wake-up state of the electronic device according to the first state determined by the first determining module 82 and the second state determined by the second determining module 84.
Based on the embodiment shown in fig. 8, the first sound signal in the space where the electronic device is located and the reference sound signal played by the electronic device are obtained; the first state, indicating whether the first sound signal contains the wake-up word, and the second state, indicating whether the reference sound signal contains the wake-up word, are determined; and the wake-up state of the electronic device is then determined according to the first state and the second state.
In an embodiment, the third determining module 85 may be specifically configured to:
if the first state determined by the first determining module indicates that the first sound signal acquired by the first acquiring module contains the wake-up word, and the second state determined by the second determining module indicates that the reference sound signal acquired by the second acquiring module contains the wake-up word, not execute the wake-up operation;
if the first state determined by the first determining module indicates that the first sound signal acquired by the first acquiring module contains the wake-up word, and the second state determined by the second determining module indicates that the reference sound signal acquired by the second acquiring module does not contain the wake-up word, execute the wake-up operation.
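The two cases handled by the third determining module 85 can be sketched in Python as follows. This is a minimal illustration, and the function name `should_wake` is hypothetical. The text above only covers the cases in which the first sound signal contains the wake-up word; the sketch assumes that no wake-up operation is performed otherwise.

```python
def should_wake(first_state: bool, second_state: bool) -> bool:
    """Decide whether to execute the wake-up operation.

    first_state: True if the first (ambient) sound signal contains the wake-up word.
    second_state: True if the reference (played-back) sound signal contains it.
    """
    if not first_state:
        # Assumption: with no wake-up word in the first sound signal, do not wake.
        return False
    # The wake-up word was picked up; wake only if the device's own playback
    # cannot account for it.
    return not second_state
```

The design intent is to suppress self-wake-up: when the wake-up word is also present in the device's own playback, the detection in the first sound signal is attributed to that playback.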
As shown in fig. 9, on the basis of the embodiment shown in fig. 8, the third determining module 85 may include:
an energy determining sub-module 851, configured to determine an energy value of the reference sound signal acquired by the second acquiring module 83 if the first state determined by the first determining module 82 indicates that the first sound signal acquired by the first acquiring module 81 includes the wake-up word, and the second state determined by the second determining module 84 indicates that the reference sound signal acquired by the second acquiring module 83 includes the wake-up word;
a first comparison sub-module 852, configured to compare the energy value determined by the energy determination sub-module 851 with a preset first threshold;
a first determining sub-module 853, configured to determine a wake-up state of the electronic device based on a magnitude relationship between the energy value compared by the first comparing sub-module 852 and the first threshold.
Based on the embodiment shown in fig. 9, when it is determined that the first sound signal contains the wake-up word and the reference sound signal also contains the wake-up word, the energy value of the reference sound signal is used to further determine whether the reference sound signal is sufficient to wake up the electronic device. If the reference sound signal contains the wake-up word but is not sufficient to wake up the electronic device, the electronic device is woken up according to the first sound signal, which avoids the user experience being affected by an untimely wake-up of the electronic device.
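The energy-based refinement of fig. 9 can be sketched in Python as follows. This is a hedged sketch: the function name `should_wake_with_energy_check` is hypothetical, and the direction of the comparison is an assumption. The sketch treats a reference energy value below the first threshold as "not sufficient to wake up the electronic device", and the handling of the case where the first sound signal lacks the wake-up word is also assumed.

```python
def should_wake_with_energy_check(first_state: bool,
                                  second_state: bool,
                                  reference_energy: float,
                                  first_threshold: float) -> bool:
    """Decide the wake-up state using the reference signal's energy value.

    reference_energy: energy value of the reference (played-back) sound signal.
    first_threshold: preset first threshold.
    """
    if not first_state:
        # Assumption: no wake-up word in the first sound signal, so do not wake.
        return False
    if not second_state:
        # The wake-up word is present in the room but not in the playback: wake.
        return True
    # Both signals contain the wake-up word. Assumption: playback quieter than
    # the first threshold could not have triggered the detection, so the
    # wake-up word is attributed to the user and the device wakes.
    return reference_energy < first_threshold
```

With this check, quiet playback that happens to contain the wake-up word no longer blocks a genuine wake-up request from the user.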
As shown in fig. 10, on the basis of the embodiment shown in fig. 8, the first determining module 82 may include:
the processing submodule 821 is configured to perform audio preprocessing on the first sound signal acquired by the first acquiring module 81 to acquire a second sound signal;
the second determining submodule 822 is configured to determine a first state in which the second sound signal obtained by the processing submodule 821 includes the wakeup word.
Based on the embodiment shown in fig. 10, because audio preprocessing is first performed on the first sound signal, interference such as noise and echo in the first sound signal is removed, which improves the accuracy of the subsequently determined first state of whether the second sound signal contains the wake-up word.
As shown in fig. 11, on the basis of the embodiment shown in fig. 10, the second determining submodule 822 may include:
a first input submodule 8221, configured to input the second sound signal obtained by the processing submodule 821 to a trained sound recognition model, so as to obtain a first probability that the second sound signal includes the wakeup word;
a third determining submodule 8222, configured to determine, based on the first probability obtained by the first input submodule 8221, a first state in which the second sound signal acquired by the processing submodule 821 includes the wake-up word.
In an embodiment, the third determination submodule 8222 includes:
a second comparison submodule 82221, configured to compare the first probability obtained by the first input submodule 8221 with a preset second threshold;
a fourth determining submodule 82222, configured to determine, based on a magnitude relationship between the first probability compared by the second comparing submodule 82221 and the second threshold, a first state in which the second sound signal acquired by the processing submodule 821 includes the wakeup word.
Based on the embodiment shown in fig. 11, the second sound signal is input to the trained sound recognition model to obtain the first probability that the second sound signal includes the wakeup word, and the first state that the second sound signal includes the wakeup word is determined based on the first probability, so that the first state that the second sound signal includes the wakeup word can be conveniently and accurately determined.
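The chain of processing submodule 821, first input submodule 8221, and the threshold comparison can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `preprocess` only removes the DC offset as a stand-in for real noise reduction and echo cancellation, the recognition model is passed in as an arbitrary callable, and the names, the default threshold of 0.5, and the "not smaller than" comparison direction are assumptions rather than details from the disclosure.

```python
from typing import Callable, List, Sequence


def preprocess(first_signal: Sequence[float]) -> List[float]:
    """Placeholder audio preprocessing: remove the DC offset.

    A real system would apply noise reduction and echo cancellation here.
    """
    if not first_signal:
        return []
    mean = sum(first_signal) / len(first_signal)
    return [s - mean for s in first_signal]


def determine_first_state(first_signal: Sequence[float],
                          model: Callable[[Sequence[float]], float],
                          second_threshold: float = 0.5) -> bool:
    """Return True if the preprocessed signal is judged to contain the wake-up word."""
    second_signal = preprocess(first_signal)    # processing submodule 821
    first_probability = model(second_signal)    # first input submodule 8221
    # Assumed direction: a probability not smaller than the threshold is a detection.
    return first_probability >= second_threshold
```

Passing the model as a callable keeps the sketch independent of any particular recognition model or machine learning framework.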
As shown in fig. 12, on the basis of the embodiment shown in fig. 8, the second determining module 84 may include:
the second input sub-module 841 is configured to input the reference sound signal acquired by the second acquiring module 83 to a trained awakening word recognition model, so as to obtain a second probability that the reference sound signal includes the awakening word;
a fifth determining sub-module 842, configured to determine, based on the second probability obtained by the second input sub-module 841, a second state indicating whether the reference sound signal acquired by the second acquiring module 83 contains the wake-up word.
In an embodiment, the fifth determining sub-module 842 may include:
a third comparing submodule 8421, configured to compare the second probability obtained by the second input submodule 841 with a preset third threshold;
a sixth determining sub-module 8422, configured to determine, based on a magnitude relationship between the second probability compared by the third comparing sub-module 8421 and the third threshold, a second state indicating whether the reference sound signal acquired by the second acquiring module 83 contains the wake-up word.
Based on the embodiment shown in fig. 12, the reference sound signal is input to the trained awakening word recognition model to obtain the second probability that the reference sound signal contains the awakening word, and the second state that the reference sound signal contains the awakening word is determined based on the second probability, so that the second state that the reference sound signal contains the awakening word can be conveniently and accurately determined.
As shown in fig. 13, on the basis of the embodiment shown in fig. 8, the second determining module 84 may include:
an obtaining sub-module 843, configured to obtain text content corresponding to the reference sound signal obtained by the second obtaining module 83;
a seventh determining sub-module 844 for determining an energy value of the reference sound signal acquired by the second acquiring module 83;
a fourth comparison sub-module 845, configured to compare the energy value determined by the seventh determination sub-module 844 with a preset fourth threshold;
an eighth determining submodule 846, configured to determine, based on the text content acquired by the acquiring submodule 843 and a magnitude relationship between the energy value compared by the fourth comparing submodule 845 and the fourth threshold, a second state in which the reference sound signal acquired by the second acquiring module 83 includes the wakeup word.
Based on the embodiment shown in fig. 13, by obtaining the text content corresponding to the reference sound signal, determining the energy value of the reference sound signal, and determining the second state of whether the reference sound signal contains the wake-up word based on the text content and the magnitude relationship between the energy value and the fourth threshold, the second state can be determined accurately.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 14. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate therefrom, which stand-alone device may communicate with the first device and the second device to receive the acquired input signals therefrom.
FIG. 14 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 14, the electronic device 140 includes one or more processors 141 and memory 142.
Processor 141 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 140 to perform desired functions.
Memory 142 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 141 to implement the wake-up methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 140 may further include: an input device 143 and an output device 144, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 143 may be the microphone or the microphone array described above for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input means 143 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 143 may also include, for example, a keyboard, a mouse, and the like.
The output device 144 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 144 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 140 relevant to the present disclosure are shown in fig. 14, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 140 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the wake-up method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification, above.
Program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the wake-up method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems involved in the present disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
Claims (11)
1. A wake-up method applied to an electronic device includes:
acquiring a first sound signal in a space where the electronic equipment is located;
determining a first state of the first sound signal containing a wake-up word;
acquiring a reference sound signal played by the electronic equipment;
determining a second state of the reference sound signal containing a wake-up word;
determining the awakening state of the electronic equipment according to the first state and the second state;
wherein the determining the wake-up state of the electronic device according to the first state and the second state comprises:
if the first state represents that the first sound signal contains the awakening word and the second state represents that the reference sound signal contains the awakening word, determining an energy value of the reference sound signal;
comparing the energy value with a preset first threshold value;
determining a wake-up state of the electronic device based on a magnitude relationship of the energy value to the first threshold.
2. The method of claim 1, wherein determining the wake state of the electronic device from the first state and the second state further comprises:
and if the first state represents that the first sound signal contains the awakening word and the second state represents that the reference sound signal does not contain the awakening word, executing awakening operation.
3. The method of claim 1, wherein the determining a first state in the first sound signal containing a wake word comprises:
carrying out audio preprocessing on the first sound signal to obtain a second sound signal;
determining a first state in the second sound signal that includes the wake-up word.
4. The method of claim 3, wherein the determining the first state in the second sound signal that includes a wake-up word comprises:
inputting the second sound signal into a trained sound recognition model to obtain a first probability that the second sound signal contains the awakening word;
determining a first state in the second sound signal that includes the wake-up word based on the first probability.
5. The method of claim 4, wherein the determining a first state in the second sound signal that includes the wake-up word based on the first probability comprises:
comparing the first probability with a preset second threshold value;
determining a first state of the second sound signal containing the wake-up word based on a magnitude relationship between the first probability and the second threshold.
6. The method of claim 1, wherein the determining a second state in the reference sound signal that includes the wake-up word comprises:
inputting the reference sound signal into a trained awakening word recognition model to obtain a second probability that the reference sound signal contains the awakening word;
determining a second state of the reference sound signal that includes the wake-up word based on the second probability.
7. The method of claim 6, wherein the determining a second state of the reference sound signal that includes the wake-up word based on the second probability comprises:
comparing the second probability with a preset third threshold value;
determining a second state of the reference sound signal containing the wake-up word based on a magnitude relationship between the second probability and the third threshold.
8. The method of claim 1, wherein the determining a second state in the reference sound signal that includes the wake-up word comprises:
acquiring text content corresponding to the reference sound signal;
determining an energy value of the reference sound signal;
comparing the energy value with a preset fourth threshold value;
determining a second state of the reference sound signal that includes the wake-up word based on the text content and a magnitude relationship between the energy value and the fourth threshold.
9. A wake-up device applied to an electronic device comprises:
the first acquisition module is used for acquiring a first sound signal in a space where the electronic equipment is located;
the first determining module is used for determining a first state that the first sound signal acquired by the first acquiring module contains the awakening word;
the second acquisition module is used for acquiring a reference sound signal played by the electronic equipment;
the second determining module is used for determining a second state that the reference sound signal acquired by the second acquiring module contains the awakening word;
the third determining module is used for determining the awakening state of the electronic equipment according to the first state determined by the first determining module and the second state determined by the second determining module;
the third determining module is configured to determine an energy value of the reference sound signal if the first state indicates that the first sound signal includes the wake-up word and the second state indicates that the reference sound signal includes the wake-up word; the third determining module is further configured to compare the energy value with a preset first threshold, and then determine the wake-up state of the electronic device based on a magnitude relationship between the energy value and the first threshold.
10. A computer-readable storage medium, which stores a computer program for performing the wake-up method of any one of the preceding claims 1 to 8.
11. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor configured to perform the wake-up method according to any one of claims 1 to 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911099800.2A CN112802458B (en) | 2019-11-12 | 2019-11-12 | Wake-up method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911099800.2A CN112802458B (en) | 2019-11-12 | 2019-11-12 | Wake-up method and device, storage medium and electronic equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112802458A (en) | 2021-05-14 |
| CN112802458B (en) | 2023-03-31 |
Family
ID=75802956
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911099800.2A Active CN112802458B (en) | 2019-11-12 | 2019-11-12 | Wake-up method and device, storage medium and electronic equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112802458B (en) |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2003195989A (en) * | 2001-12-26 | 2003-07-11 | Internatl Business Mach Corp <Ibm> | Computer device, power source supply control method and program |
| CN107134279B (en) * | 2017-06-30 | 2020-06-19 | 百度在线网络技术(北京)有限公司 | Voice awakening method, device, terminal and storage medium |
| CN109697984B (en) * | 2018-12-28 | 2020-09-04 | 北京声智科技有限公司 | Method for reducing self-awakening of intelligent equipment |
| CN110085223A (en) * | 2019-04-02 | 2019-08-02 | 北京云知声信息技术有限公司 | A kind of voice interactive method of cloud interaction |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112802458A (en) | 2021-05-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110473546B (en) | Method and device for recommending media files | |
| US11189273B2 (en) | Hands free always on near field wakeword solution | |
| US20250308377A1 (en) | Methods and systems for ambient system control | |
| CN107909998B (en) | Voice instruction processing method and device, computer equipment and storage medium | |
| US20200035241A1 (en) | Method, device and computer storage medium for speech interaction | |
| CN114038457B (en) | Method, electronic device, storage medium, and program for voice wakeup | |
| CN110209812B (en) | Text classification method and device | |
| CN111694926A (en) | Interactive processing method and device based on scene dynamic configuration and computer equipment | |
| CN110770826A (en) | Secure utterance storage | |
| CN114999534B (en) | A method, device, equipment and storage medium for controlling the playback of in-vehicle music | |
| CN113053377A (en) | Voice wake-up method and device, computer readable storage medium and electronic equipment | |
| CN112687286A (en) | Method and device for adjusting noise reduction model of audio equipment | |
| CN110544468B (en) | Application awakening method and device, storage medium and electronic equipment | |
| CN113889091A (en) | Voice recognition method and device, computer readable storage medium and electronic equipment | |
| KR102220964B1 (en) | Method and device for audio recognition | |
| CN111210824B (en) | Voice information processing method and device, electronic equipment and storage medium | |
| CN108962226B (en) | Method and apparatus for detecting end point of voice | |
| US20190180734A1 (en) | Keyword confirmation method and apparatus | |
| CN116665663A (en) | Digital human interaction control method, device, electronic equipment and storage medium | |
| CN112598027A (en) | Equipment abnormity identification method and device, terminal equipment and storage medium | |
| CN113360630B (en) | Interactive information prompting method | |
| CN110827799A (en) | Method, apparatus, device and medium for processing voice signal | |
| CN114842385A (en) | Subject teaching and training video review methods, devices, equipment and media | |
| CN114143608A (en) | Content recommendation method and device, computer equipment and readable storage medium | |
| CN112802458B (en) | Wake-up method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |