Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
In the related art, when determining the holographic scene type of an audio stream, it is often inappropriate to divide the holographic scene type according to the type of the application program: for example, a common music APP provides live broadcast, video and music, and a common communication APP may be used to watch video on a social platform, watch video in a chat interface with friends, listen to voice messages, make voice calls, and use a navigation function, so a single application program corresponds to multiple scenes.
Secondly, it is often not appropriate to determine the holographic scene type based on the type of the audio stream being played by the application, because that type is set by the application program at playback time, and its accuracy is poor.
Finally, if the holographic scene type is identified by the software package name of the application program, the identification can only be applied to known software packages; the limited list does not cover unknown application programs, so the accuracy is poor.
In addition, holographic audio (also called holographic sound effect) is a technology that simulates the human auditory perception mode, namely a 3D sound effect technology, which, through complex algorithms and processing, presents sound stereoscopically rather than from a single plane, so that a listener can clearly perceive the position and distance of the sound in space; control of an application program under holographic audio is then performed on this basis.
In order to solve the technical problem of low accuracy in the manner of determining a holographic scene type under holographic audio in the related art, an embodiment of the present application provides an audio stream identification method, where the method is applied to an electronic device. Fig. 1 is a schematic flow diagram of an alternative audio stream identification method provided by the embodiment of the present application, and as shown in Fig. 1, the audio stream identification method may include:
S101, acquiring an audio stream of the electronic device;
In S101, the electronic device first obtains the audio streams of the electronic device, where the number of the audio streams may be one, or may be two or more, and here, the embodiment of the present application is not limited in particular.
In addition, the audio stream of the electronic device may be one audio stream of one application program, two or more audio streams of one application program, or two or more audio streams of two or more application programs; the embodiment of the present application is not specifically limited here.
Specifically, the application program writes the audio stream to the memory of the electronic device through the AudioTrack interface, so that the electronic device obtains the audio stream of the electronic device.
S102, extracting characteristics of an audio stream to obtain the characteristics of the audio stream;
After S101, the electronic device performs feature extraction on the audio stream. Before the feature extraction, processing such as data cleaning, file format alignment, data enhancement, and spectrogram conversion may be performed on the audio stream in sequence to obtain a processed audio stream; it should be noted that the processing is not limited thereto.
After the processed audio stream is obtained, feature extraction is performed on it by a front-end processing algorithm such as the filter bank (FilterBank, Fbank) algorithm or the Mel-frequency cepstral coefficient (MFCC) algorithm, so that the features of the audio stream can be obtained.
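As an illustrative sketch only (the embodiment does not fix a specific implementation), the Fbank-style log-mel feature extraction described above can be written with NumPy as follows; the sample rate, frame length, hop size, FFT size, and number of mel filters are assumed values.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank_features(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26):
    """Log-mel filterbank (Fbank) features, one row per frame."""
    # Split the signal into overlapping frames and taper with a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filters spaced evenly on the mel scale up to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fb[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    # Log compression, with a small floor to avoid log(0).
    return np.log(power @ fb.T + 1e-10)
```

For one second of 16 kHz audio this yields a (frames × mel-bands) feature matrix that can then be fed to the scene recognition model.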
The above feature extraction is performed on each audio stream, for example, when the audio stream is one audio stream, the feature of the audio stream is obtained by performing feature extraction on the audio stream, and when the audio stream is two or more audio streams, the feature of each audio stream is obtained by performing feature extraction on each audio stream.
In this way, the characteristics of the audio stream can be derived for determining the holographic scene type of the audio stream.
S103, inputting the characteristics of the audio stream into the trained scene recognition model to perform holographic scene recognition, and obtaining the holographic scene type of the audio stream.
After determining the features of the audio stream through S102, since the trained scene recognition model is stored in the electronic device, the electronic device may input the features of the audio stream into the trained scene recognition model to perform holographic scene recognition, so as to obtain the holographic scene type of the audio stream.
The trained scene recognition model may be obtained by training the electronic device based on a sample data set of the model, or may be obtained by training the cloud server based on a sample data set of the model, and after the trained scene recognition model is obtained, the trained scene recognition model is sent to the electronic device, so that the trained scene recognition model is stored in the electronic device, where the embodiment of the application is not limited in detail.
When the number of the audio streams is one, the characteristics of the audio stream are input into the trained scene recognition model to obtain the holographic scene type of the audio stream, and when the number of the audio streams is two or more, the characteristics of each audio stream are input into the trained scene recognition model to obtain the holographic scene type of each audio stream.
In addition, the above holographic scene types may include game background sound, music, voice, video type one, video type two, system call, Voice over Internet Protocol (VoIP) voice, ring tone, alarm clock, notification, navigation, game voice, etc., which the embodiments of the present application do not specifically limit herein.
Therefore, the holographic scene type of the audio stream can be obtained, and the trained scene recognition model is adopted to ensure that the determination of the holographic scene type is more accurate, and the audio effect under holographic audio is better.
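The recognition step S103 can be sketched as follows; since the embodiment does not specify the model architecture, a simple mean-pooling plus linear-softmax classifier stands in for the trained scene recognition model, and the label strings follow the example types listed above.

```python
import numpy as np

# Hypothetical label set, following the example holographic scene types above.
SCENE_TYPES = ["game background sound", "music", "voice", "video type one",
               "video type two", "system call", "VoIP voice", "ring tone",
               "alarm clock", "notification", "navigation", "game voice"]

def identify_scene(features, weights, bias):
    """Pool per-frame features into one vector, then apply a linear layer and
    softmax as a stand-in for the trained scene recognition model."""
    pooled = features.mean(axis=0)            # (n_mels,) utterance-level vector
    logits = weights @ pooled + bias          # one logit per scene type
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return SCENE_TYPES[int(np.argmax(probs))], float(probs.max())
```

The returned confidence could, for example, gate whether the result is trusted for later spatial-audio placement.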
In general, in a scene of holographic audio, there are multiple audio streams to be played, and in order to implement identification of holographic scene types of the multiple audio streams, in an alternative embodiment, when the number of audio streams is at least two, S102 may include:
Extracting the characteristics of each audio stream in the audio streams to obtain the characteristics of each audio stream;
Accordingly, S103 may include:
and inputting the characteristics of each audio stream into the trained scene recognition model to respectively carry out holographic scene recognition to obtain the holographic scene type of each audio stream.
It can be understood that when the number of the audio streams is at least two, the electronic device extracts the characteristics of each audio stream when the electronic device acquires at least two audio streams, so that the characteristics of each audio stream can be obtained, and then the characteristics of each audio stream are input into the trained scene recognition model to respectively perform holographic scene recognition, so that the holographic scene type of each audio stream can be obtained.
That is, for two or more audio streams, feature extraction and scene recognition are performed respectively, so that the holographic scene type of each audio stream is determined respectively, which is helpful to improve the accuracy of determining the holographic scene type, so that the electronic device can know the holographic scene type of each audio stream, and play the spatial audio corresponding to the audio stream by utilizing the holographic audio better.
In order to enhance the sound effect of the audio stream under holographic audio, in an alternative embodiment, the method may further comprise:
Based on the holographic scene type of the audio stream, playing the spatial audio corresponding to the audio stream.
It will be appreciated that, when the holographic scene type of the audio stream is obtained, the spatial audio corresponding to the audio stream may be played based on the determined holographic scene type; mainly, the spatial position of the audio stream is determined according to the holographic scene type of the audio stream, and then the electronic device plays the spatial audio based on the determined spatial position of the audio stream.
Therefore, the spatial audio corresponding to the audio stream is played based on the holographic scene type of the audio stream, and the spatial sound effect of the audio stream in the holographic scene can be improved.
In order to improve the spatial sound effect of the audio stream under the holographic audio when the number of the audio streams is one, in an alternative embodiment, playing the spatial audio corresponding to the audio stream based on the holographic scene type of the audio stream may include:
Determining a spatial position corresponding to the holographic scene type based on the holographic scene type of the audio stream;
And playing the spatial audio of the audio stream based on the spatial position corresponding to the holographic scene type.
It can be understood that the electronic device determines the spatial position corresponding to the holographic scene type based on the holographic scene type of the audio stream. The electronic device may be preset with a correspondence between scene types and spatial positions, determine the spatial position corresponding to the holographic scene type based on that correspondence, and then play the spatial audio of the audio stream at the determined spatial position corresponding to the holographic scene type.
Alternatively, the central position in the space where the holographic audio is located may be directly determined as the spatial position corresponding to the holographic scene type, and the spatial audio of the audio stream may be played at the central position; this is not particularly limited herein.
In this way, the spatial audio of the audio stream can be played based on the holographic scene type of the audio stream, thereby improving the spatial sound effect under the holographic audio.
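The preset correspondence between scene type and spatial position, with a central-position fallback, can be sketched as a simple lookup; the table entries (azimuth°, elevation°, distance in meters) are purely hypothetical values for illustration.

```python
# Hypothetical scene-type -> (azimuth deg, elevation deg, distance m) table;
# the actual correspondence is device-defined, only the lookup pattern is shown.
SCENE_POSITION = {
    "music": (0.0, 0.0, 2.0),
    "navigation": (45.0, 15.0, 1.0),
    "notification": (-60.0, 30.0, 0.5),
}
# Fallback: the central position of the holographic audio space.
CENTER_POSITION = (0.0, 0.0, 1.0)

def position_for(scene_type):
    """Return the preset spatial position for a scene type, or the center."""
    return SCENE_POSITION.get(scene_type, CENTER_POSITION)
```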
In addition, for improving spatial sound effects in the holographic scene when the number of audio streams is at least two, in an alternative embodiment, playing spatial audio corresponding to the audio streams based on the holographic scene type of the audio streams when the number of audio streams is at least two may include:
Determining a spatial position corresponding to each audio stream based on the priority of the holographic scene type of each audio stream in the audio streams;
and playing the spatial audio of each audio stream based on the corresponding spatial position of each audio stream.
It will be appreciated that the electronic device determines the priority of the holographic scene type of each audio stream, where it is to be noted that each holographic scene type corresponds to a priority, for example, in order from high to low: game background sound, music, voice, video type one, video type two, system call, Voice over Internet Protocol (VoIP) voice, ring tone, alarm clock, notification, navigation, game voice.
After determining the priority of the holographic scene type of each audio stream, the electronic device may store in advance a correspondence between the priority and the spatial position, and may determine the spatial position of each audio stream based on the correspondence, or may determine the spatial position from high to low according to the priority, where embodiments of the present application are not limited in this particular manner.
After determining the spatial position corresponding to each audio stream, the electronic device may play the spatial audio of each audio stream at the spatial position corresponding to each audio stream.
In this way, the electronic device determines the spatial position corresponding to each audio stream according to the priority of the holographic scene type of each audio stream, and plays the spatial audio of each audio stream at the corresponding spatial position, which is beneficial to improving the spatial sound effect under holographic audio.
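The priority-based placement for multiple streams can be sketched as follows; the priority list follows the example order given above, while the azimuth slots (most frontal slot first) are assumed for illustration.

```python
# Priority order from the example above, high to low.
PRIORITY = ["game background sound", "music", "voice", "video type one",
            "video type two", "system call", "VoIP voice", "ring tone",
            "alarm clock", "notification", "navigation", "game voice"]

# Hypothetical azimuth slots in degrees; the frontal (best) slot comes first.
SLOTS = [0.0, -30.0, 30.0, -60.0, 60.0, -90.0, 90.0]

def assign_positions(scene_types):
    """Give the highest-priority stream the most frontal slot, and so on."""
    ranked = sorted(scene_types, key=PRIORITY.index)
    return {s: SLOTS[i % len(SLOTS)] for i, s in enumerate(ranked)}
```

For example, when music and navigation play together, music (higher priority) takes the frontal position and navigation is placed to the side.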
In order to improve accuracy of holographic scene type identification, in an alternative embodiment, the method may further include:
when the trained scene recognition model continuously outputs the same holographic scene type for the audio stream, and the number of consecutive outputs reaches a preset number of times, the features of the audio stream and the holographic scene type of the audio stream are sent to the cloud server.
It can be understood that, after acquiring the audio stream, the electronic device samples it at intervals of a preset time period so as to identify the holographic scene type of the audio stream in the above manner. In the model identification, when the trained scene recognition model continuously outputs the same holographic scene type for the audio stream, and the number of consecutive outputs reaches the preset number of times, it indicates that the identification result has been the same for the preset number of consecutive identifications; the identification result can therefore be considered highly accurate and can be used for retraining the model, so the features of the audio stream and the holographic scene type of the audio stream are sent to the cloud server.
The features of the audio stream and the holographic scene type of the audio stream are used by the cloud server to continue training the locally trained scene recognition model, so as to obtain new parameters of the trained scene recognition model and thereby update the locally trained scene recognition model; in addition, the new parameters can be used by the electronic device to update its locally stored trained scene recognition model.
Therefore, the parameters of the model are updated on the cloud server side and the electronic equipment side, and the accuracy of the type of the holographic scene identified by the model is improved.
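The consecutive-output gate described above (upload only after the same scene type has been output a preset number of times in a row) can be sketched as a small stateful helper; the class and method names are illustrative, not from the embodiment.

```python
class UploadGate:
    """Trigger an upload only after the model outputs the same holographic
    scene type a preset number of times in a row."""

    def __init__(self, preset_times):
        self.preset_times = preset_times
        self.last_label = None
        self.run_length = 0

    def observe(self, label):
        """Record one model output; return True when the run reaches the
        preset number of consecutive identical outputs."""
        if label == self.last_label:
            self.run_length += 1
        else:
            self.last_label, self.run_length = label, 1
        return self.run_length >= self.preset_times
```

When `observe` returns True, the electronic device would send the stream's features and scene type to the cloud server.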
In order to obtain a trained scene recognition model, in an alternative embodiment, the method may further include:
and acquiring the trained scene recognition model from the cloud server.
It can be understood that, besides training to obtain the trained scene recognition model itself, the electronic device can acquire the trained scene recognition model from the cloud server, which trains it instead; then, after the electronic device acquires the audio stream and extracts its features, it performs holographic scene recognition on the audio stream to obtain the holographic scene type of the audio stream.
Therefore, the trained scene recognition model is trained through the cloud server, so that the recognition accuracy of the holographic scene type can be improved while the energy consumption of the electronic equipment is reduced.
In addition, in order to update parameters corresponding to the trained scene recognition model to improve accuracy of holographic scene recognition, in an alternative embodiment, the method may further include:
Acquiring new parameters of the trained scene recognition model from a cloud server;
and updating the parameters of the trained scene recognition model to the new parameters, so as to obtain an updated trained scene recognition model.
It can be understood that after the electronic device obtains the trained scene recognition model, new parameters of the local trained scene recognition model can be obtained from the cloud server, and the parameters of the trained scene recognition model are updated to the new parameters, so that the trained scene recognition model can be obtained again, and the updating of the trained scene recognition model in the electronic device is completed.
Therefore, the method is beneficial to improving the accuracy of identifying the holographic scene type and improving the space sound effect under holographic audio.
To extract the features of the audio stream, in an alternative embodiment, S102 may include:
and extracting the mel-frequency cepstrum coefficient characteristics of the audio stream to obtain the characteristics of the audio stream.
It will be appreciated that, before the feature extraction, processing such as data cleaning, file format alignment, data enhancement, and spectrogram conversion is mainly used to obtain the processed audio stream (it should be noted that the processing is not limited thereto); after the processed audio stream is obtained, feature extraction is performed on it with an algorithm such as Fbank or MFCC, so that the features of the audio stream may be obtained.
Thus, after the characteristics of the audio stream are obtained, the more accurate holographic scene type of the audio stream can be obtained, and the accuracy of holographic scene identification is improved.
The embodiment of the application also provides an audio stream identification method, which is applied to a cloud server. Fig. 2 is a second schematic flow diagram of an alternative audio stream identification method provided by the embodiment of the present application; as shown in Fig. 2, the audio stream identification method may include:
S201, acquiring a sample data set;
S202, training a scene recognition model by using the sample data set to obtain a trained scene recognition model;
And S203, transmitting the trained scene recognition model to the electronic equipment.
In order to enable the electronic device to acquire the trained scene recognition model, in S201, the cloud server acquires a sample data set first, where the sample data set may include an acquired audio stream and a holographic scene type of the acquired audio stream, where the acquired audio stream may be an audio stream of one application program or may be an audio stream of two or more application programs, and embodiments of the present application are not limited in this way specifically.
After the cloud server acquires the sample data set, in S202, the cloud server trains the pre-stored scene recognition model by using the sample data set, so that a trained scene recognition model can be obtained, and in order to enable recognition of the holographic scene type in the electronic device, in S203, the cloud server sends the trained scene recognition model to the electronic device, so that the electronic device recognizes the holographic scene type of the audio stream.
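As a hedged illustration of the cloud-side training step S202 (the embodiment does not specify the model architecture or training procedure), the sketch below trains a minimal multinomial-logistic classifier with NumPy as a stand-in for the scene recognition model; the learning rate, epoch count, and model form are assumptions.

```python
import numpy as np

def train_scene_model(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    """Minimal softmax-regression stand-in for the scene recognition model:
    learns (weights, bias) from a sample data set of feature vectors X and
    holographic scene labels y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        # Forward pass: class probabilities via a stable softmax.
        logits = X @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of mean softmax cross-entropy, then a gradient step.
        grad = (probs - onehot) / len(X)
        W -= lr * grad.T @ X
        b -= lr * grad.sum(axis=0)
    return W, b
```

The resulting parameters are what the cloud server would send to the electronic device in S203.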
Therefore, the electronic equipment can reduce the energy consumption of the electronic equipment and improve the accuracy of the identification of the holographic scene type.
In order for the cloud server to implement optimization of the trained scene recognition model in the electronic device, in an alternative embodiment, the method may further include:
Acquiring characteristics of an audio stream and a holographic scene type of the audio stream from electronic equipment;
Training the locally trained scene recognition model by utilizing the characteristics of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model;
the new parameters are sent to the electronic device.
It can be understood that the cloud server can acquire, from the electronic device, the features of the audio stream and the identified holographic scene type of the audio stream, wherein the holographic scene type is one that the trained scene recognition model has continuously output unchanged, with the number of consecutive outputs reaching the preset number of times; that is, the cloud server acquires accurate data from the electronic device, which can be used for retraining the locally trained scene recognition model to obtain new parameters of the trained scene recognition model.
The cloud server sends the new parameters to the electronic equipment, so that the electronic equipment can update the parameters of the scene recognition model after the local training to the new parameters, and the electronic equipment can optimize the scene recognition model after the local training.
Therefore, the accuracy of holographic scene recognition can be further improved by continuously optimizing the trained scene recognition model in the electronic equipment, so that the spatial sound effect under holographic audio is improved.
In addition, in order to improve accuracy of holographic scene recognition without affecting holographic scene type recognition, in an alternative embodiment, training a locally trained scene recognition model by using the features of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model may include:
When the current moment reaches a preset time range, training the locally trained scene recognition model by utilizing the characteristics of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model.
It can be understood that only when the current time reaches the preset time range, the cloud server trains the locally trained scene recognition model by utilizing the characteristics of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model.
That is, the cloud server trains the locally trained scene recognition model only for a certain specific period of time to obtain new parameters, which are then used to optimize the trained scene recognition model in the electronic device.
The preset time range may be any time period within a day; usually, a night-time period may be selected for retraining the locally trained scene recognition model, so as to prevent too many cloud server resources from being occupied during the day.
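The check that the current moment falls within the preset time range can be sketched as follows; the default 01:00–05:00 night window is an assumed value, and the helper also handles a window that wraps past midnight.

```python
from datetime import time

def in_retrain_window(now, start=time(1, 0), end=time(5, 0)):
    """True when `now` falls in the preset (assumed night-time) range;
    a window with start > end is treated as wrapping past midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end
```

The cloud server would call this with the current time before starting a retraining run.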
The following describes, by way of example, a method of identifying audio streams in one or more of the embodiments described above.
Fig. 3 is a flowchart of an example one of an alternative audio stream identification method provided in an embodiment of the present application, where, as shown in fig. 3, the audio stream identification method may include:
S301, the mobile terminal acquires an audio stream of an application program;
S302, the mobile terminal performs feature extraction on the audio stream to obtain features of the audio stream;
S303, the mobile terminal inputs the features of the audio stream into a trained artificial intelligence (AI) model to obtain a holographic scene tag of the audio stream;
S304, the mobile terminal outputs the holographic scene tag.
Specifically, an application program in the mobile terminal writes audio stream data to the mobile terminal through the AudioTrack interface. The mobile terminal performs feature extraction on this data: it first performs data cleaning, such as file format alignment, data enhancement and spectrogram conversion, and then extracts features through algorithms such as Fbank or MFCC. The extracted features are fed into the trained AI model (equivalent to the trained scene recognition model described above) for computation, and the output holographic scene tag is obtained.
Fig. 4 is a flowchart of an example two of an alternative audio stream identification method according to an embodiment of the present application, where as shown in fig. 4, the audio stream identification method may include:
S401, the mobile terminal outputs a holographic scene label by using the trained AI model;
S402, the mobile terminal judges whether the holographic scene tag has changed; if yes, S403 is executed;
S403, storing the characteristics and the holographic scene tag by the mobile terminal;
S404, the mobile terminal waits to upload the recent data to a cloud server at night;
S405, the cloud server receives the data and retrains the trained AI model with it to obtain new parameters of the trained AI model;
and S406, the cloud server periodically issues new parameters to the mobile terminal.
The learning process of the AI model is divided into two parts: model training and night self-learning. Model training is not performed on the mobile terminal but on a cloud server: the cloud server trains the AI model with the acquired sample data set, and the trained AI model is then preloaded on the mobile terminal.
For night self-learning, the mobile terminal outputs a holographic scene tag using the trained AI model, and then judges whether the number of times the same tag has been continuously output for the identified audio stream reaches the preset number of times. If so, it waits to upload the recent data, namely the features of the audio stream and the holographic scene type of the audio stream, to the cloud server at night; if not, it continues identification. After receiving the data, the cloud server retrains the trained AI model with it, obtains new parameters of the trained AI model, and sends the new parameters to the mobile terminal to update the parameters of the locally trained AI model.
The present example proposes a neural network-based model that can infer the type of audio stream in a mobile terminal from its content, such as music, video, navigation, alarm clock, notification, incoming call, etc.
Based on this embodiment, the mobile terminal can asynchronously judge the sound source type of the sound being played, which solves the problem of an application program mislabeling the audio stream type, enables all APPs that play audio to experience holographic sound effects, and provides great convenience and room for customization for third-party developers and holographic audio enthusiasts.
The embodiment is applied to a holographic audio module of a mobile terminal. Holographic scene recognition is performed on part of the data content of an input audio source by means of an audio classification technology, and a holographic scene tag of the audio source is obtained; a holographic algorithm then performs sound-image control according to the tag and its corresponding priority, so that each audio stream is positioned according to the scene. When the holographic scene tag is obtained, the features of the audio stream and its tag are asynchronously stored, and the AI model is retrained when the mobile phone is charging at night; in this way, the AI model self-learns the user's habitual usage scenes, and the accuracy of holographic scene type identification is improved.
For example, when a user watches a live broadcast with a music APP, after the application program writes the audio stream data, the audio classification technology periodically samples the content of the audio stream and extracts features, then obtains a scene tag through scene recognition, and the subsequent holographic algorithm assigns sound-image positions according to the tag; after the audio stream finishes playing, the useful part of its features is stored and periodically cleaned up, and self-learning of the AI model is performed at night when the user's mobile phone is charging and idle.
The embodiment of the present application provides an audio stream identification method applied to an electronic device. The method includes: acquiring an audio stream of the electronic device; performing feature extraction on the audio stream to obtain features of the audio stream; and inputting the features of the audio stream into a trained scene recognition model to perform holographic scene recognition, so as to obtain the holographic scene type of the audio stream. That is, in the embodiment of the present application, the holographic scene type of the audio stream can be obtained by inputting the audio stream of the electronic device into the trained scene recognition model for holographic scene recognition. Because the holographic scene type is identified by the trained scene recognition model, the determined holographic scene type is better suited to holographic audio, the accuracy of determining the holographic scene type under holographic audio is improved, and the sound effect of the holographic audio is further improved.
Based on the same inventive concept as the foregoing embodiments, an embodiment of the present application provides an audio stream recognition device, where the device is disposed in an electronic device, fig. 5 is a schematic structural diagram of an alternative audio stream recognition device provided in the embodiment of the present application, and as shown in fig. 5, the audio stream recognition device may include:
A first obtaining module 51, configured to obtain an audio stream of the electronic device;
an extracting module 52, configured to perform feature extraction on the audio stream to obtain features of the audio stream;
The recognition module 53 is configured to input the features of the audio stream into the trained scene recognition model to perform holographic scene recognition, so as to obtain a holographic scene type of the audio stream.
In an alternative embodiment, when the number of audio streams is at least two, the extraction module 52 is specifically configured to:
Extracting the characteristics of each audio stream in the audio streams to obtain the characteristics of each audio stream;
Accordingly, the identification module 53 is specifically configured to:
and inputting the characteristics of each audio stream into the trained scene recognition model to respectively carry out holographic scene recognition to obtain the holographic scene type of each audio stream.
In an alternative embodiment, the apparatus is further adapted to:
Based on the holographic scene type of the audio stream, playing the spatial audio corresponding to the audio stream.
In an alternative embodiment, when the number of audio streams is one, the apparatus plays the spatial audio corresponding to the audio stream based on the holographic scene type of the audio stream, and may include:
Determining a spatial position corresponding to the holographic scene type based on the holographic scene type of the audio stream;
And playing the spatial audio of the audio stream based on the spatial position corresponding to the holographic scene type.
In an alternative embodiment, when the number of the audio streams is at least two, the apparatus plays the spatial audio corresponding to the audio stream based on the holographic scene type of the audio stream, and may include:
Determining a spatial position corresponding to each audio stream based on the priority of the holographic scene type of each audio stream in the audio streams;
and playing the spatial audio of each audio stream based on the corresponding spatial position of each audio stream.
In an alternative embodiment, the apparatus is further adapted to:
when the trained scene recognition model outputs the same holographic scene type for the audio stream consecutively, and the number of consecutive identical outputs reaches a preset number of times, sending the features of the audio stream and the holographic scene type of the audio stream to a cloud server;
The features of the audio stream and the holographic scene type of the audio stream are used by the cloud server to train the locally trained scene recognition model, so as to obtain new parameters of the trained scene recognition model and update the locally trained scene recognition model.
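The stability check described above, uploading only after the model has produced the same scene type a preset number of consecutive times, can be sketched as a small counter; the upload function is a placeholder assumption for the cloud interface:

```python
# Hypothetical sketch: buffer the model's consecutive outputs and upload the
# features and scene type to the cloud server only once the same type has
# been output a preset number of times in a row.
class StableSceneReporter:
    def __init__(self, preset_times, upload_fn):
        self.preset_times = preset_times
        self.upload_fn = upload_fn  # placeholder for the cloud upload call
        self.last_type = None
        self.count = 0

    def observe(self, features, scene_type):
        """Record one model output; upload when it has repeated enough times."""
        if scene_type == self.last_type:
            self.count += 1
        else:
            self.last_type, self.count = scene_type, 1
        if self.count == self.preset_times:
            self.upload_fn(features, scene_type)
```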
In an alternative embodiment, the apparatus is further adapted to:
and acquiring the trained scene recognition model from the cloud server.
In an alternative embodiment, the apparatus is further adapted to:
acquiring new parameters of the trained scene recognition model from the cloud server;
and updating the parameters of the trained scene recognition model to the new parameters, so as to obtain the updated trained scene recognition model.
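The parameter update above can be sketched as overwriting the local model's parameters with those received from the cloud server; representing the model as a bare parameter dictionary is an illustrative assumption:

```python
# Hypothetical sketch: replace local model parameters with the new parameters
# pulled from the cloud server, leaving any parameter the server did not
# send unchanged.
def update_local_model(model_params, new_params):
    """Return the local parameter set updated with the cloud's new parameters."""
    updated = dict(model_params)
    updated.update(new_params)
    return updated
```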
In an alternative embodiment, the extraction module 52 is specifically configured to:
and extracting the Mel-frequency cepstral coefficient (MFCC) features of the audio stream to obtain the features of the audio stream.
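A simplified MFCC extraction can be sketched as framing, power spectrum, mel filterbank, log, and DCT; all parameter values below (sample rate, FFT size, filter counts) are illustrative defaults, not values from the embodiment:

```python
import numpy as np

def mfcc_features(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Simplified MFCC extraction for an audio stream (illustrative sketch)."""
    # Frame the signal with a Hann window.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2  # power spectrum per frame

    # Triangular mel filterbank spanning 0 .. sr/2.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_mel = np.log(power @ fbank.T + 1e-10)
    # Type-II DCT to decorrelate, keeping the first n_mfcc coefficients.
    n = log_mel.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T  # shape: (num_frames, n_mfcc)
```

In practice a library implementation (e.g. `librosa.feature.mfcc`) would be used; this sketch only shows the stages involved.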
In practical applications, the first obtaining module 51, the extracting module 52 and the recognition module 53 may be implemented by a processor located on the audio stream recognition device, specifically a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
An embodiment of the present application provides an audio stream recognition device, where the device is disposed in an electronic device, fig. 6 is a schematic structural diagram two of an alternative audio stream recognition device provided by the embodiment of the present application, and as shown in fig. 6, the audio stream recognition device may include:
A second acquisition module 61, configured to acquire a sample data set, where the sample data set includes an acquired audio stream and a holographic scene type of the acquired audio stream;
The training module 62 is configured to train the scene recognition model by using the sample data set, so as to obtain a trained scene recognition model;
and the sending module 63 is configured to send the trained scene recognition model to the electronic device.
In an alternative embodiment, the apparatus is further adapted to:
acquiring the features of an audio stream and the holographic scene type of the audio stream from the electronic device, where the holographic scene type of the audio stream is a type that the trained scene recognition model has output identically and consecutively a preset number of times;
Training the locally trained scene recognition model by utilizing the characteristics of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model;
the new parameters are sent to the electronic device.
In an alternative embodiment, the device trains the locally trained scene recognition model by using the acquired features of the audio stream and the acquired holographic scene type of the audio stream, and obtains the new parameters of the locally trained scene recognition model by:
When the current moment reaches a preset time range, training the locally trained scene recognition model by utilizing the characteristics of the acquired audio stream and the acquired holographic scene type of the audio stream to obtain new parameters of the locally trained scene recognition model.
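The scheduled retraining above, running only when the current moment falls within a preset time range, can be sketched as a gate around the training call; the window bounds and `train_fn` are illustrative assumptions (e.g. an off-peak period on the cloud server):

```python
# Hypothetical sketch: buffered samples are used for retraining only when the
# current clock time lies within a preset time range.
from datetime import time as dtime

def in_training_window(now, start=dtime(2, 0), end=dtime(5, 0)):
    """Return True if `now` falls within the preset time range."""
    return start <= now.time() <= end

def maybe_retrain(now, buffered_samples, train_fn):
    """Train on buffered samples only inside the preset time range."""
    if buffered_samples and in_training_window(now):
        return train_fn(buffered_samples)  # yields new model parameters
    return None
```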
In practical applications, the second acquiring module 61, the training module 62 and the sending module 63 may be implemented by a processor located on the audio stream recognition device, specifically a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 7 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application, and as shown in fig. 7, an embodiment of the present application provides an electronic device 700, including:
A processor 71 and a storage medium 72 storing instructions executable by the processor 71, where the storage medium 72 communicates with the processor 71 through a communication bus 73, and the instructions, when executed by the processor 71, perform the audio stream identification method as performed in one or more of the embodiments described above.
In practical use, the components of the electronic device 700 are coupled together via the communication bus 73. It is understood that the communication bus 73 is used to enable connection and communication between these components. In addition to the data bus, the communication bus 73 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the communication bus 73 in fig. 7.
Fig. 8 is a schematic structural diagram of an optional cloud server according to an embodiment of the present application, and as shown in fig. 8, an embodiment of the present application provides a cloud server 800, including:
a processor 81 and a storage medium 82 storing instructions executable by the processor 81, where the storage medium 82 communicates with the processor 81 through a communication bus 83, and the instructions, when executed by the processor 81, perform the audio stream identification method as performed in one or more of the embodiments described above.
In practical application, the components in the cloud server 800 are coupled together through the communication bus 83. It is understood that the communication bus 83 is used to enable connection and communication between these components. In addition to the data bus, the communication bus 83 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the communication bus 83 in fig. 8.
Embodiments of the present application provide a computer storage medium storing executable instructions that, when executed by one or more processors, perform the audio stream identification method described in one or more of the embodiments above, the method being performed by a control device.
The computer readable storage medium may be a Ferromagnetic Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, a Compact Disc Read-Only Memory (CD-ROM), or the like.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the present application.