Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description herein are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the description, the claims and the above description of the drawings are intended to cover non-exclusive inclusions. The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the industrial diagnosis method based on voiceprint recognition provided by the embodiment of the present application is generally executed by a server, and accordingly, the industrial diagnosis device based on voiceprint recognition is generally disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of an industrial diagnostic method based on voiceprint recognition in accordance with the present application is shown. The industrial diagnosis method based on voiceprint recognition comprises the following steps:
step S201, collecting sound data of the industrial equipment operating in normal state and fault state.
It should be noted that the voiceprint recognition technical scheme of the application can also be applied to the field of financial technology. For example, it can be applied to an intelligent customer service system to identify a customer's emotion or verify a customer's identity by analyzing the customer's voice characteristics, thereby improving the customer service experience and security. In summary, voiceprint recognition has great application potential in the field of financial technology and, as an effective means for improving quality of service and risk control, can support decision making through accurate data analysis.
In this embodiment, the electronic device (e.g., the server shown in fig. 1) on which the voiceprint recognition-based industrial diagnostic method operates may transmit or receive data through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, Wi-Fi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (Ultra-Wideband) connections, and other now known or later developed wireless connection means.
Specifically, in the process of collecting the sound data, the industrial equipment may be equipment commonly used in the industrial field, such as a motor, and a relatively quiet environment should be selected so that no interfering sound from other mechanical equipment is generated around the industrial equipment, thereby reducing the influence of background noise. In addition, the collection of sound data of the industrial equipment operating in the normal state and the fault state is divided into two collection processes. For the normal state, the equipment is run normally and allowed to fully warm up to a stable working temperature, the operating sound is continuously recorded for 5 minutes using a microphone array or a sensor, and the recording is repeated to obtain at least 10 groups of data. For the fault state, various fault types are manually simulated, for example by adjusting certain parts of the equipment so that they deviate from the normal working state, the operating sound under each fault type is recorded for 5 minutes, and at least 10 groups of data are likewise collected for each fault type. This method ensures the diversity and sufficiency of the data. Finally, each group of data is labeled with information such as the equipment state (normal state or fault state), the fault type (if any) and the collection time, in a unified format that is convenient for the subsequent preprocessing and feature extraction.
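A minimal sketch of how the recording sessions and labeling described above might be organized, assuming the sounddevice and soundfile packages, a 16 kHz mono channel and local file storage; the label fields, file naming and fault-type names are illustrative assumptions, not part of the original disclosure.

```python
# Hypothetical recording/labeling loop for the data-collection step.
import json
import time
from typing import Optional

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16000          # assumed sampling rate
DURATION_S = 5 * 60          # 5 minutes per recording, as described in the text
GROUPS = 10                  # at least 10 groups per condition

def record_group(device_state: str, fault_type: Optional[str], group_idx: int) -> None:
    """Record one 5-minute segment and store it together with its label metadata."""
    audio = sd.rec(int(DURATION_S * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()                                    # block until the recording finishes
    stem = f"{device_state}_{fault_type or 'none'}_{group_idx:02d}"
    sf.write(f"{stem}.wav", audio, SAMPLE_RATE)  # raw sound data
    label = {                                    # unified label format for later steps
        "device_state": device_state,            # "normal" or "fault"
        "fault_type": fault_type,                # e.g. "bearing_wear", or None
        "collected_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open(f"{stem}.json", "w") as f:
        json.dump(label, f)

# Example: ten groups in the normal state; repeat per simulated fault type.
for i in range(GROUPS):
    record_group("normal", None, i)
```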
Step S202, preprocessing and feature extraction are carried out on the sound data to obtain voiceprint feature vectors.
Specifically, the collected sound data needs to be preprocessed first to ensure the accuracy of subsequent analysis. The preprocessing mainly comprises three processes: noise removal, filtering and signal quality enhancement. The noise removal process removes high-frequency noise in the sound data and retains signals in the main frequency range, the filtering process converts the signals into the frequency domain and removes noise there, and the signal quality enhancement process improves the clarity of the signals. Then, feature extraction can be performed on the preprocessed sound data to extract effective voiceprint features and form voiceprint feature vectors for the subsequent establishment of a voiceprint model. Feature extraction means include the spectrogram, mel-frequency cepstral coefficients (MFCCs), short-time energy and the like; in this embodiment, mel-frequency cepstral coefficients are mainly used for feature extraction.
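A minimal sketch of the preprocessing plus MFCC feature-extraction step, assuming the librosa package; the sampling rate, coefficient count and the mean/std summarization into a fixed-length vector are illustrative choices, not mandated by the text.

```python
# Hypothetical MFCC-based voiceprint feature extraction.
import numpy as np
import librosa

def extract_voiceprint_vector(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load a recording and turn it into a fixed-length voiceprint feature vector."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)        # resample to a common rate
    y, _ = librosa.effects.trim(y, top_db=30)               # drop leading/trailing silence
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    # One common choice: summarize frame-wise MFCCs by mean and standard deviation
    # so every recording maps to a vector of the same length.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```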
It should be noted that mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficients, MFCCs) are a feature extraction method widely applied in the fields of speech recognition and voiceprint recognition. They capture the spectral characteristics of sound signals and simulate the perception characteristics of the human ear, which makes them very suitable for voiceprint recognition tasks. The preprocessing and feature extraction processes are further described in the following embodiments and are not repeated here.
And step S203, using the voiceprint feature vector to establish a voiceprint model through a deep neural network.
Specifically, the voiceprint model belongs to a deep neural network model and may be built by a machine learning algorithm. For example, the voiceprint feature vectors may first be divided into a training set, a verification set and a test set at a ratio of about 80%, 10% and 10%, ensuring that each set contains voiceprint feature vectors in both the normal state and the fault state, and the voiceprint feature vectors in the training set are standardized so that they are on the same scale. A suitable deep neural network architecture then needs to be selected, such as a convolutional neural network (CNN), a recurrent neural network (RNN) or a long short-term memory network (LSTM); this embodiment selects a recurrent neural network (RNN), which is better suited to voiceprint recognition and can capture the time-sequence and semantic information in the voiceprint through techniques such as multi-layer stacking, attention mechanisms and time-sequence modeling. After the initial model is built, it can be trained on the training set; since the collected sound data is labeled, the weight parameters of the model can be updated through a back propagation algorithm. The verification set can then be used to evaluate and optimize the trained model against indicators such as accuracy and error rate, and the test set is used to evaluate the quality of the final model, so that the voiceprint model is established.
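A minimal sketch, assuming PyTorch, of one way the recurrent voiceprint model and its back-propagation training described above could be organized; the layer sizes, class count, optimizer settings and the choice of an LSTM variant within the RNN family are illustrative assumptions.

```python
# Hypothetical recurrent voiceprint model over frame-wise MFCC sequences.
import torch
import torch.nn as nn

class VoiceprintRNN(nn.Module):
    def __init__(self, n_mfcc: int = 13, hidden: int = 128, n_classes: int = 4):
        super().__init__()
        # multi-layer recurrent stack over the MFCC frame sequence
        self.rnn = nn.LSTM(input_size=n_mfcc, hidden_size=hidden,
                           num_layers=2, batch_first=True)
        self.embed = nn.Linear(hidden, 64)        # voiceprint embedding
        self.classify = nn.Linear(64, n_classes)  # normal state + fault types

    def forward(self, mfcc_frames: torch.Tensor):
        # mfcc_frames: (batch, n_frames, n_mfcc)
        _, (h_n, _) = self.rnn(mfcc_frames)
        emb = torch.tanh(self.embed(h_n[-1]))     # final hidden state of the top layer
        return emb, self.classify(emb)

model = VoiceprintRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(batch_mfcc: torch.Tensor, batch_labels: torch.Tensor) -> float:
    """One supervised update: the labels allow back-propagation of the error."""
    optimizer.zero_grad()
    _, logits = model(batch_mfcc)
    loss = criterion(logits, batch_labels)
    loss.backward()                               # back-propagation updates the weights
    optimizer.step()
    return loss.item()
```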
It can be understood that when the voiceprint model is applied to an actual scene, the deployment of the voiceprint model can be local deployment on the embedded device or cloud deployment for providing services through a network interface, and after a new voice sample is input, the voiceprint model can perform feature extraction and voiceprint matching, so that the voiceprint recognition function is realized.
And step S204, when target sound data of the industrial equipment to be diagnosed are received, performing feature comparison on the target sound data according to the voiceprint model to obtain a feature comparison result.
Specifically, after the voiceprint model is deployed, feature comparison can be performed on the received target sound data of the industrial equipment to be diagnosed through the voiceprint model, so that industrial diagnosis is realized by means of voiceprint recognition. For example, the target sound data is preprocessed in the same way (noise removal, filtering and signal quality enhancement), feature extraction is then performed using mel-frequency cepstral coefficients to obtain a target voiceprint feature vector, and the target voiceprint feature vector is input into the deployed voiceprint model. The voiceprint model calculates the similarity between the target voiceprint feature vector and the voiceprint feature vectors stored in the voiceprint model, and this similarity is used as the feature comparison result, so that the equipment state of the industrial equipment to be diagnosed can be judged according to the feature comparison result and industrial diagnosis can be performed on the industrial equipment to be diagnosed.
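A minimal sketch of the feature-comparison step, assuming the stored voiceprints are kept as labelled NumPy vectors; as in the later embodiments, the "similarity" score here is taken to be a Euclidean distance, so a smaller value indicates a closer match. The dictionary representation and label names are illustrative.

```python
# Hypothetical comparison of a target voiceprint against stored, labelled voiceprints.
import numpy as np

def compare_against_model(target_vec: np.ndarray,
                          stored: dict) -> dict:
    """Return a distance-based similarity score for each stored condition label."""
    return {label: float(np.linalg.norm(target_vec - vec))
            for label, vec in stored.items()}

# Usage (labels illustrative):
# stored = {"normal": ..., "bearing_wear": ..., "motor_overheat": ...}
# scores = compare_against_model(extract_voiceprint_vector("new.wav"), stored)
```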
And step S205, performing diagnostic analysis according to the characteristic comparison result to obtain a diagnostic result of the industrial equipment to be diagnosed.
Specifically, the feature comparison result includes the similarity between the target voiceprint feature vector and the voiceprint feature vectors stored in the voiceprint model. Since the voiceprint feature vectors stored in the voiceprint model are labeled, they correspond to different equipment states (normal state or fault state) and, in the case of a fault state, to different fault types. Therefore, multiple thresholds can be preset, each threshold corresponding to a different fault type. When performing diagnostic analysis, the similarity between the target voiceprint feature vector and the stored voiceprint feature vectors is compared with the preset thresholds. If the similarity between the target voiceprint feature vector and the voiceprint feature vector corresponding to a certain fault type is lower than the preset threshold corresponding to that fault type, it can be judged that the industrial equipment to be diagnosed is in the fault state and has the fault type corresponding to that preset threshold; if the similarity to the voiceprint feature vector of every fault type exceeds or equals the corresponding preset threshold, it can be judged that the industrial equipment to be diagnosed is in the normal state. Finally, a diagnostic result for the industrial equipment to be diagnosed can be generated according to the judgment.
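A minimal sketch of the threshold comparison just described, following the text's rule that a similarity (here a distance) below a fault type's preset threshold indicates that fault; the fault names and threshold values are illustrative placeholders.

```python
# Hypothetical per-fault-type thresholds (illustrative values only).
FAULT_THRESHOLDS = {"bearing_wear": 0.35, "motor_overheat": 0.40, "loose_part": 0.30}

def diagnose(scores: dict) -> dict:
    """Apply the preset thresholds to the per-label similarity scores."""
    matched = {f: s for f, s in scores.items()
               if f in FAULT_THRESHOLDS and s < FAULT_THRESHOLDS[f]}
    if matched:
        fault = min(matched, key=matched.get)      # closest-matching fault type
        return {"device_state": "fault", "fault_type": fault, "scores": scores}
    # all similarities exceed or equal their thresholds -> judged normal
    return {"device_state": "normal", "fault_type": None, "scores": scores}
```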
It can be understood that each fault type may be further associated with one or more threshold intervals, and when performing diagnostic analysis, if the industrial equipment to be diagnosed has a fault type corresponding to the certain preset threshold, the degree of the fault type may be further determined according to the threshold interval in which the similarity is located, so as to obtain a more detailed diagnostic result.
According to the application, by establishing the voiceprint model of the industrial equipment in different states, voiceprint recognition can be automatically carried out on the sound data generated by the industrial equipment, so that non-contact fault diagnosis is realized, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some optional implementations of this embodiment, when receiving the target sound data of the industrial equipment to be diagnosed, the step of comparing the features of the target sound data according to the voiceprint model to obtain the feature comparison result includes:
when the target sound data is received, preprocessing and feature extraction are carried out on the target sound data to obtain a target voiceprint feature vector;
Inputting the target voiceprint feature vector into the voiceprint model, and calculating the similarity between the target voiceprint feature vector and the voiceprint feature vector to be used as the feature comparison result.
In this embodiment, when target sound data of an industrial device to be diagnosed is received, the received target sound data is preprocessed and feature-extracted in the same manner as the sound data to obtain a target voiceprint feature vector, and the target voiceprint feature vector can then be input into the voiceprint model to perform feature comparison with the voiceprint feature vectors stored in the voiceprint model. Specifically, a metric learning technique may be used to calculate the similarity between the target voiceprint feature vector and a voiceprint feature vector stored in the voiceprint model. Metric learning techniques include the contrastive loss (Contrastive Loss), which is applied to the similarity calculation between a pair of feature vectors and whose goal is to minimize the distance between like samples while maximizing the distance between unlike samples, and the triplet loss (Triplet Loss), which is applied to the similarity calculation among triplet feature vectors, the triplet consisting of an anchor feature vector, a positive sample feature vector and a negative sample feature vector, and whose goal is to pull the anchor and the positive sample closer together while pushing the anchor and the negative sample further apart. By using the contrastive loss or the triplet loss, feature comparison can be carried out between the target voiceprint feature vector and the voiceprint feature vectors stored in the voiceprint model, and whether two feature vectors match is determined by calculating the similarity between them, so that subsequent diagnostic analysis can be carried out using the similarity and industrial diagnosis is realized.
The application ensures the accuracy of feature comparison by calculating the similarity between the feature vectors, and provides a data base for subsequent diagnosis and analysis, so that non-contact fault diagnosis can be realized according to the similarity, thereby increasing the accuracy of industrial diagnosis and improving the efficiency of industrial diagnosis.
In some optional implementations of this embodiment, the step of inputting the target voiceprint feature vector into the voiceprint model, and calculating the similarity between the target voiceprint feature vector and the voiceprint feature vector as the feature comparison result includes:
using a measurement learning technology, and establishing a measurement learning model according to the voiceprint feature vector;
And calculating the similarity according to the measurement learning model.
In this embodiment, a contrastive loss (Contrastive Loss) or a triplet loss (Triplet Loss) may be selected as the metric learning technique, and a metric learning model may be built from the voiceprint feature vectors stored in the voiceprint model. Specifically, the metric learning model may be regarded as an enhancement component of the voiceprint model, used for learning a distance metric function to improve the accuracy and robustness of the voiceprint model and optimize its performance. For example, the contrastive loss is selected as the metric learning technique and the metric learning model is established as follows: the Euclidean distance or cosine similarity between every two voiceprint feature vectors stored in the voiceprint model is calculated, a loss value is calculated according to the contrastive loss formula, the loss function of the metric learning model is defined, and the model training is completed. The trained metric learning model can then be used to perform distance measurement on the target voiceprint feature vector, that is, to calculate the Euclidean distance between the target voiceprint feature vector and the voiceprint feature vectors stored in the voiceprint model as the similarity.
It should be noted that the contrastive loss is a commonly used metric learning loss function, which is used in supervised learning tasks to constrain the distances between similar samples and between dissimilar samples; its objective is to minimize the distance between similar samples and maximize the distance between dissimilar samples. With the Euclidean distance as the metric, the contrastive loss function can be expressed as:

L = (1 / (2N)) Σ_(i,j) [ y_ij · d(x_i, x_j)² + (1 − y_ij) · max(0, m − d(x_i, x_j))² ]

where N is the total number of samples, d(x_i, x_j) is the distance between samples x_i and x_j, y_ij is the label indicating whether the two samples belong to the same class (1 if they do, 0 otherwise), and m is the margin value used to control the minimum distance between heterogeneous samples.
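The following is a direct, illustrative reading of the contrastive-loss formula above as a small NumPy function over a batch of labelled pairs; the function name and interface are assumptions made for the example only.

```python
# Hypothetical implementation of the contrastive loss with Euclidean distance.
import numpy as np

def contrastive_loss(x1: np.ndarray, x2: np.ndarray,
                     y: np.ndarray, m: float = 1.0) -> float:
    """x1, x2: (N, dim) paired feature vectors; y: (N,) with 1 = same class, 0 = different."""
    d = np.linalg.norm(x1 - x2, axis=1)                # Euclidean distance d(x_i, x_j)
    same = y * d**2                                    # pull same-class pairs together
    diff = (1 - y) * np.maximum(0.0, m - d)**2         # push different-class pairs beyond margin m
    return float(np.mean(same + diff) / 2.0)           # matches the 1/(2N) normalization above
```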
According to the application, the similarity between the target voiceprint feature vector and the voiceprint feature vector stored in the voiceprint model is calculated through the measurement learning technology, so that a more accurate feature comparison result is obtained, a data base is provided for subsequent diagnosis analysis, non-contact fault diagnosis can be realized according to the similarity, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some optional implementations of this embodiment, the step of obtaining the diagnostic result of the industrial device to be diagnosed by performing diagnostic analysis according to the feature comparison result includes:
Comparing the similarity with a preset threshold value to obtain a comparison result;
analyzing according to the comparison result, and determining the equipment state of the industrial equipment to be diagnosed, wherein the equipment state comprises a normal state and a fault state;
if the equipment state is a fault state, determining a corresponding fault type;
And generating a diagnosis analysis report according to the comparison result, the equipment state and the fault type, and taking the diagnosis analysis report as the diagnosis result.
In this embodiment, a plurality of thresholds may be preset based on historical experimental results and the requirements of the actual application scenario, where each threshold corresponds to a different fault type. Specifically, common fault types include bearing wear, motor overheating, loosening of mechanical parts, and the like, and a threshold is set for each fault type. Since the voiceprint feature vector corresponding to each fault type is stored in the voiceprint model, when each similarity between the target voiceprint feature vector and the feature vectors stored in the voiceprint model is obtained through feature comparison, each similarity can be compared with the preset threshold of the corresponding fault type to determine whether a similarity lower than the preset threshold exists. If such a similarity exists, the fault of that type is determined; if all the similarities exceed or equal their preset thresholds, the equipment state of the industrial equipment to be diagnosed is determined to be normal. In addition, one or more threshold intervals can be set to represent different degrees of a fault. For example, two thresholds T1 and T2 (T1 < T2) can be set for bearing wear, where light bearing wear is diagnosed if the similarity value lies in [T1, T2), and serious bearing wear is diagnosed if the similarity value lies in [T2, 1].
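A minimal sketch of the interval-based severity check described above for the bearing-wear example; T1 and T2 are illustrative placeholder values (T1 < T2), and the returned strings are assumptions for the example only.

```python
# Hypothetical severity intervals for bearing wear.
BEARING_WEAR_T1, BEARING_WEAR_T2 = 0.2, 0.6

def bearing_wear_severity(similarity: float) -> str:
    """Map the similarity value onto the [T1, T2) / [T2, 1] intervals from the text."""
    if BEARING_WEAR_T1 <= similarity < BEARING_WEAR_T2:
        return "light bearing wear"               # similarity in [T1, T2)
    if BEARING_WEAR_T2 <= similarity <= 1.0:
        return "severe bearing wear"              # similarity in [T2, 1]
    return "no bearing-wear finding at this level"
```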
It will be appreciated that if the similarity approaches or fluctuates around a preset threshold, further checks may be required; in that case the industrial equipment to be diagnosed may be marked as being in an uncertain state, and this may be used as the diagnostic result.
The application can determine the equipment state, the fault type and the like of the industrial equipment through the diagnosis analysis of the characteristic comparison result, thereby obtaining the diagnosis result, realizing the non-contact fault diagnosis, increasing the accuracy of the industrial diagnosis and improving the efficiency of the industrial diagnosis.
In some optional implementations of this embodiment, the step of preprocessing and extracting features of the sound data to obtain a voiceprint feature vector includes:
Removing noise, filtering and enhancing signal quality of the sound data to obtain preprocessed training data;
and carrying out feature extraction on the training data according to a Mel frequency cepstrum coefficient algorithm to obtain the voiceprint feature vector.
In this embodiment, the preprocessing process includes removing noise, filtering, and enhancing signal quality. After the sound data is preprocessed to obtain training data, feature extraction can be performed on the training data according to the mel-frequency cepstral coefficient algorithm to obtain the voiceprint feature vector. Specifically, when preprocessing the sound data, a band-pass filter may be used to remove high-frequency noise in the sound data and preserve signals in the main frequency range, an FFT may then be used to convert the signals to the frequency domain where a filter is applied to remove noise, and gain adjustment may finally be performed on the sound data to improve the clarity of the signals.
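A minimal sketch of the preprocessing chain (band-pass filtering, frequency-domain noise suppression, gain adjustment), assuming SciPy/NumPy; the cut-off frequencies and the noise-floor fraction are illustrative values, not taken from the text.

```python
# Hypothetical preprocessing: band-pass filter -> FFT-domain denoising -> gain normalization.
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    # 1) band-pass filter keeping the main frequency range of the machine sound
    b, a = butter(N=4, Wn=[50, 5000], btype="bandpass", fs=sr)
    filtered = filtfilt(b, a, signal)
    # 2) frequency-domain denoising: zero out spectral bins below a small noise floor
    spectrum = np.fft.rfft(filtered)
    spectrum[np.abs(spectrum) < 0.01 * np.abs(spectrum).max()] = 0.0
    denoised = np.fft.irfft(spectrum, n=len(filtered))
    # 3) gain adjustment: normalize the peak amplitude to improve signal clarity
    return denoised / (np.abs(denoised).max() + 1e-12)
```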
According to the application, the voice data is preprocessed and the characteristic extraction is performed by utilizing the Mel frequency cepstrum coefficient algorithm, so that the quality of the characteristic vector is ensured, a foundation is provided for the establishment of a subsequent voiceprint model, and the robustness and generalization capability of the voiceprint model are improved, thereby realizing non-contact fault diagnosis by means of the voiceprint model, increasing the accuracy of industrial diagnosis and improving the efficiency of industrial diagnosis.
In some optional implementations of this embodiment, the step of extracting features of the training data according to the mel-frequency cepstral coefficient algorithm to obtain the voiceprint feature vector includes:
Framing, windowing and Fourier transforming the training data to obtain a frequency spectrum signal;
filtering the spectrum signal through a Mel filter bank to obtain an energy response;
carrying out logarithmic operation and cepstrum coefficient extraction on the energy response to obtain cepstrum coefficients;
And forming the voiceprint feature vector according to the cepstrum coefficient.
In this embodiment, the specific flow of the mel-frequency cepstral coefficient algorithm may include framing, windowing, Fourier transformation, mel filter bank filtering, logarithmic operation, cepstral coefficient extraction and feature vector composition. The preprocessed training data is first framed, dividing the preprocessed sound signal into short-time frames with a duration of 20-40 milliseconds each, so that the signal can be regarded as stationary within a frame. A windowing operation is then applied to each frame, the window functions used including the Hamming window, the Hanning window and the like, to reduce spectral leakage. A fast Fourier transform is performed on each windowed frame to convert the time-domain signal into a frequency-domain signal and obtain the spectrum signal of each frame. The mel filter bank is a set of triangular filters that simulates the perception characteristics of the human ear for sound; the spectrum signal is filtered through the mel filter bank to obtain the energy response of the mel filter bank. The logarithm of the energy response is then taken to obtain a logarithmic energy spectrum, which compresses the dynamic range of the energy and enhances the detail of the low-frequency part. A discrete cosine transform (DCT) is applied to the logarithmic energy spectrum, and the resulting lower-order cepstral coefficients are extracted as the MFCC features of each frame. Finally, the cepstral coefficients of the frames are combined to form the voiceprint feature vector.
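A step-by-step sketch mirroring the framing, windowing, FFT, mel filter bank, logarithm and DCT flow described above, assuming librosa's mel-filter helper and SciPy's DCT; the frame length, hop size and coefficient counts are illustrative values.

```python
# Hypothetical low-level MFCC pipeline over a preprocessed signal.
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_frames(signal: np.ndarray, sr: int = 16000,
                frame_len: int = 400, hop: int = 160,
                n_mels: int = 26, n_ceps: int = 13) -> np.ndarray:
    # framing: ~25 ms frames (400 samples at 16 kHz) with a 10 ms hop
    frames = librosa.util.frame(signal, frame_length=frame_len, hop_length=hop).T
    # windowing: Hamming window to reduce spectral leakage
    frames = frames * np.hamming(frame_len)
    # FFT: time domain -> frequency domain, power spectrum per frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # mel filter bank: triangular filters modelling human-ear perception
    mel_fb = librosa.filters.mel(sr=sr, n_fft=frame_len, n_mels=n_mels)
    energy = power @ mel_fb.T                   # filter-bank energy response
    # logarithm: compress dynamic range, emphasize low-frequency detail
    log_energy = np.log(energy + 1e-10)
    # DCT: decorrelate and keep the lower-order cepstral coefficients per frame
    return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]
```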
According to the application, the feature extraction is carried out on the preprocessed sound data through the Mel frequency cepstrum coefficient algorithm, so that the quality of the feature vector is ensured, a foundation is provided for the establishment of a subsequent voiceprint model, and the robustness and generalization capability of the voiceprint model are improved, thereby realizing non-contact fault diagnosis by means of the voiceprint model, increasing the accuracy of industrial diagnosis and improving the efficiency of industrial diagnosis.
In some optional implementations of this embodiment, after the step of performing the diagnostic analysis according to the feature comparison result to obtain a diagnostic result of the industrial device to be diagnosed, the method further includes:
Receiving a data stream corresponding to the sound data in real time, and extracting to obtain incremental characteristics;
And performing incremental training on the voiceprint model according to the incremental characteristics by adopting an online learning algorithm to realize dynamic updating of the voiceprint model.
In this embodiment, since the industrial equipment changes with use and environment, the voiceprint model also needs to be regularly optimized and updated. Specifically, the data stream corresponding to the sound data of the industrial equipment operating in the normal state and the fault state can be received in real time and processed with a message queue (such as Kafka) to ensure real-time transmission and persistent storage of the data, the feature extraction speed can be accelerated through parallel processing, and features are extracted from the data stream to obtain incremental features. An optimizer suitable for online learning, such as Adam or Adagrad, is then selected, so that the incremental features are used to incrementally train the voiceprint model and incrementally update the model parameters, realizing dynamic updating of the voiceprint model and avoiding frequently retraining the model from scratch.
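A minimal sketch of the online incremental-update loop, assuming the kafka-python client and the PyTorch model and optimizer style of the earlier sketch; the topic name, broker address, message format and learning rate are placeholders, not part of the original disclosure.

```python
# Hypothetical incremental training driven by a real-time feature stream.
import json
import numpy as np
import torch
from kafka import KafkaConsumer

consumer = KafkaConsumer("machine-sound-features",          # hypothetical topic
                         bootstrap_servers="localhost:9092")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # small LR for incremental steps
criterion = torch.nn.CrossEntropyLoss()

for message in consumer:                                    # real-time data stream
    record = json.loads(message.value)                      # assumed: {"mfcc": [...], "label": int}
    frames = torch.tensor(np.array(record["mfcc"]), dtype=torch.float32).unsqueeze(0)
    label = torch.tensor([record["label"]])
    optimizer.zero_grad()
    _, logits = model(frames)                               # forward pass on the new sample
    loss = criterion(logits, label)
    loss.backward()                                         # incremental parameter update,
    optimizer.step()                                        # no full retraining from scratch
```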
It will be appreciated that to cope with bursty large-scale data streaming, the data buffers may also be designed to avoid excessive data size affecting system processing performance.
According to the application, the dynamic updating of the voiceprint model is realized through online learning, the robustness and the universality of the voiceprint model are ensured, so that the accuracy of industrial diagnosis can be increased and the efficiency of industrial diagnosis can be improved when the non-contact fault diagnosis is realized by means of the voiceprint model.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by computer readable instructions stored in a computer readable storage medium that, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an industrial diagnostic device based on voiceprint recognition, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 3, the industrial diagnostic device 300 based on voiceprint recognition according to the present embodiment includes an acquisition module 301, an extraction module 302, a creation module 303, a comparison module 304, and an analysis module 305. Wherein:
The acquisition module 301 is used for acquiring sound data of the industrial equipment running in a normal state and a fault state;
The extracting module 302 is configured to perform preprocessing and feature extraction on the sound data to obtain a voiceprint feature vector;
a building module 303, configured to build a voiceprint model through a deep neural network using the voiceprint feature vector;
The comparison module 304 is configured to, when receiving target sound data of an industrial device to be diagnosed, perform feature comparison on the target sound data according to the voiceprint model to obtain a feature comparison result;
and the analysis module 305 is used for performing diagnostic analysis according to the characteristic comparison result to obtain a diagnostic result of the industrial equipment to be diagnosed.
According to the industrial diagnosis device based on voiceprint recognition, the voiceprint model of the industrial equipment in different states is established, so that voiceprint recognition can be automatically carried out on sound data generated by the industrial equipment, and non-contact fault diagnosis is realized, and therefore, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some alternative implementations of the present embodiment, the comparison module 304 is further configured to:
when the target sound data is received, preprocessing and feature extraction are carried out on the target sound data to obtain a target voiceprint feature vector;
Inputting the target voiceprint feature vector into the voiceprint model, and calculating the similarity between the target voiceprint feature vector and the voiceprint feature vector to be used as the feature comparison result.
According to the voiceprint recognition-based industrial diagnosis device, the similarity between the feature vectors is calculated, so that the accuracy of feature comparison is ensured, and a data base is provided for subsequent diagnosis analysis, so that non-contact fault diagnosis can be realized according to the similarity, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some alternative implementations of the present embodiment, the comparison module 304 is further configured to:
using a measurement learning technology, and establishing a measurement learning model according to the voiceprint feature vector;
And calculating the similarity according to the measurement learning model.
According to the voiceprint recognition-based industrial diagnosis device, the similarity between the target voiceprint feature vector and the voiceprint feature vector stored in the voiceprint model is calculated through the measurement learning technology, so that a more accurate feature comparison result is obtained, a data basis is provided for subsequent diagnosis analysis, non-contact fault diagnosis can be realized according to the similarity, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some alternative implementations of the present embodiment, the analysis module 305 is further configured to:
Comparing the similarity with a preset threshold value to obtain a comparison result;
analyzing according to the comparison result, and determining the equipment state of the industrial equipment to be diagnosed, wherein the equipment state comprises a normal state and a fault state;
if the equipment state is a fault state, determining a corresponding fault type;
And generating a diagnosis analysis report according to the comparison result, the equipment state and the fault type, and taking the diagnosis analysis report as the diagnosis result.
According to the voiceprint recognition-based industrial diagnosis device, the equipment state, the fault type and the like of the industrial equipment can be determined through the diagnosis analysis of the characteristic comparison result, so that the diagnosis result is obtained, the non-contact fault diagnosis is realized, the accuracy of the industrial diagnosis is improved, and the efficiency of the industrial diagnosis is improved.
In some optional implementations of the present embodiment, the extraction module 302 is further configured to:
Removing noise, filtering and enhancing signal quality of the sound data to obtain preprocessed training data;
and carrying out feature extraction on the training data according to a Mel frequency cepstrum coefficient algorithm to obtain the voiceprint feature vector.
According to the voiceprint recognition-based industrial diagnosis device, the voice data is preprocessed and the characteristic extraction is carried out by utilizing the Mel frequency cepstrum coefficient algorithm, so that the quality of the characteristic vector is ensured, a foundation is provided for the establishment of a subsequent voiceprint model, the robustness and generalization capability of the voiceprint model are improved, the noncontact fault diagnosis can be realized by means of the voiceprint model, the accuracy of the industrial diagnosis is improved, and the efficiency of the industrial diagnosis is improved.
In some optional implementations of the present embodiment, the extraction module 302 is further configured to:
Framing, windowing and Fourier transforming the training data to obtain a frequency spectrum signal;
filtering the spectrum signal through a Mel filter bank to obtain an energy response;
carrying out logarithmic operation and cepstrum coefficient extraction on the energy response to obtain cepstrum coefficients;
And forming the voiceprint feature vector according to the cepstrum coefficient.
According to the voiceprint recognition-based industrial diagnosis device, the feature extraction is carried out on the preprocessed sound data through the Mel frequency cepstrum coefficient algorithm, so that the quality of feature vectors is ensured, a foundation is provided for the establishment of a subsequent voiceprint model, the robustness and generalization capability of the voiceprint model are improved, the noncontact fault diagnosis can be realized by means of the voiceprint model, the accuracy of industrial diagnosis is improved, and the efficiency of industrial diagnosis is improved.
In some alternative implementations of the present embodiment, the voiceprint recognition based industrial diagnostic device 300 is further configured to:
Receiving a data stream corresponding to the sound data in real time, and extracting to obtain incremental characteristics;
And performing incremental training on the voiceprint model according to the incremental characteristics by adopting an online learning algorithm to realize dynamic updating of the voiceprint model.
According to the voiceprint recognition-based industrial diagnosis device, dynamic updating of the voiceprint model is achieved through online learning, robustness and universality of the voiceprint model are guaranteed, and therefore accuracy of industrial diagnosis and efficiency of industrial diagnosis can be improved when non-contact fault diagnosis is achieved by means of the voiceprint model.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only the computer device 4 having the components 41-43 is shown in the figures, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of an industrial diagnostic method based on voiceprint recognition, and the like. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the voiceprint recognition based industrial diagnostic method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
According to the computer equipment provided by the application, the voiceprint model of the industrial equipment in different states is established, so that voiceprint recognition can be automatically carried out on sound data generated by the industrial equipment, and non-contact fault diagnosis is realized, thereby increasing the accuracy of industrial diagnosis and improving the efficiency of industrial diagnosis.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the voiceprint recognition based industrial diagnostic method as described above.
The computer readable storage medium provided by the application can automatically identify the voiceprint of the sound data generated by the industrial equipment by establishing the voiceprint model of the industrial equipment in different states so as to realize non-contact fault diagnosis, thereby increasing the accuracy of industrial diagnosis and improving the efficiency of industrial diagnosis.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some embodiments of the present application, not all embodiments, and the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the patent claims. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described in the foregoing description, or equivalents may be substituted for elements thereof. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, are likewise within the scope of the application.