Zhang et al., 2019 - Google Patents
Discriminative frequency filter banks learning with neural networks
- Document ID
- 16435647028685116021
- Authors
- Zhang T
- Wu J
- Publication year
- 2019
- Publication venue
- EURASIP Journal on Audio, Speech, and Music Processing
Snippet
Filter banks on spectra play an important role in many audio applications. Traditionally, the filters are distributed linearly on a perceptual frequency scale such as the Mel scale. To make the output smoother, these filters are often placed so that they overlap with each other …
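The snippet refers to triangular filters spaced linearly on the Mel scale, with neighboring filters overlapping. A minimal sketch of that standard construction is below (this is the conventional HTK-style Mel filter bank, not the learned filter banks the paper proposes; all function names and parameter values are illustrative):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000):
    # Center frequencies are spaced linearly on the mel scale,
    # mapped back to Hz, then snapped to FFT bin indices.
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        # Rising edge of the triangle; each filter's rising edge sits on
        # its left neighbor's falling edge, so adjacent filters overlap.
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        # Falling edge of the triangle.
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)
    return fbank

fb = mel_filter_bank()
print(fb.shape)  # (26, 257): one row of spectral weights per filter
```

Applying `fb @ power_spectrum` then yields the Mel filter bank energies; the paper's contribution is to learn these filter shapes discriminatively with a neural network instead of fixing them by this formula.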
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/28—Constructional details of speech recognition systems
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
- G10L25/48—Speech or voice analysis techniques specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques for comparison or discrimination, for extracting parameters related to health condition
Similar Documents
| Publication | Title |
|---|---|
| Ayvaz et al. | Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning |
| Sidi Yakoub et al. | Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network |
| Hariharan et al. | Classification of speech dysfluencies using LPC based parameterization techniques |
| Bermant | BioCPPNet: automatic bioacoustic source separation with deep neural networks |
| CN113257279A (en) | GTCN-based real-time voice emotion recognition method and application device |
| Uddin et al. | Gender and region detection from human voice using the three-layer feature extraction method with 1D CNN |
| KP et al. | ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score |
| Zhang et al. | Discriminative frequency filter banks learning with neural networks |
| Li et al. | Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network |
| Dash et al. | Multi-objective approach to speech enhancement using tunable Q-factor-based wavelet transform and ANN techniques |
| Yecchuri et al. | Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement |
| Opochinsky et al. | Single-microphone speaker separation and voice activity detection in noisy and reverberant environments |
| Zouhir et al. | Bionic Cepstral coefficients (BCC): A new auditory feature extraction to noise-robust speaker identification |
| Sheeja et al. | Speech dereverberation and source separation using DNN-WPE and LWPR-PCA |
| Zouhir et al. | A bio-inspired feature extraction for robust speech recognition |
| Zacarias-Morales et al. | Full single-type deep learning models with multihead attention for speech enhancement |
| Zeng et al. | A time-frequency fusion model for multi-channel speech enhancement |
| Slívová et al. | Isolated word automatic speech recognition system |
| Dutta et al. | Designing of Gabor filters for spectro-temporal feature extraction to improve the performance of ASR systems |
| Parathai et al. | Single-channel signal separation using spectral basis correlation with sparse nonnegative tensor factorization |
| Wang et al. | Time-domain adaptive attention network for single-channel speech separation |
| Prasanna Kumar et al. | Single-channel speech separation using combined EMD and speech-specific information |
| Bykov et al. | Research of neural network classifier in speaker recognition module for automated system of critical use |
| Fayyazi et al. | IIRI-Net: An interpretable convolutional front-end inspired by IIR filters for speaker identification |
| Tkachenko et al. | Speech enhancement for speaker recognition using deep recurrent neural networks |