Nesta et al., 2011 - Google Patents

Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction

Nesta et al., 2011

Document ID: 6039266553429425140
Author: Nesta F; Matassoni M
Publication year: 2011
Publication venue: Proc. 1st Int. Workshop on Machine Listening in Multisource Environments (CHiME)

External Links

Cited by

Snippet

This paper describes the system used to process the data of the CHiME Pascal 2011 competition, whose goal is to separate the desired speech and recognize the commands being spoken. The binaural recorded mixtures are processed by an on-line Semi-Blind …

Continue reading at cris.fbk.eu (PDF) (other versions)

238000000605 extraction 0 title abstract description 8

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Similar Documents

Publication	Publication Date	Title
JP6400218B2 (en)	2018-10-03	Audio source isolation
HK1244104A1 (en)	2018-07-27	Audio source separation
Mohammadiha et al.	2015	Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling
Boeddeker et al.	2021	Convolutive transfer function invariant SDR training criteria for multi-channel reverberant speech separation
CN110998723B (en)	2023-06-27	Signal processing device using neural network, signal processing method, and recording medium
US10904688B2 (en)	2021-01-26	Source separation for reverberant environment
Nesta et al.	2013	A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
US20230306980A1 (en)	2023-09-28	Method and System for Audio Signal Enhancement with Reduced Latency
Kubo et al.	2019	Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation
Habets et al.	2018	Dereverberation
Baby et al.	2021	Speech dereverberation using variational autoencoders
Li et al.	2017	An EM algorithm for audio source separation based on the convolutive transfer function
Astudillo et al.	2013	Integration of beamforming and uncertainty-of-observation techniques for robust ASR in multi-source environments
Cornell et al.	2022	Learning filterbanks for end-to-end acoustic beamforming
Nesta et al.	2011	Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction
Leglaive et al.	2018	Student's t Source and Mixing Models for Multichannel Audio Source Separation
JP4946330B2 (en)	2012-06-06	Signal separation apparatus and method
Yoshioka et al.	2007	Dereverberation by using time-variant nature of speech production system
Nesta et al.	2013	Unsupervised spatial dictionary learning for sparse underdetermined multichannel source separation
Takatani et al.	2004	High-fidelity blind separation of acoustic signals using SIMO-model-based independent component analysis
Li et al.	2022	Low complex accurate multi-source RTF estimation
Nakatani et al.	2019	Simultaneous denoising, dereverberation, and source separation using a unified convolutional beamformer
Wang et al.	2021	Real-Time Independent Vector Analysis Using Semi-Supervised Nonnegative Matrix Factorization as a Source Model.
Gao et al.	2023	A physical model-based self-supervised learning method for signal enhancement under reverberant environment
Nishikawa et al.	2003	Stable learning algorithm for blind separation of temporally correlated acoustic signals combining multistage ICA and linear prediction