
Wang et al., 2023 - Google Patents

Multi-speaker speech separation under reverberation conditions using Conv-Tasnet

Document ID: 9624911216941799546
Authors: Wang C, Jia M, Zhang Y, Li L
Publication year: 2023
Publication venue: Journal of Advances in Information Technology

Snippet

The goal of speech separation is to separate the target signal from the background interference. With the rapid development of artificial intelligence, speech separation technology combined with deep learning has received more attention as well as a lot of …
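The snippet describes time-domain speech separation, where Conv-TasNet-style systems are conventionally trained and evaluated with the scale-invariant signal-to-noise ratio (SI-SNR). As a minimal illustrative sketch (not code from the paper), the metric can be computed in NumPy as:

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB, the standard separation metric.

    Both signals are zero-meaned, the estimate is projected onto the
    target to remove scale dependence, and the residual counts as noise.
    """
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Optimal scaling: project the estimate onto the target direction.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) /
                         (np.dot(e_noise, e_noise) + eps))

# Scale invariance: rescaling the estimate leaves the score unchanged.
t = np.sin(np.linspace(0, 100, 8000))
noise = np.random.default_rng(0).normal(size=8000) * 0.1
print(si_snr(t + noise, t))
print(si_snr(2 * (t + noise), t))  # same value, scale does not matter
```

A higher SI-SNR (in dB) indicates a cleaner separated source; reverberation, the condition studied in this paper, typically lowers it because reflections leak into the residual term.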

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
        • G10L15/00: Speech recognition
            • G10L15/08: Speech classification or search
        • G10L17/00: Speaker identification or verification
        • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
            • G10L19/008: Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
            • G10L19/018: Audio watermarking, i.e. embedding inaudible data in the audio signal
            • G10L19/04: using predictive techniques
        • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
                • G10L21/0208: Noise filtering
                    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
                        • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
                • G10L21/0272: Voice signal separating
        • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
            • G10L25/03: characterised by the type of extracted parameters
                • G10L25/18: the extracted parameters being spectral information of each sub-band
            • G10L25/78: Detection of presence or absence of voice signals
            • G10L25/90: Pitch determination of speech signals
            • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals

Similar Documents

Žmolíková et al. Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures
Stöter et al. CountNet: Estimating the number of concurrent speakers using supervised learning
Chazan et al. Multi-microphone speaker separation based on deep DOA estimation
Zhang et al. On end-to-end multi-channel time domain speech separation in reverberant environments
US9008329B1 (en) Noise reduction using multi-feature cluster tracker
Sun et al. Monaural source separation in complex domain with long short-term memory neural network
Wang et al. Multi-speaker speech separation under reverberation conditions using Conv-Tasnet
Roman et al. Pitch-based monaural segregation of reverberant speech
CN118212929A (en) A personalized Ambisonics speech enhancement method
Maldonado et al. Lightweight online separation of the sound source of interest through blstm-based binary masking
Gul et al. Clustering of spatial cues by semantic segmentation for anechoic binaural source separation
Wang et al. Deep neural network based supervised speech segregation generalizes to novel noises through large-scale training
CN115713943A (en) Beam forming voice separation method based on complex space angular center Gaussian mixture clustering model and bidirectional long-short-term memory network
Pirhosseinloo et al. A new feature set for masking-based monaural speech separation
Yu et al. Multi-channel l1 regularized convex speech enhancement model and fast computation by the split Bregman method
Jamal et al. A comparative study of IBM and IRM target mask for supervised malay speech separation from noisy background
Xiang et al. Distributed microphones speech separation by learning spatial information with recurrent neural network
Vincent et al. Blind audio source separation
Pang et al. Multichannel speech enhancement based on neural beamforming and a context-focused post-filtering network
Fan et al. Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Shao et al. CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
Krijnders et al. Tone-fit and MFCC scene classification compared to human recognition
Prasanna Kumar et al. Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers
Jing et al. End-to-end doa-guided speech extraction in noisy multi-talker scenarios
Li et al. Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments