Wang et al., 2023 - Google Patents

Multi-speaker speech separation under reverberation conditions using Conv-Tasnet

- Document ID: 9624911216941799546
- Authors: Wang C, Jia M, Zhang Y, Li L
- Publication year: 2023
- Publication venue: Journal of Advances in Information Technology

Snippet
The goal of speech separation is to separate the target signal from the background interference. With the rapid development of artificial intelligence, speech separation technology combined with deep learning has received more attention as well as a lot of …
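The snippet is truncated, but it is enough to place the paper: Conv-TasNet separates speakers directly on the waveform, using a learned convolutional encoder, a temporal convolutional network (TCN) that estimates one multiplicative mask per speaker, and a learned decoder. Below is a minimal PyTorch sketch of that encoder–mask–decoder pattern; `TinyConvTasNet`, its layer sizes, and the shallow dilated mask network are illustrative assumptions, not the architecture or hyperparameters used by Wang et al.

```python
import torch
import torch.nn as nn

class TinyConvTasNet(nn.Module):
    """Toy Conv-TasNet-style separator: learned encoder -> mask estimator -> decoder.

    All sizes here are illustrative stand-ins, not the paper's configuration.
    """

    def __init__(self, n_speakers=2, n_filters=64, kernel=16, stride=8):
        super().__init__()
        self.n_speakers = n_speakers
        # Learned analysis filterbank replacing the STFT (time-domain approach).
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        # Stand-in for the TCN separator: a shallow stack of dilated 1-D
        # convolutions emitting one mask per speaker over the encoder output.
        self.mask_net = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 3, padding=1, dilation=1), nn.PReLU(),
            nn.Conv1d(n_filters, n_filters, 3, padding=2, dilation=2), nn.PReLU(),
            nn.Conv1d(n_filters, n_filters * n_speakers, 1), nn.Sigmoid(),
        )
        # Learned synthesis filterbank mapping masked features back to waveforms.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)

    def forward(self, mix):                              # mix: (batch, 1, time)
        w = torch.relu(self.encoder(mix))                # (batch, filters, frames)
        masks = self.mask_net(w)                         # (batch, filters * spk, frames)
        masks = masks.view(mix.size(0), self.n_speakers, -1, masks.size(-1))
        # Apply each speaker's mask and decode back to the time domain.
        sources = [self.decoder(masks[:, s] * w) for s in range(self.n_speakers)]
        return torch.stack(sources, dim=1)               # (batch, spk, 1, time)

model = TinyConvTasNet()
estimates = model(torch.randn(4, 1, 16000))              # one second of 16 kHz audio
print(estimates.shape)                                   # torch.Size([4, 2, 1, 16000])
```

In the full model, training is typically driven by a scale-invariant SNR objective with permutation-invariant assignment of estimates to reference speakers; that loss, and the reverberant training conditions the paper targets, are omitted from this sketch.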
Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
        - G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
        - G10L19/04—Coding or decoding using predictive techniques
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
          - G10L21/0272—Voice signal separating
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Characterised by the type of extracted parameters
          - G10L25/18—The extracted parameters being spectral information of each sub-band
        - G10L25/78—Detection of presence or absence of voice signals
        - G10L25/90—Pitch determination of speech signals
        - G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
Similar Documents

| Publication | Title |
|---|---|
| Žmolíková et al. | SpeakerBeam: Speaker aware neural network for target speaker extraction in speech mixtures |
| Stöter et al. | CountNet: Estimating the number of concurrent speakers using supervised learning |
| Chazan et al. | Multi-microphone speaker separation based on deep DOA estimation |
| Zhang et al. | On end-to-end multi-channel time domain speech separation in reverberant environments |
| US9008329B1 (en) | Noise reduction using multi-feature cluster tracker |
| Sun et al. | Monaural source separation in complex domain with long short-term memory neural network |
| Wang et al. | Multi-speaker speech separation under reverberation conditions using Conv-Tasnet |
| Roman et al. | Pitch-based monaural segregation of reverberant speech |
| CN118212929A (en) | A personalized Ambisonics speech enhancement method |
| Maldonado et al. | Lightweight online separation of the sound source of interest through BLSTM-based binary masking |
| Gul et al. | Clustering of spatial cues by semantic segmentation for anechoic binaural source separation |
| Wang et al. | Deep neural network based supervised speech segregation generalizes to novel noises through large-scale training |
| CN115713943A (en) | Beamforming speech separation method based on a complex spatial angular central Gaussian mixture clustering model and a bidirectional long short-term memory network |
| Pirhosseinloo et al. | A new feature set for masking-based monaural speech separation |
| Yu et al. | Multi-channel $l_1$ regularized convex speech enhancement model and fast computation by the split Bregman method |
| Jamal et al. | A comparative study of IBM and IRM target mask for supervised Malay speech separation from noisy background |
| Xiang et al. | Distributed microphones speech separation by learning spatial information with recurrent neural network |
| Vincent et al. | Blind audio source separation |
| Pang et al. | Multichannel speech enhancement based on neural beamforming and a context-focused post-filtering network |
| Fan et al. | Joint training for simultaneous speech denoising and dereverberation with deep embedding representations |
| Shao et al. | CleanMel: Mel-spectrogram enhancement for improving both speech quality and ASR |
| Krijnders et al. | Tone-fit and MFCC scene classification compared to human recognition |
| Prasanna Kumar et al. | Single-channel speech separation using empirical mode decomposition and multi-pitch information with estimation of number of speakers |
| Jing et al. | End-to-end DOA-guided speech extraction in noisy multi-talker scenarios |
| Li et al. | Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments |