Schmidt et al., 2022 - Google Patents

PodcastMix: A dataset for separating music and speech in podcasts

Schmidt et al., 2022

Document ID: 10132251374813216840
Author: Schmidt N; Pons J; Miron M
Publication year: 2022
Publication venue: arXiv preprint arXiv:2207.07403

External Links

Cited by

Snippet

We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. We aim at defining a benchmark suitable for training and evaluating (deep learning) source separation models. To that end, we release a large …

Continue reading at arxiv.org (PDF) (other versions)

238000000926 separation method 0 abstract description 4

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3074—Audio data retrieval
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H1/00—Details of electrophonic musical instruments
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output

Similar Documents

Publication	Publication Date	Title
US10789290B2 (en)	2020-09-29	Audio data processing method and apparatus, and computer storage medium
Huang et al.	2012	Singing-voice separation from monaural recordings using robust principal component analysis
Bishop et al.	2012	Perception of pitch location within a speaker’s range: Fundamental frequency, voice quality and speaker sex
Ewert et al.	2014	Score-informed source separation for musical audio recordings: An overview
CN103597543B (en)	2017-03-22	Semantic Track Mixer
Miron et al.	2016	Score‐Informed Source Separation for Multichannel Orchestral Recordings
Natsiou et al.	2021	Audio representations for deep learning in sound synthesis: A review
CN112992109B (en)	2023-11-28	Auxiliary singing system, auxiliary singing method and non-transient computer readable recording medium
Vijayan et al.	2018	Speech-to-singing voice conversion: The challenges and strategies for improving vocal conversion processes
Schmidt et al.	2022	PodcastMix: A dataset for separating music and speech in podcasts
Rodriguez-Serrano et al.	2015	Online score-informed source separation with adaptive instrument models
Koszewski et al.	2023	Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
CN111859008A (en)	2020-10-30	Method and terminal for recommending music
Biswas et al.	2023	Advances in Speech and Music Technology: Computational Aspects and Applications
US12437213B2 (en)	2025-10-07	Bayesian graph-based retrieval-augmented generation with synthetic feedback loop (BG-RAG-SFL)
Chandna et al.	2022	A deep-learning based framework for source separation, analysis, and synthesis of choral ensembles
Cunningham et al.	2019	Subjective evaluation of music compressed with the ACER codec compared to AAC, MP3, and uncompressed PCM
Yang et al.	2022	Don’t separate, learn to remix: End-to-end neural remixing with joint optimization
Su et al.	2013	Sparse Modeling for Artist Identification: Exploiting Phase Information and Vocal Separation.
Xue et al.	2025	Audio-flan: A preliminary release
Dobashi et al.	2015	A music performance assistance system based on vocal, harmonic, and percussive source separation and content visualization for music audio signals
Tachibana et al.	2016	A real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion techniques
Lukito et al.	2024	Audio-as-data tools: Replicating computational data processing
Pardo et al.	2018	Applying source separation to music
CN114566191A (en)	2022-05-31	Sound correcting method for recording and related device