Williams et al., 2023 - Google Patents
Privacy-Preserving Occupancy Estimation
Williams et al., 2023
- Document ID
- 4562187777231278686
- Authors
- Williams J
- Yazdanpanah V
- Stein S
- Publication year
- 2023
- Publication venue
- ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
In this paper, we introduce an audio-based framework for occupancy estimation, including a new public dataset, and evaluate occupancy in a 'cocktail party' scenario where the party is simulated by mixing audio to produce speech with overlapping talkers (1-10 people). To …
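The snippet only states that the party is simulated by mixing audio so that 1 to 10 talkers overlap. The sketch below shows one plausible way to build such labelled mixtures with NumPy. The helper name `simulate_cocktail_party`, the assumption that each input clip comes from a distinct talker, the random start offsets and the peak normalisation are all illustrative choices, not the authors' documented pipeline.

```python
import numpy as np


def simulate_cocktail_party(utterances, num_talkers, rng=None):
    """Mix `num_talkers` single-talker clips into one overlapping-speech signal.

    Hypothetical helper: assumes each entry of `utterances` is a mono clip
    from a distinct talker at a common sampling rate. Random offsets and
    peak normalisation are illustrative, not the paper's procedure.
    Returns (mixture, occupancy_label).
    """
    if not 1 <= num_talkers <= 10:
        raise ValueError("num_talkers must be between 1 and 10")
    if num_talkers > len(utterances):
        raise ValueError("not enough utterances for the requested talker count")
    rng = np.random.default_rng() if rng is None else rng

    # Pick distinct clips (one per simulated talker) without replacement.
    chosen = rng.choice(len(utterances), size=num_talkers, replace=False)
    length = max(len(utterances[i]) for i in chosen)
    mixture = np.zeros(length, dtype=np.float64)

    for i in chosen:
        clip = np.asarray(utterances[i], dtype=np.float64)
        # Random start offset so shorter clips overlap the longest one
        # at different points in time.
        offset = rng.integers(0, length - len(clip) + 1)
        mixture[offset:offset + len(clip)] += clip

    # Rescale to [-1, 1] so the summed talkers do not clip.
    peak = np.max(np.abs(mixture))
    if peak > 0:
        mixture /= peak
    # The number of mixed talkers is the occupancy target.
    return mixture, num_talkers
```

Looping `num_talkers` over 1 to 10 would yield one labelled example per occupancy level. This sketch ignores loudness differences between talkers and room acoustics, which a realistic simulation would likely need to control.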
Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/005—Speaker recognisers specially adapted for particular applications
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
 
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
 
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
 
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
 
Similar Documents
| Publication | Title |
|---|---|
| Ding et al. | Personal VAD: Speaker-conditioned voice activity detection |
| US8589167B2 (en) | Speaker liveness detection |
| Gillick et al. | Robust Laughter Detection in Noisy Environments |
| Aloufi et al. | Emotionless: Privacy-preserving speech analysis for voice assistants |
| CN108417201B (en) | Single-channel multi-speaker identification method and system |
| Ferrer et al. | A noise-robust system for NIST 2012 speaker recognition evaluation |
| AU2017294791A1 (en) | Method and system for automatically diarising a sound recording |
| Ahmed et al. | Towards more robust keyword spotting for voice assistants |
| WO2014114049A1 (en) | Voice recognition method and device |
| WO2023040523A1 (en) | Audio signal processing method and apparatus, electronic device, and storage medium |
| Williams et al. | Privacy-Preserving Occupancy Estimation |
| Yan et al. | Audio deepfake detection system with neural stitching for ADD 2022 |
| Subakan et al. | REAL-M: Towards speech separation on real mixtures |
| US20180308501A1 (en) | Multi speaker attribution using personal grammar detection |
| Wang et al. | The application of Gammatone frequency cepstral coefficients for forensic voice comparison under noisy conditions |
| Krijnders et al. | Sound event recognition through expectancy-based evaluation of signal-driven hypotheses |
| CN114678038B (en) | Audio noise detection method, computer device and computer program product |
| Chen et al. | Devil in the room: triggering audio backdoors in the physical world |
| Varela et al. | Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector |
| CN114826709B (en) | Identity authentication and acoustic environment detection method, system, electronic equipment and medium |
| WO2023124556A1 (en) | Method and apparatus for recognizing mixed key sounds of multiple keyboards, device, and storage medium |
| Rashed | Fast Algorithm for Noisy Speaker Recognition Using ANN |
| Yakovlev et al. | LRPD: Large replay parallel dataset |
| Lee et al. | Overlapping speech detection with cluster-based HMM framework |
| Chen | AmbianceCount: an objective social ambiance measure from unconstrained day-long audio recordings |