Hershey et al., 2004 - Google Patents
Audio-visual graphical models for speech processingHershey et al., 2004
View PDF- Document ID
- 17843625553296045077
- Author
- Hershey J
- Attias H
- Jojic N
- Kristjansson T
- Publication year
- Publication venue
- 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
External Links
Snippet
Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically audio-visual systems have been assembled …
- 210000000088 Lip 0 abstract description 12
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
- G10L2015/0636—Threshold criteria for the updating
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| AU2022200439B2 (en) | Multi-modal speech separation method and system | |
| US7689413B2 (en) | Speech detection and enhancement using audio/video fusion | |
| Beal et al. | A graphical model for audiovisual object tracking | |
| Zhou et al. | A compact representation of visual speech data using latent variables | |
| US7343289B2 (en) | System and method for audio/video speaker detection | |
| Lu et al. | Ensemble modeling of denoising autoencoder for speech spectrum restoration. | |
| JP2018077479A (en) | Object recognition using multimodal alignment | |
| Estellers et al. | Multi-pose lipreading and audio-visual speech recognition | |
| Gurbuz et al. | Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition | |
| Minotto et al. | Audiovisual voice activity detection based on microphone arrays and color information | |
| Hershey et al. | Audio-visual graphical models for speech processing | |
| Christoudias et al. | Co-adaptation of audio-visual speech and gesture classifiers | |
| Argones Rua et al. | Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden markov models | |
| Chetty et al. | Robust face-voice based speaker identity verification using multilevel fusion | |
| Canton-Ferrer et al. | Audiovisual head orientation estimation with particle filtering in multisensor scenarios | |
| Schymura et al. | A dynamic stream weight backprop Kalman filter for audiovisual speaker tracking | |
| Gebru et al. | Audio-visual tracking by density approximation in a sequential Bayesian filtering framework | |
| Stiefelhagen et al. | Audio-visual perception of a lecturer in a smart seminar room | |
| Seymour et al. | Audio-visual integration for robust speech recognition using maximum weighted stream posteriors. | |
| Chetty et al. | Audio visual speaker verification based on hybrid fusion of cross modal features | |
| Sarada et al. | Audio deepfake detection and classification | |
| Rajavel et al. | Optimum integration weight for decision fusion audio–visual speech recognition | |
| Dean et al. | Dynamic visual features for audio–visual speaker verification | |
| Chetty | Robust audio visual biometric person authentication with liveness verification | |
| Martinson et al. | Learning speaker recognition models through human-robot interaction |