Liu et al., 2011 - Google Patents
Realistic facial expression synthesis for an image-based talking head
- Document ID: 10168511998570155883
- Authors: Liu K; Ostermann J
- Publication year: 2011
- Publication venue: 2011 IEEE International Conference on Multimedia and Expo
Snippet
This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. As an example of facial expression primitives, smile is used. First, three types of …
Classifications
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
- G10L21/013—Adapting to target pitch (changing voice quality, e.g. pitch or formants)
- G10L13/06—Elementary speech units used in speech synthesisers; concatenation rules
- G06T13/205—3D [Three Dimensional] animation driven by audio data
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G10L13/02—Methods for producing synthetic speech; speech synthesisers
- G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
- G10L25/66—Speech or voice analysis techniques specially adapted for comparison or discrimination, for extracting parameters related to health condition
Similar Documents
| Publication | Title |
|---|---|
| Liu et al. | Realistic facial expression synthesis for an image-based talking head |
| US7933772B1 (en) | System and method for triphone-based unit selection for visual speech synthesis |
| US12315054B2 (en) | Real-time generation of speech animation |
| Taylor et al. | Dynamic units of visual speech |
| Fanelli et al. | A 3-D audio-visual corpus of affective communication |
| US7353177B2 (en) | System and method of providing conversational visual prosody for talking heads |
| US8131551B1 (en) | System and method of providing conversational visual prosody for talking heads |
| US20120130717A1 (en) | Real-time animation for an expressive avatar |
| Hassid et al. | More than words: In-the-wild visually-driven prosody for text-to-speech |
| Wang et al. | HMM trajectory-guided sample selection for photo-realistic talking head |
| Albrecht et al. | "May I talk to you? :-)" - Facial animation from text |
| Kacorri | TR-2015001: A survey and critique of facial expression synthesis in sign language animation |
| Massaro et al. | A multilingual embodied conversational agent |
| Mattheyses et al. | On the importance of audiovisual coherence for the perceived quality of synthesized visual speech |
| Kolivand et al. | Realistic lip syncing for virtual character using common viseme set |
| Fang et al. | Audio-to-Deep-Lip: Speaking lip synthesis based on 3D landmarks |
| Verma et al. | Animating expressive faces across languages |
| Busso et al. | Learning expressive human-like head motion sequences from speech |
| Liu et al. | Evaluation of an image-based talking head with realistic facial expression and head motion |
| Liu et al. | Optimization of an image-based talking head system |
| KR102138132B1 (en) | System for providing animation dubbing service for learning language |
| Dey et al. | Evaluation of a viseme-driven talking head |
| Mattheyses et al. | Multimodal unit selection for 2D audiovisual text-to-speech synthesis |
| Kacorri et al. | Evaluating a dynamic time warping based scoring algorithm for facial expressions in ASL animations |
| Knoppel et al. | Trackside DEIRA: A Dynamic Engaging Intelligent Reporter Agent |