

Realistic facial expression synthesis for an image-based talking head

Liu et al., 2011

Document ID
10168511998570155883
Authors
Liu K
Ostermann J
Publication year
2011
Publication venue
2011 IEEE International Conference on Multimedia and Expo

Snippet

This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. As an example of facial expression primitives, smile is used. First, three types of …
Continue reading at ieeexplore.ieee.org
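
The snippet describes driving the talking head from arbitrary text plus facial-expression control tags, with smile as the example expression primitive. As a rough illustration only, and not the authors' implementation, the Python sketch below shows one plausible way such tagged input could be split into (text, expression) segments before speech and video synthesis; the <smile> markup, the tag names, and the ExpressionSegment type are assumptions made for this example.

# Minimal sketch, not the paper's code: the tag markup and type names are assumptions.
import re
from dataclasses import dataclass


@dataclass
class ExpressionSegment:
    text: str        # words the talking head should speak
    expression: str  # facial-expression primitive active for this span, e.g. "smile"


# Matches inline control tags such as <smile>...</smile>.
TAG_RE = re.compile(r"<(?P<name>\w+)>(?P<body>.*?)</(?P=name)>", re.DOTALL)


def parse_tagged_text(tagged: str, default: str = "neutral") -> list[ExpressionSegment]:
    """Split tagged input into segments, each carrying the expression that applies to it."""
    segments, pos = [], 0
    for m in TAG_RE.finditer(tagged):
        before = tagged[pos:m.start()].strip()
        if before:
            segments.append(ExpressionSegment(before, default))
        segments.append(ExpressionSegment(m.group("body").strip(), m.group("name")))
        pos = m.end()
    tail = tagged[pos:].strip()
    if tail:
        segments.append(ExpressionSegment(tail, default))
    return segments


if __name__ == "__main__":
    # Example input: neutral speech with a smiling span in the middle.
    for seg in parse_tagged_text("Hello there. <smile>Nice to see you again.</smile> Goodbye."):
        print(seg)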

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/06 Elementary speech units used in speech synthesisers; Concatenation rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Similar Documents

Liu et al. Realistic facial expression synthesis for an image-based talking head
US7933772B1 (en) System and method for triphone-based unit selection for visual speech synthesis
US12315054B2 (en) Real-time generation of speech animation
Taylor et al. Dynamic units of visual speech
Fanelli et al. A 3-d audio-visual corpus of affective communication
US7353177B2 (en) System and method of providing conversational visual prosody for talking heads
US8131551B1 (en) System and method of providing conversational visual prosody for talking heads
US20120130717A1 (en) Real-time Animation for an Expressive Avatar
Hassid et al. More than words: In-the-wild visually-driven prosody for text-to-speech
Wang et al. HMM trajectory-guided sample selection for photo-realistic talking head
Albrecht et al. "May I talk to you? :-)" - facial animation from text
Kacorri TR-2015001: A survey and critique of facial expression synthesis in sign language animation
Massaro et al. A multilingual embodied conversational agent
Mattheyses et al. On the importance of audiovisual coherence for the perceived quality of synthesized visual speech
Kolivand et al. Realistic lip syncing for virtual character using common viseme set
Fang et al. Audio-to-Deep-Lip: Speaking lip synthesis based on 3D landmarks
Verma et al. Animating expressive faces across languages
Busso et al. Learning expressive human-like head motion sequences from speech
Liu et al. Evaluation of an image-based talking head with realistic facial expression and head motion
Liu et al. Optimization of an image-based talking head system
KR102138132B1 (en) System for providing animation dubbing service for learning language
Dey et al. Evaluation of A Viseme-Driven Talking Head.
Mattheyses et al. Multimodal unit selection for 2D audiovisual text-to-speech synthesis
Kacorri et al. Evaluating a dynamic time warping based scoring algorithm for facial expressions in ASL animations
Knoppel et al. Trackside DEIRA: A Dynamic Engaging Intelligent Reporter Agent