WO2018190668A1 - Speech intention expression system using the physical characteristics of the head and neck articulators - Google Patents
- Publication number
- WO2018190668A1 (PCT/KR2018/004325)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sensor
- unit
- speech
- data
- neck
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
Definitions
- The present invention recognizes the physical characteristics of the head and neck articulators, including the oral tongue, through articulation sensors that measure the changes accompanying utterance across the head and neck, and infers the intention to speak from them.
- The present invention relates to a system that presents this speech intention in visual, auditory, or tactile form to the speaker or to others, and that transfers the speech intention to the head and neck of an image object or a robot.
- Sounds produced by the articulatory organs are called speech, or verbal sounds, when they convey linguistic information, and vocalizations in non-verbal cases.
- The main body systems involved in producing these sounds are the nervous system and the respiratory system.
- Within the nervous system, both the central nervous system and the peripheral nervous system are involved.
- The cranial nerve nuclei are located in the brain stem, and the cerebellum plays a dominant role in precisely controlling the muscles for movement.
- The cranial nerves involved in speech production include the fifth cranial nerve, involved in jaw movement; the seventh, involved in lip movement; the tenth, involved in the pharynx and larynx; the eleventh, involved in pharyngeal movement; and the twelfth, involved in movement of the tongue.
- Among the peripheral nerves, the laryngeal nerves and the recurrent laryngeal nerve, which branch from the vagus nerve, are directly involved in laryngeal movement.
- Speech is also produced by the lower respiratory tract, the larynx, and the vocal tract.
- The vocal cords are the sound source: exhaled airflow from the lungs sets the vocal cords in vibration, and during phonation the control of exhalation provides an efficient supply of acoustic energy.
- When the vocal cords are properly tensioned and closed, exhalation causes them to vibrate, and the glottis opens and closes at regular intervals, controlling the expiratory airstream that passes through it.
- The articulation process is the process of forming phonemes, the units of speech sounds, after the sound has been amplified and shaped by the resonance process.
- the tongue is the most important articulator, but in fact, the phoneme involves not only the tongue but also various structures of the mouth and face.
- These articulators include movable structures such as the tongue, lips, soft palate, jaws, and immovable structures such as teeth or hard palates. These articulators block or restrict the flow of air to form consonants and vowels.
- the tongue can be divided from the front into the apex (tip), the blade (blade), the dorsum, the body of the tongue, and the root of the tongue (root).
- The tip of the tongue is the part used when we stick out the tongue or articulate a syllable-initial liquid (as in "la-la-la"); the blade is mainly used to articulate phonemes produced at the front of the mouth, such as alveolar sounds; and the dorsum is the part commonly used to articulate back sounds such as velar sounds.
- The lips, the second articulators, form the opening of the mouth and play an important role in facial expression and in the articulation of head and neck sounds.
- Vowels are distinguished not only by the movement of the tongue but also by the shape of the lips.
- the bilabial sounds can be pronounced only when the lips are closed.
- the shape of the lips is modified by the surrounding muscles.
- The orbicularis oris muscle surrounding the lips plays an important role in producing bilabial consonants and rounded vowels such as /u/ by closing or pursing the lips.
- The quadratus labii superior and quadratus labii inferior muscles open the lips.
- The risorius muscle plays an important role in pulling the corners of the lips, as in smiling, or in spreading the lips to produce spread sounds.
- the jaw is divided into the immobilized upper jaw (maxilla) and the lower jaw (mandible) which moves up and down and left and right.
- These jaws are the strongest and largest of the facial bones and are driven by four pairs of muscles.
- the movement of the lower jaw changes the size of the mouth, which is important not only for chewing but also for vowel production.
- The alveolar ridge (gum area) is where alveolar speech sounds are articulated.
- The hard palate is the hard, rather flat region behind the alveolar ridge where the palatal series of sounds is articulated.
- The soft palate (velum) is classified as a movable articulator because its muscles contract to form velopharyngeal closure, and oral sounds are thus produced.
- Speech sounds divide into those produced with some obstruction in the oral cavity, or more precisely in the middle of the oral passage, as the airstream that has passed through the vocal cords travels through the vocal tract, and those produced without such obstruction.
- The former are called consonants and the latter vowels.
- Consonants are classified by where and how they are articulated: in a consonant chart, each column represents the place of articulation and each row the manner of articulation.
- A blocked sound (stop) is produced by completely blocking the airflow in the oral cavity and then releasing it.
- A constricted sound is produced by narrowing part of the vocal tract and forcing the airflow through the narrow passage.
- Blocked sounds can in turn be divided into nasal resonant sounds and non-nasal sounds.
- The former are the nasal stops, which are accompanied by resonance in the nasal passages; for the latter the velum is raised against the pharyngeal wall.
- The latter are the oral stops, which block the passage to the nasal cavity and prevent airflow from reaching it.
- Oral stops can further be classified as stops (plosives), trills, and flaps (or taps).
- Constricted sounds are divided into fricatives and approximants.
- An approximant is produced when the articulators only approach each other, without the close constriction that produces friction.
- When a passage for the airflow is formed along the side of the tongue, the sound is called a lateral.
- Classified by place of articulation, bilabials are sounds articulated with both lips; the Korean bilabial consonants belong to this class.
- In modern standard Korean the bilabials are sounds that close both lips, but the gap between the lips can also be narrowed so that the airflow is rubbed between them (bilabial fricative), or both lips can be made to vibrate (bilabial trill).
- Labiodentals are sounds articulated with the lower lip and upper teeth; they do not exist in Korean, but English [f, v] belong to this class.
- Dentals are sounds in which the airflow is constricted or blocked at the back of the upper teeth; when the constriction is between the teeth they are sometimes called interdentals.
- Alveolars are sounds produced by constricting or closing the airflow near the upper gums; Korean alveolar consonants such as /ㄷ/, /ㄴ/, /ㅅ/, and /ㄹ/ belong to this class.
- The Korean alveolar fricatives /ㅅ, ㅆ/ are produced with an airflow constriction in the alveolar region that is very similar to that of English /s, z/.
- Palatoalveolars, also known as postalveolars, are sounds in which the tip or blade of the tongue touches the area just behind the alveolar ridge; they do not occur in Korean but do in English and French.
- Alveolopalatals are also called prepalatals because they are articulated at the front of the hard palate, near the alveolar ridge.
- Retroflexes differ from other lingual sounds: whereas other sounds are articulated with the tip or upper surface of the tongue touching or approaching the palate, retroflexes are articulated with the underside of the tongue touching or approaching the palate.
- A palatal sound is one in which the tongue body touches or approaches the hard palate.
- A velar sound is one in which the tongue body touches or approaches the soft palate; the Korean velar stops and the velar nasal belong to this class.
- A uvular sound is one in which the tongue body touches or approaches the uvula, at the end of the soft palate.
- Pharyngeals are sounds articulated in the pharyngeal cavity.
- Glottals are sounds in which the vocal cords themselves act as the articulator; in Korean the only glottal phoneme is the voiceless glottal fricative /ㅎ/.
- The three most important variables in vowel articulation are the height of the tongue, its front-back position, and the shape of the lips.
- the opening of the vowel is determined by the height of the tongue.
- A vowel produced with the mouth relatively closed is called a close vowel, or high vowel, and one produced with the mouth wide open is called an open vowel, or low vowel.
- Vowels between the high and low vowels are called mid vowels; these can be subdivided into close-mid (half-close) vowels and, with the mouth opened further, open-mid (half-open) vowels.
- The second variable, the front-back position of the tongue, is in fact determined by which part of the tongue forms the narrowest constriction, that is, which part of the tongue is closest to the palate.
- When the constriction is toward the front of the tongue, the vowel is a front vowel; when it is toward the back, a back vowel; and when it is in between, a central vowel.
- Vowels produced with rounded lips are called rounded vowels, and those produced without lip rounding are called unrounded vowels.
- Speech impairment refers to a condition in which pitch, intensity, sound quality, or fluency is not appropriate for the speaker's gender, age, build, social environment, or geographical region. It may be congenital or acquired, and can be treated to some extent by augmenting or reducing the vocal cords, which are part of the larynx; however, such treatment is imperfect and its effect is uncertain.
- Laryngeal functions include swallowing, coughing, airway closure, breathing, and phonation, and there are various methods for evaluating them (for example, case history, speech pattern, acoustic, and aerodynamic tests). Such assessments can be used to determine whether a speech impairment is present.
- One known aid is a vibration generator capable of artificially producing vibrations in place of the vocal cords.
- Such a vibration generator can use the principle of a loudspeaker. A speaker consists of a magnet and a coil; when the direction of the current flowing through the coil is reversed, the polarity of the coil's magnetic field reverses, so attractive and repulsive forces act between magnet and coil according to the current direction, causing the coil to reciprocate. The reciprocating motion of the coil vibrates the air and generates vibration.
- Another method uses the piezoelectric effect: a piezoelectric crystal element deforms when it receives a low-frequency signal voltage, causing a diaphragm to vibrate and generate sound. A vibration generator based on either principle can therefore substitute for the function of the vocal cords.
- However, the sound produced in this way merely replaces the vibration of the vocal cords, and it is not easy to discern the speaker's intention from it.
- In addition, the vibration generator must be held against the vocal cord area, so one hand is always occupied.
- For the speech disorders and abnormalities described above, therapeutic approaches such as surgery on the larynx or vocal cords may be sought, but such surgery or treatment often cannot provide a complete solution.
- Prior electropalatography (EPG) systems include the Rion EPG, widely commercialized under the Rion name in Japan in 1973, the EPG of the University of Reading used in systems such as WinEPG by Articulate Instruments Ltd, and the work of Fletcher, Fujimura, and Tatsumi.
- Applications of this approach also include the Kay Palatometer, developed for research purposes by the UCLA Phonetics Lab, and CompleteSpeech (LogoMetrix), developed by Schmidt.
- These conventional techniques, however, are limited to inferring utterance from passive articulators; they have clear limits when it comes to realizing utterances according to the actual manner of articulation by using the oral tongue, the active articulator itself, or the coordination between the oral tongue and the other articulators.
- Speech and facial synchronization means transferring speech, articulation, and facial expressions, which are among the most important factors determining the identity of a person or object, to characters, robots, various electronic products, autonomous vehicles, and the like.
- Such applications are a key means of defining and extending an individual's identity.
- Conventional techniques have been limited to simple lip-shape libraries that produce low-quality animation, while overseas animation content producers such as Pixar and Disney spend a great deal of time and money creating realistic character animation through lip sync.
- An object of the present invention is to identify the user's manner of articulation during utterance through sensors on the head and neck, including the oral cavity, and to provide a device and method for supplementing speech so that the intended utterance can be expressed in auditory, visual, or tactile form as good-quality speech.
- Another object of the present invention is to realize appropriate, good-quality utterances when normal speech function is impaired and correction or treatment is impossible.
- Still another object of the present invention is to identify the user's manner of articulation according to the user's intention to speak through head and neck sensors, including the oral cavity, and to map it to the head and neck of an image object, such as an animated character, so that the object's utterances and facial expressions are realized more naturally and more like a human.
- Still another object of the present invention is to identify the user's manner of articulation according to the user's intention to speak through head and neck sensors, including the oral cavity, and to map it to actuators in the head and neck of a robot, such as a humanoid, so that the robot's head and neck utterances and facial expressions are realized more naturally and more like a human.
- To achieve these objects, the present invention provides a speech intention expression system comprising: a sensor unit positioned adjacent to one surface of the speaker's head and neck to measure the physical characteristics of the articulators;
- a data analysis unit that identifies the speaker's utterance characteristics based on the position of the sensor unit and the physical characteristics of the articulators;
- a data conversion unit that converts the position of the sensor unit and the utterance characteristics into language data; and a data expression unit that expresses the language data externally, wherein the sensor unit includes an oral tongue sensor corresponding to the tongue in the oral cavity.
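- The relationship among these four units can be pictured as a simple pipeline. The following is a minimal sketch in Python, assuming hypothetical class and method names that are not part of the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ArticulatorReading:
    sensor_position: str      # e.g. "oral_tongue", "face", "vocal_cord"
    features: List[float]     # raw physical measurements from the sensor

class SensorUnit:
    def measure(self) -> List[ArticulatorReading]:
        """Collect physical characteristics from head/neck sensors (hardware-dependent)."""
        raise NotImplementedError

class DataAnalysisUnit:
    def to_utterance_features(self, readings: List[ArticulatorReading]) -> List[str]:
        """Map physical readings to utterance features, e.g. phoneme labels (model-dependent)."""
        raise NotImplementedError

class DataConversionUnit:
    def to_language_data(self, utterance_features: List[str]) -> str:
        """Convert utterance features into language data (here: plain text)."""
        return "".join(utterance_features)

class DataExpressionUnit:
    def express(self, language_data: str) -> None:
        """Express the language data externally (here: visually, as text)."""
        print(language_data)

def run_pipeline(sensors: SensorUnit, analyzer: DataAnalysisUnit,
                 converter: DataConversionUnit, expresser: DataExpressionUnit) -> None:
    # sensor unit -> data analysis unit -> data conversion unit -> data expression unit
    readings = sensors.measure()
    features = analyzer.to_utterance_features(readings)
    expresser.express(converter.to_language_data(features))
```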
- The oral tongue sensor may be fixed to one surface of the oral tongue, wrap around its surface, or be inserted into it; from the movement of the oral tongue along the x-, y-, and z-axes during utterance, it can identify at least one physical characteristic of the oral tongue among tongue height, front-back position, degree of flexion, extension, rotation, tension, contraction, relaxation, and vibration.
- Alternatively, the oral tongue sensor, fixed to one surface of the oral tongue, wrapped around it, or inserted into it, may measure the change in rotation angle per unit time about the x-, y-, and z-axes of the oral tongue during utterance, and thereby identify the physical characteristics of the articulators including the oral tongue.
- The oral tongue sensor may also be fixed to one surface of the oral tongue or wrapped around it and include a piezoelectric element that generates an electrical signal from the polarization caused by changes in its crystal structure under the physical forces of contraction and relaxation of the tongue during utterance; through the resulting measure of bending, at least one physical characteristic among tongue height, front-back position, flexion, extension, rotation, tension, contraction, relaxation, and vibration of the oral tongue can be identified.
- The sensor unit may further include a triboelectric charging element (triboelectric generator) that identifies at least one physical characteristic among the degrees of plosion, friction, resonance, and approximation, based on the triboelectric charge produced as the oral tongue approaches and contacts other articulators inside and outside the head and neck.
- From the physical characteristics of the oral tongue and the other articulators measured by the sensor unit, the data analysis unit can identify at least one utterance feature among the consonants uttered by the speaker, lexical stress, and sentence stress.
- In identifying utterance features from the physical characteristics of the articulators measured by the sensor unit, the data analysis unit can measure, against a standard speech feature matrix composed of binary or real numbers, at least one utterance characteristic among the correctness of the speaker's pronunciation, its similarity (proximity) to the standard, and the utterance intent.
- The data analysis unit may recognize the physical characteristics of the articulators measured by the sensor unit as a pattern for each consonant and vowel unit, extract features from that pattern, classify the extracted features according to similarity, recombine the classified features, and interpret the physical characteristics of the articulators accordingly, thereby identifying the utterance features.
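- As an illustration of this pattern-matching flow, the following is a minimal sketch, assuming a made-up real-valued standard feature matrix and a simple nearest-neighbor similarity measure; the patent does not prescribe a particular feature set or classifier:

```python
import numpy as np
from typing import Dict, List, Tuple

# Hypothetical standard speech feature matrix: one real-valued feature row per phoneme.
STANDARD_FEATURES: Dict[str, np.ndarray] = {
    "p": np.array([1.0, 0.0, 0.2]),   # illustrative values only
    "t": np.array([0.1, 1.0, 0.3]),
    "k": np.array([0.0, 0.2, 1.0]),
}

def extract_features(raw_window: np.ndarray) -> np.ndarray:
    """Reduce a window of raw sensor samples to a fixed-length feature vector."""
    return np.array([raw_window.mean(), raw_window.std(), raw_window.max()])

def classify_by_similarity(feature_vec: np.ndarray) -> Tuple[str, float]:
    """Return the phoneme whose standard feature row is closest, plus a graded similarity."""
    best, best_dist = "", float("inf")
    for phoneme, ref in STANDARD_FEATURES.items():
        dist = float(np.linalg.norm(feature_vec - ref))
        if dist < best_dist:
            best, best_dist = phoneme, dist
    return best, 1.0 / (1.0 + best_dist)   # distance mapped to a 0..1 similarity score

def interpret(windows: List[np.ndarray]) -> List[Tuple[str, float]]:
    """Recombine per-window classifications into a sequence of graded phoneme guesses."""
    return [classify_by_similarity(extract_features(w)) for w in windows]
```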
- Based on the physical characteristics of the articulators measured by the sensor unit, the data analysis unit can also measure utterance variation, that is, secondary articulation caused by assimilation, dissimilation, elision, epenthesis, stress, and reduction of the consonants and vowels, including at least one of aspiration, syllabic consonants, flapping, tensification, labialization, velarization, dentalization, palatalization, nasalization, stress shift, and lengthening.
- The oral tongue sensor may include a circuit unit for sensor operation, a capsule unit surrounding the circuit unit, and an adhesive unit for attaching the sensor to one surface of the oral tongue.
- the oral tongue sensor may be operated adjacent to the oral tongue as a film having a thin film circuit.
- The sensor unit may include a face sensor comprising at least one reference sensor that provides a reference potential for measuring the nerve signals of the head and neck muscles, and at least one anode sensor and at least one cathode sensor for measuring those nerve signals.
- In acquiring the position of the sensor unit from the face sensor, the data analysis unit can determine the position of the face sensor by measuring the potential difference between the anode sensor(s) and the cathode sensor(s) relative to the reference sensor.
- Likewise, in acquiring the speaker's utterance characteristics from the face sensor, the data analysis unit can determine the potential difference between the anode and cathode sensors relative to the reference sensor, and from it the utterance characteristics produced by the physical behavior of the articulators in the speaker's head and neck.
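- A minimal sketch of this reference-anchored differential measurement follows, using hypothetical sample arrays; the function names and the smoothing window are illustrative only:

```python
import numpy as np

def differential_emg(anode: np.ndarray, cathode: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Bipolar surface-EMG estimate: subtract the reference potential from each
    electrode, then take the anode-cathode difference (common-mode rejection)."""
    return (anode - reference) - (cathode - reference)

def muscle_activity(signal: np.ndarray, window: int = 50) -> np.ndarray:
    """Smoothed rectified amplitude, a simple proxy for muscle activation level."""
    rectified = np.abs(signal - signal.mean())
    kernel = np.ones(window) / window
    return np.convolve(rectified, kernel, mode="same")

# Usage with made-up sample arrays:
# activity = muscle_activity(differential_emg(anode_samples, cathode_samples, ref_samples))
```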
- The sensor unit may include a vocal cord sensor placed adjacent to the speaker's vocal cords that detects the electromyogram or tremor of the vocal cords and identifies at least one item of utterance information among the start, pause, and end of the utterance.
- The sensor unit may include a dental sensor, adjacent to one surface of the teeth, that detects the position of signals generated by changes in capacitance caused by contact with the oral tongue or the lower lip.
- the data analyzer may acquire a voice of the speaker according to the utterance through a voice acquisition sensor adjacent to one surface of the head and neck of the speaker.
- The sensor unit may include an imaging sensor that images the speaker's head and neck and detects at least one of changes in the speaker's head and neck articulators, changes in the speaker's head and neck facial expression, and non-verbal expressions of the head, neck, chest, and upper and lower limbs that move according to the speaker's intention to speak.
- The speech intention expression system may further include a power supply unit for supplying power to at least one of the oral tongue sensor, face sensor, voice acquisition sensor, vocal cord sensor, dental sensor, and imaging sensor.
- The speech intention expression system may further include a wired or wireless communication unit enabling interworking when the data analysis unit and the database unit are located and operated externally.
- the data analysis unit may be linked with a database unit including at least one language data index corresponding to the position of the sensor unit, the speaker's speech feature, and the speaker's voice.
- Based on at least one of the duration of the utterance, its frequency, its amplitude, the electromyogram of the head and neck muscles during utterance, the change in position of the head and neck muscles during utterance, and the change in position due to bending and rotation of the oral tongue, the database unit may construct at least one language data index among a phoneme index, a syllable index, a word index, a phrase index, a sentence index, a continuous-speech index, and a pitch index of the pronunciation.
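- One possible in-memory layout for such a language data index is sketched below; the class and field names are hypothetical and not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UtteranceRecord:
    duration_ms: float
    frequency_hz: float
    amplitude: float
    emg_profile: List[float]          # head/neck muscle EMG over time
    tongue_trajectory: List[float]    # position change from bending/rotation of the oral tongue

@dataclass
class LanguageDataIndex:
    phonemes: Dict[str, UtteranceRecord] = field(default_factory=dict)
    syllables: Dict[str, UtteranceRecord] = field(default_factory=dict)
    words: Dict[str, UtteranceRecord] = field(default_factory=dict)
    phrases: Dict[str, UtteranceRecord] = field(default_factory=dict)
    sentences: Dict[str, UtteranceRecord] = field(default_factory=dict)
    continuous_speech: Dict[str, UtteranceRecord] = field(default_factory=dict)
    pitch: Dict[str, float] = field(default_factory=dict)  # high/low index of the pronunciation
```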
- In conjunction with the language data index of the database unit, the data expression unit may represent the speaker's utterance characteristics as at least one speech expression at the level of phonemes, words, phrases (citation forms), sentences, or continuous speech.
- The speech expression produced by the data expression unit may be visualized as at least one of letters, pictures, special symbols, and numbers, or presented audibly as sound, and provided to the speaker and the listener.
- The speech expression produced by the data expression unit may also be provided to the speaker and the listener by at least one tactile method among vibration, squeezing, tapping, pressing, and releasing.
- The data conversion unit converts the position of the sensor unit and the head and neck facial expression change information into first basis data, and converts the utterance features, the change information of the articulators, and the head and neck facial expression change information into second basis data.
- From these, object head and neck data required for at least one object, namely the head and neck of an image object or of a robot object, may be generated.
- In rendering the head and neck data processed by the data analysis unit on the head and neck of the image object or robot object, the speech intention expression system sets static basic coordinates based on the first basis data of the data conversion unit and dynamic variable coordinates based on the second basis data, and generates a matching position from them.
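- A minimal sketch of how static basic coordinates and dynamic variable coordinates might be combined into a matched position follows; the simple additive model and the scale factor are assumptions for illustration, not the patent's prescribed method:

```python
import numpy as np

def match_position(static_basis: np.ndarray,
                   dynamic_variable: np.ndarray,
                   scale: float = 1.0) -> np.ndarray:
    """Combine static basic coordinates (from sensor positions / first basis data)
    with dynamic variable coordinates (from utterance and expression changes /
    second basis data) into a matched target position on the avatar or robot."""
    return static_basis + scale * dynamic_variable

# Usage with made-up 3-D coordinates for one facial landmark:
# base = np.array([0.10, 0.02, 0.05])        # static basic coordinate
# delta = np.array([0.00, 0.01, -0.01])      # dynamic offset during utterance
# target = match_position(base, delta)
```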
- The robot head and neck data are transmitted by the data matching unit to actuators located on one surface of the robot object's head and neck, and the actuators implement the movement of the robot's head and neck, including at least one of articulation, utterance, and facial expression, according to those data.
- The speech intention expression system of the present invention, based on the physical characteristics of the head and neck articulators, identifies the intention to speak from the speaker's use of the head and neck articulators, centered on the oral tongue, and has the effect of expressing it in auditory, visual, and tactile form, that is, as speech.
- To identify the intention to speak from the articulators inside and outside the head and neck, including the oral tongue, one or more characteristics must be identified among the independent physical characteristics of the oral tongue itself and the degrees of closure, plosion, friction, resonance, and approximation produced by its interaction with one or more of the other articulators, such as the lips, glottis, vocal cords, pharynx, and epiglottis; various sensors capable of measuring azimuth, elevation, rotation angle, pressure, friction, distance, temperature, sound, and the like are used to capture these characteristics.
- Existing artificial vocal cords have the disadvantages that the vibration is applied from the outside, one hand is always occupied, the movement is unnatural, and the quality of speech is very low.
- Articulatory phonetics, which attempts to measure the speaker's speech using an artificial palate placed on the palate, a passive articulator, has until now been regarded as the mainstream approach; however, in measuring speech it could determine only the discrete presence or absence of the speech produced by the articulation of a specific consonant.
- This does not mean, however, that human speech is actually discrete.
- In particular, acoustic phonetics holds that each phoneme, and above all each vowel, belongs to a continuous system that cannot be segmented into discrete units.
- Human speech therefore cannot be described discretely as simply "uttered" or "not uttered"; it is better characterized proportionally, or in stages, by degree of similarity.
- Acoustic phonetics quantifies the physical properties of speech as uttered by the speaker and captures the similarity or proximity of utterances, opening the possibility of measuring utterance by degree, with the proportional, stepwise similarity of pronunciation that conventional articulatory phonetics could not realize.
- The present invention starts from articulatory phonetics, and has the highly innovative advantage of identifying and realizing speech intention more accurately through the grading (scaling) of articulation pursued by acoustic phonetics.
- Because the present invention grades the articulation produced by the speaker and presents the speech intention intuitively in auditory, visual, and tactile form, it is expected to greatly improve the quality of communication and the convenience of daily life.
- Applications include Silent Speech (silent conversation) and Speech to Text.
- When the speaker speaks, a hearing-impaired listener can perceive the utterance as visual material, which eliminates communication difficulties.
- The system can also be used in public transportation, public facilities, military installations and operations, and underwater activities, where communication is affected by noise.
- The present invention also transmits the speaker's articulation information to actuators that implement the movement of the head and neck of a robot object and matches it to them, thereby reproducing head and neck movements, including articulation, utterance, and facial expression, that resemble those of the human speaker.
- This has the effect of overcoming the "uncanny valley" described by Masahiro Mori, the chronic cognitive dissonance that humanoid robots cause in humans.
- Human-friendly articulation by humanoids and other robots becomes possible, robots and androids can take over human roles, and human-robot dialogue can be achieved, which helps prevent mental and psychological conditions such as isolation and depression in an aging society with a growing elderly population.
- FIG. 1 is a view showing a sensor unit of a speech intention representation system according to a first embodiment of the present invention.
- FIG. 2 is a view showing the position of the sensor unit of the speech intention representation system according to the first embodiment of the present invention.
- FIG. 3 is a diagram illustrating a speech intent representation system according to a first embodiment of the present invention
- FIG. 4 is a view showing the positional name of the oral cavity used in the speech intent expression system according to the first embodiment of the present invention.
- FIG. 5 is a view showing the action of oral tongue for vowel speech utilized in the speech intent expression system according to the first embodiment of the present invention.
- FIGS. 6 to 10 are views illustrating various oral cavity sensors of a speech intention expression system according to a first embodiment of the present invention, respectively.
- FIGS. 11 and 12 are a cross-sectional view and a perspective view, respectively, showing the attachment state of the oral cavity sensor of the speech intent expression system according to the first embodiment of the present invention.
- FIG. 13 is a view showing a circuit portion of the oral cavity sensor of the speech intention expression system according to the first embodiment of the present invention.
- FIG. 14 is a view illustrating various utilization states of the oral cavity sensor of the speech intention expression system according to the first embodiment of the present invention.
- FIG. 15 is a diagram illustrating a speech intent representation system according to a second embodiment of the present invention.
- FIG. 16 is a diagram illustrating a principle in which a data interpreter of a speech intent expression system according to a second embodiment of the present invention grasps speech characteristics.
- FIG. 17 is a view showing the principle by which the data analysis unit of the speech intent representation system according to the second embodiment of the present invention interprets the measured physical characteristics of the articulation organs as speech characteristics.
- FIG. 18 is a diagram illustrating a standard speech feature matrix for a vowel utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention.
- FIG. 19 is a diagram showing a standard speech feature matrix relating to consonants utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention.
- FIG. 20 is a diagram illustrating an algorithm process utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention to grasp the physical characteristics of an articulation organ as speech features;
- FIG. 21 is a detailed diagram illustrating an algorithm process utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention to grasp the physical characteristics of an articulation organ as speech features;
- FIG. 22 illustrates in detail the principle of an algorithmic process utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention to grasp the physical characteristics of an articulation organ as speech features;
- FIG. 23 is a diagram illustrating an algorithm process of identifying a specific vowel uttered by the oral cavity sensor of the utterance intention expression system according to the second embodiment of the present invention as a utterance characteristic
- FIG. 24 is a diagram illustrating a case in which the data analysis unit of the speech intent representation system according to the second embodiment of the present invention utilizes Alveolar Stop; FIG.
- FIG. 25 is a diagram illustrating a case in which a data analysis unit of a speech intention representation system according to a second embodiment of the present invention utilizes a bilabial stop;
- FIG. 26 is a diagram illustrating an experimental result using a voiced bilabial stop of a data interpreter of a speech intention expression system according to a second embodiment of the present invention.
- FIGS. 27 and 28 are diagrams illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment of the present invention utilizes a voiced labiodental fricative;
- FIG. 29 is a diagram illustrating interworking between a data interpreter and a database of a speech intent representation system according to a second embodiment of the present invention.
- FIG. 30 is a diagram illustrating a case in which a data interpreter of a speech intention expression system according to a second embodiment of the present invention recognizes a specific word.
- FIG. 31 is a diagram showing a database unit of a speech intent representation system according to a second embodiment of the present invention.
- FIG. 32 is a diagram showing a speech intention expression system according to a third embodiment of the present invention.
- FIGS. 33 and 34 are diagrams each showing an actual form of a database unit of the speech intent representation system according to the third embodiment of the present invention.
- FIG. 35 is a diagram showing a speech intention expression system according to a fourth embodiment of the present invention.
- FIG. 36 is a view showing interlocking of a sensor unit, a data analysis unit, a data expression unit, and a database unit in a speech intention representation system according to a fourth embodiment of the present invention
- FIG. 42 is a diagram illustrating a case in which a data expression unit of a speech intention expression system according to a fourth embodiment of the present invention expresses language data visually and acoustically;
- FIG. 43 is a diagram illustrating a case in which a data expression unit of a speech intention expression system according to a fourth embodiment of the present invention visually expresses language data
- FIG. 44 is a diagram illustrating a case in which a data expression unit of a speech intention expression system in accordance with a fourth embodiment of the present invention visually expresses language data
- FIG. 45 is a diagram illustrating a case in which a data expression unit of a speech intention expression system in accordance with a fourth embodiment of the present invention expresses language data in a continuous speech unit;
- FIG. 46 is a diagram illustrating a confusion matrix utilized by a speech intent representation system according to a fourth embodiment of the present invention.
- FIG. 47 is a diagram showing, as a percentage, the Confusion Matrix utilized by the speech intent representation system according to the fourth embodiment of the present invention.
- FIG. 48 is a diagram illustrating a case in which a speech intention expression system according to a fourth embodiment of the present invention assists a speaker in language correction and guidance through a screen;
- FIG. 49 is a diagram illustrating a case where a speech intention expression system according to a fourth embodiment of the present invention captures and captures an image of a head and neck articulation organ;
- FIG. 50 is a diagram illustrating a case in which a speech intent representation system according to a fourth embodiment of the present invention combines mutual information through a standard speech feature matrix;
- FIG. 51 is a diagram illustrating a speech intention presentation system according to a fifth embodiment of the present invention.
- FIG. 52 illustrates a case in which a speech intention expression system according to a fifth embodiment of the present invention matches object head and neck data to head and neck portions of an image object based on static basic coordinates.
- FIG. 53 is a view illustrating static basis coordinates based on a position of a face sensor utilized by a speech intention representation system according to a fifth embodiment of the present invention.
- FIG. 54 illustrates a case in which the speech intent representation system according to the fifth embodiment of the present invention matches object head and neck data to the head and neck of an image object based on dynamic basic coordinates.
- FIG. 55 is a view showing dynamic basis coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
- FIG. 56 illustrates a case in which the speech intent representation system according to the fifth embodiment of the present invention matches the head and neck data of the robot object to the actuator of the head and neck of the robot object based on the static basic coordinates.
- FIG. 57 is a view showing static basic coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
- FIG. 58 is a diagram illustrating a case in which the speech intent representation system according to the fifth embodiment of the present invention matches the head and neck data with an actuator of the head and neck of a robot object based on dynamic variable coordinates.
- FIG. 59 is a view illustrating dynamic variable coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
- FIGS. 60 and 61 are views showing the operation of the actuator of the head and neck portion of the robot object of the speech intent representation system according to the fifth embodiment of the present invention.
- FIG. 62 is a view showing an actuator of the head and neck part of the robot object of the speech intent expression system according to the fifth embodiment of the present invention.
- FIG. 1 is a diagram illustrating a sensor unit of a speech intent representation system according to a first embodiment of the present invention
- FIG. 2 is a diagram illustrating a position of a sensor unit of a speech intention representation system according to a first embodiment of the present invention
- FIG. 3 is a diagram illustrating a speech intention expression system according to a first embodiment of the present invention.
- The sensor unit 100 includes the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150, located on the head and neck.
- The oral tongue sensor 110, face sensor 120, voice acquisition sensor 130, vocal cord sensor 140, and dental sensor 150 positioned on the head and neck provide the position 210 at which each sensor is located, the utterance feature 220 according to the utterance of the speaker 10, the speaker's voice 230, the utterance detail information 240, and the utterance variation 250.
- The data analysis unit 200 acquires these data, and the data conversion unit 300 processes them into language data 310.
- FIG. 4 is a view showing the positional names of the oral tongue used in the speech intention expression system according to the first embodiment of the present invention, and FIG. 5 is a diagram showing the action of the oral tongue in vowel speech as used in the same system.
- The oral tongue sensor 110 is fixed to one side of the oral tongue 12, surrounds its surface, or is inserted into it, and identifies at least one independent physical characteristic of the oral tongue itself among tongue height, front-back position, degree of bending, extension, rotation, tension, contraction, relaxation, and vibration.
- FIGS. 6 to 10 are views illustrating various oral cavity sensors of a speech intention expression system according to a first embodiment of the present invention.
- The oral tongue sensor 110 measures the rotation per unit time (angular velocity) and the acceleration in the x-, y-, and z-axis directions, and from these identifies the utterance characteristic 220 produced by the physical characteristics of the articulators, including the oral tongue 12.
- The oral tongue sensor 110 may include a piezoelectric element 112 that generates a polarization signal from changes in its crystal structure 111 under the physical forces produced by contraction and relaxation of the oral tongue 12 during utterance; by measuring the degree of bending of the oral tongue 12 through this element, the utterance characteristic 220 produced by the physical characteristics of the articulators, including the oral tongue 12, can be identified.
- The oral tongue sensor 110 may also include a triboelectric charging element 113 (triboelectric generator) that responds to the approach and contact of the oral tongue 12 with other articulators inside and outside the head and neck; using this element, the speaker's utterance characteristic 220 is identified.
- The integrated oral tongue sensor 110 combines the acceleration and angular velocity along the x-, y-, and z-axes, the electrical signal from the piezoelectric element, and the triboelectric signal from contact to identify the utterance characteristics 220 produced by the physical characteristics of the articulators.
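- A minimal sketch of such multi-signal fusion is shown below, assuming hypothetical per-channel statistics as the fused feature vector; the patent does not specify how the signals are combined:

```python
import numpy as np

def fuse_tongue_signals(accel_xyz: np.ndarray,   # shape (N, 3) accelerometer samples
                        gyro_xyz: np.ndarray,    # shape (N, 3) angular-velocity samples
                        piezo: np.ndarray,       # shape (N,) bending voltage from the piezo element
                        tribo: np.ndarray) -> np.ndarray:  # shape (N,) triboelectric contact charge
    """Concatenate simple per-channel statistics into one feature vector that a
    downstream classifier could map to utterance characteristics."""
    stats = []
    for channel in (accel_xyz, gyro_xyz):
        stats.extend(channel.mean(axis=0))   # mean per axis
        stats.extend(channel.std(axis=0))    # variability per axis
    for channel in (piezo, tribo):
        stats.extend([channel.mean(), channel.std(), channel.max()])
    return np.asarray(stats)
```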
- 11 and 12 are a cross-sectional view and a perspective view respectively showing the attachment state of the oral cavity sensor of the speech intent expression system according to the first embodiment of the present invention.
- the oral tongue sensor 110 may be configured as a composite thin film circuit and implemented as a single film.
- the oral tongue sensor 110 includes a circuit part 114 for operating the sensor unit 100, a capsule part 115 surrounding the circuit part 114, and an adhesive part 116 for adhering the oral tongue sensor 110 to one surface of the oral tongue 12.
- As shown in FIGS. 6 to 9, the oral tongue sensor 110 can identify, according to the characteristics of each sensor, one or more physical characteristics such as the degree of plosion, friction, resonance, and approach produced when the oral tongue is adjacent to or in contact with other articulation organs inside and outside the head and neck.
- FIG. 13 is a diagram illustrating the circuit unit of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
- the circuit unit 114 of the oral cavity sensor 110 includes a communication chip, a sensing circuit, and an MCU.
- FIG. 14 is a diagram illustrating various utilization states of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
- the oral tongue sensor 110 can identify the state of the oral tongue 12 according to the speaker's utterance of various consonants and vowels, and thereby identify the utterance features 220 of those consonants and vowels.
- the oral tongue sensor 110 can identify the utterance feature 220 for the Bilabial Sound, the Alveolar Sound, and the Palatal Sound.
- FIG. 15 is a diagram illustrating a speech intention expression system according to a second embodiment of the present invention.
- the sensor unit 100 positioned in or near the head and neck articulation organs, including the dental sensor 150, identifies the position 210 where each sensor is located, the utterance feature 220 according to the utterance, and the speaker's voice 230 according to the utterance.
- the utterance detail information 240, including utterance start, utterance stop, and utterance end, is also identified.
- the utterance feature 220 means one or more basic physical utterance characteristics among stop (plosive), fricative, affricate, nasal, liquid, glide, sibilant, voiceless, and voiced sounds.
- the speaker's voice 230 is also an auditory speech feature that accompanies the speech feature.
- the utterance detail information 240 is identified through the vocal cord sensor 140, using the electromyography (EMG) or tremor of the vocal cords.
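- A minimal sketch, assuming a sampled vocal-cord EMG trace, of how utterance start, stop, and end marks (the utterance detail information 240) might be derived by thresholding a smoothed EMG envelope; the sampling rate, window length, and threshold are illustrative assumptions.

```python
import numpy as np

def detect_utterance_events(emg, fs=1000, threshold=0.2):
    """Derive (event, time-in-seconds) marks from a vocal-cord EMG trace by
    thresholding its smoothed absolute envelope (illustrative values only)."""
    window = int(0.05 * fs)  # 50 ms moving-average window
    envelope = np.convolve(np.abs(np.asarray(emg, float)), np.ones(window) / window, mode="same")
    active = envelope > threshold * envelope.max()

    events, in_utterance = [], False
    for i, a in enumerate(active):
        if a and not in_utterance:
            events.append(("start", i / fs)); in_utterance = True
        elif not a and in_utterance:
            events.append(("stop", i / fs)); in_utterance = False
    if in_utterance:
        events.append(("end", len(active) / fs))
    elif events:
        events[-1] = ("end", events[-1][1])  # relabel the final stop as the utterance end
    return events
```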
- the data analysis unit 200, using the sensor unit 100 near the head and neck articulation organs, which includes the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150, identifies the utterance variation 250 generated according to the speaker's gender, race, age, and native language.
- the utterance variation 250 refers to connected syllables produced by assimilation, dissimilation, elision, attachment, stress, and reduction of consonants.
- the data converter 300 recognizes and processes, as the language data 310, the position 210 of the sensor unit measured by the head and neck articulator sensors 110, 120, 130, 140, and 150, the utterance feature 220 according to the utterance, the speaker's voice 230 according to the utterance, the utterance detail information 240, and the utterance variation 250.
- the data analysis unit 200 is linked with the database unit 350.
- the database unit 350 includes a language data index 360 comprising a phoneme unit index 361, a syllable unit index 362, a word unit index 363, a phrase unit index 364, a sentence unit index 365, a continuous speech index 366, and a pitch (intonation) index 367 of the pronunciation. Through the language data index 360, the data analysis unit 200 can process the various speech-related information acquired by the sensor unit 100 as language data.
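- A minimal sketch of how such a language data index might be organized as a data structure; the field names and dictionary layout are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class LanguageDataIndex:
    """Illustrative container mirroring the language data index 360."""
    phonemes: dict = field(default_factory=dict)            # phoneme unit index 361
    syllables: dict = field(default_factory=dict)           # syllable unit index 362
    words: dict = field(default_factory=dict)               # word unit index 363
    phrases: dict = field(default_factory=dict)             # phrase unit index 364
    sentences: dict = field(default_factory=dict)           # sentence unit index 365
    continuous_speech: dict = field(default_factory=dict)   # continuous speech index 366
    pitch: dict = field(default_factory=dict)                # pitch (intonation) index 367

index_360 = LanguageDataIndex()
index_360.phonemes["bilabial+stop+voiced"] = "b"   # a feature pattern mapped to a phoneme
index_360.words["b-i-f"] = "beef"                  # a phoneme sequence mapped to a word
```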
- FIG. 16 is a diagram illustrating the principle by which the data interpreter of the speech intention expression system according to the second embodiment of the present invention identifies utterance features.
- FIG. 17 is a diagram illustrating the principle by which the data interpreter of the speech intention expression system according to the second embodiment of the present invention identifies the physical characteristics of the articulation organs measured as utterance features.
- FIG. 18 is a diagram illustrating the standard utterance feature matrix for vowels used by the data interpreter of the speech intention expression system according to the second embodiment of the present invention.
- FIG. 19 is a diagram illustrating the standard utterance feature matrix for consonants used by the data interpreter of the speech intention expression system according to the second embodiment of the present invention.
- the data analysis unit 200 first acquires the physical characteristics of the articulation organs measured by the sensor unit 100, including the oral tongue sensor 110.
- the oral tongue sensor 110 senses the physical characteristics of the articulation organs and creates a matrix value from the sensed physical characteristics.
- the data analysis unit 200 identifies the utterance feature 220 of the consonant whose entry in the standard utterance feature matrix 205 for consonants corresponds to the matrix value of the physical characteristics.
- the standard utterance feature matrix 205 for consonants may hold each of its values as one or more of a consonant utterance symbol, a binary number, or a real number.
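- A minimal sketch of such a lookup, assuming a small hypothetical standard matrix in which each consonant row is a binary/real-valued feature pattern and the measured pattern is matched to its nearest row; the feature encoding and the Euclidean matching rule are assumptions, not the patent's exact method.

```python
import numpy as np

# Hypothetical standard utterance feature matrix 205: one row of expected
# physical-feature values per consonant (illustrative values only).
STANDARD_MATRIX_205 = {
    "p": np.array([1, 0, 0, 1, 0]),
    "b": np.array([1, 0, 0, 1, 1]),
    "t": np.array([0, 1, 0, 1, 0]),
    "d": np.array([0, 1, 0, 1, 1]),
}

def match_consonant(measured):
    """Return the consonant whose standard row is closest (Euclidean) to the
    measured physical-feature vector."""
    measured = np.asarray(measured, dtype=float)
    return min(STANDARD_MATRIX_205,
               key=lambda c: np.linalg.norm(STANDARD_MATRIX_205[c] - measured))

print(match_consonant([1, 0, 0, 1, 0.9]))  # -> 'b'
```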
- FIG. 20 is a diagram illustrating the algorithm process used by the data interpreter of the speech intention expression system according to the second embodiment of the present invention to identify the physical characteristics of the articulation organs as utterance features.
- the algorithm process used by the data analysis unit 200 to identify the physical characteristics of the articulation organs measured by the sensor unit 100 consists of acquiring the physical characteristics of the articulation organs, identifying the pattern of each consonant unit contained in the acquired physical characteristics, extracting a distinctive feature from each consonant pattern, classifying the extracted features, and recombining the features of the classified patterns, through which the specific utterance feature is finally identified.
- FIG. 21 is a diagram illustrating in detail the algorithm process used by the data interpreter of the speech intention expression system according to the second embodiment of the present invention to identify the physical characteristics of the articulation organs as utterance features.
- FIG. 22 is a diagram illustrating in detail the principle of the algorithm process used by the data interpreter of the speech intention expression system according to the second embodiment of the present invention to identify the physical characteristics of the articulation organs as utterance features.
- FIG. 23 is a diagram illustrating the algorithm process by which the oral tongue sensor of the speech intention expression system according to the second embodiment of the present invention identifies a specific vowel as an utterance feature.
- in the step of identifying the pattern of each consonant unit, the sensor that measures the physical characteristics of the articulation organs determines the pattern of each consonant unit on the basis of the x-, y-, and z-axis values. Classification of the extracted features may use one or more of the following models:
- ANN Artificial Neural Network
- CNN Convolutional Neural Network
- RNN Recurrent Neural Network
- RBM Restricted Boltzmann Machine
- HMM Hidden Markov Model
- the change in vector magnitude and the change in angle measured from the speaker's utterance are recognized as the vowel [i], which has a high tongue height and tongue frontness.
- the oral tongue sensor 110 identifies the change in the electrical signal due to piezoelectricity and the triboelectric signal caused by the proximity or friction between the oral tongue sensor 110 and the internal and external articulation organs, and thereby recognizes the vowel [i], which has a high tongue height and tongue frontness.
- the tongue height and the backness of the vowel are measured to identify the vowel.
- the oral tongue sensor 110 measures vowels such as [i], [u], and [] produced by the speaker's utterance as the utterance feature 220.
- this vowel utterance feature 220 corresponds to the phoneme unit index 361 of the database unit 350.
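- A minimal sketch of mapping measured tongue height and frontness to a coarse vowel label, in the spirit of FIG. 23; the normalized scales and cut-off values are illustrative assumptions only.

```python
def classify_vowel(height, frontness):
    """Map normalized tongue height and frontness (both assumed in 0..1) to a
    coarse vowel label using illustrative thresholds."""
    if height > 0.6:
        return "i" if frontness > 0.5 else "u"  # high front vs. high back
    if height < 0.3:
        return "a"                              # low vowel
    return "e" if frontness > 0.5 else "o"      # mid vowels

print(classify_vowel(0.8, 0.9))  # -> 'i'
print(classify_vowel(0.8, 0.2))  # -> 'u'
```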
- FIG. 24 is a diagram illustrating a case in which the data analysis unit of the speech intention expression system according to the second embodiment of the present invention utilizes Alveolar Stop.
- the oral tongue sensor 110 measures the specific consonant uttered by the speaker as the utterance feature 220.
- the utterance feature 220 of the consonant corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Alveolar Stop as the language data 310.
- FIG. 25 is a diagram illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment uses Bilabial Stop.
- the oral tongue sensor 110 and the face sensor 120 measure the specific consonant uttered by the speaker as the utterance feature 220.
- the utterance feature 220 of the consonant corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Bilabial Stop as the language data 310.
- FIG. 26 is a diagram illustrating an experiment result using a voiced bilabial stop of a data interpreter of a speech intent expression system according to a second embodiment of the present invention.
- the oral tongue sensor 110 and the face sensor 120 measure the specific consonant uttered by the speaker as the utterance feature 220.
- the utterance feature 220 of the consonant corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 identified the voiced bilabial stop as the language data 310, / or /, and the voiceless bilabial stop as / per /.
- FIGS. 27 and 28 are diagrams illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment of the present invention utilizes Voiced Labiodental Fricative.
- as shown in FIGS. 27 and 28, the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 measure the specific consonant uttered by the speaker as the utterance feature 220.
- the utterance feature 220 of the consonant corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Voiced Labiodental Fricative as the language data 310.
- FIG. 29 is a diagram illustrating interworking between a data interpreter and a database of a speech intention expression system according to a second embodiment of the present invention.
- the imaging sensor 160 captures an image of the speaker wearing the oral tongue sensor 110, the face sensor 120, and the voice acquisition sensor 130.
- the captured information is recognized and processed as the language data 310.
- the face sensor 120 located on one surface of the head and neck identifies its own position from the potential difference between the anode sensor 122 and the cathode sensor 123 relative to the reference sensor 121, and this is captured by the imaging sensor 160 through imaging.
- the resulting physical change information 161 of the head and neck articulation organs, the head and neck facial expression change information 162, and the non-verbal expression information 163 are transmitted to the data converter 300 as the language data 310.
- FIG. 30 is a diagram illustrating a case in which a data interpreter of a speech intention expression system according to a second embodiment of the present invention recognizes a specific word.
- the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 measure the specific consonants and vowels spoken by the speaker, and the data analysis unit 200 recognizes the consonants and vowels as utterance features 220.
- the utterance features 220 of the respective phonemes, [b], [i], and [f], correspond to the phoneme unit indexes 361 of the database unit 350, and the data interpreter recognizes them as the word /beef/, pronounced [bif].
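- A minimal sketch of looking a recognized phoneme sequence up in a word unit index; the dictionary contents and key format are hypothetical stand-ins for index 363.

```python
def phonemes_to_word(phonemes, word_index):
    """Return the word entry matching a recognized phoneme sequence, if any."""
    return word_index.get("-".join(phonemes))

word_index_363 = {"b-i-f": "beef", "k-i-ng": "king"}  # illustrative entries
print(phonemes_to_word(["b", "i", "f"], word_index_363))  # -> 'beef'
```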
- FIG. 31 is a diagram showing a database unit of a speech intent representation system according to a second embodiment of the present invention.
- the language data index 360 of the database unit 350 includes a phoneme unit index 361, a syllable unit index 362, a word unit index 363, a phrase unit index 364, a sentence unit index 365, a continuous speech index 366, and a pitch (intonation) index 367 of the pronunciation.
- FIG. 32 is a diagram illustrating a speech intention expression system according to a third embodiment of the present invention.
- the communication unit 400 allows the respective units of the system to communicate with one another.
- the communication unit 400 is implemented by wire or wirelessly; in the wireless case, various methods such as Bluetooth, Wi-Fi, 3G, 4G, and NFC may be used.
- FIGS. 33 and 34 are diagrams each showing an actual form of the database unit of the speech intention expression system according to the third embodiment of the present invention.
- the database unit 350 linked to the data analysis unit 200 has a language data index, and the utterance feature 220 according to the actual utterance, the speaker's voice 230, the utterance detail information 240, and the utterance variation 250 are identified as the language data 310.
- FIG. 33 shows the actual data of the database unit 350 in which the sensor unit 100 measures, and the data interpreter 200 reflects, various vowel and consonant utterance features including the high front tense vowel and the high back tense vowel of FIG. 23, the alveolar sounds of FIG. 24, and the voiceless labiodental fricative of FIG. 27.
- FIG. 34 shows the actual data of the database unit 350 in which the sensor unit 100 measures, and the data interpreter 200 reflects, various utterance features including the high front lax vowel of FIG. 23, the alveolar sounds of FIG. 24, and the bilabial stop of FIG. 25.
- FIG. 35 is a diagram illustrating the speech intention expression system according to the fourth embodiment of the present invention.
- FIG. 36 is a diagram showing the interworking of the sensor unit, the data interpreter, the data expression unit, and the database unit of the speech intention expression system according to the fourth embodiment of the present invention.
- the speech intention expression system includes a sensor unit 100, a data analysis unit 200, a data conversion unit 300, a database unit 350, and a data expression unit 500, which operate organically in cooperation.
- the sensor unit 100 is located at the actual articulation organs, measures the physical characteristics of the articulation organs according to the speaker's utterance, and transmits them to the data analysis unit 200, which interprets them as language data. The interpreted language data is transmitted to the data expression unit 500, and the database unit 350 works in conjunction with both the interpretation and the expression of the language data.
- FIGS. 37 to 41 are diagrams showing the means by which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data.
- the physical characteristics of the speaker's head and neck articulators obtained by the sensor unit 100 are identified, through the data analysis unit 200, as the position of the sensor unit 210, the utterance feature 220, the speaker's voice 230, the utterance detail information 240, and the utterance variation 250.
- the imaging sensor 160 captures changes in the appearance of the speaker's head and neck articulation organs, and the data interpreter 200 recognizes the change information 161 of the speaker's head and neck articulation organs and the head and neck facial expression change information 162.
- FIG. 37 illustrates that the data expression unit 500 acoustically expresses the language data 310
- FIG. 38 illustrates that the data expression unit 500 visually expresses the language data 310.
- the physical characteristics of the speaker's articulation organs measured by the data analysis unit 200 are compared with the language data index 360 of the database unit 350, and a broad transcription of the actual standard pronunciation is provided together with a numerical measure of one or more of accuracy, proximity (similarity), and utterance intention.
- the data expression unit 500 expresses the language data 310 visually and aurally; the physical characteristics of the speaker's articulation organs measured by the data interpreter 200 are compared with the language data index 360, and a narrow transcription of the actual standard pronunciation is provided together with a numerical measure of one or more of accuracy, proximity (similarity), and utterance intention.
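- A minimal sketch of one way such a numerical measure could be computed, using cosine similarity between the speaker's measured feature vector and the standard entry from the language data index; the cosine formula and percentage scaling are assumptions, not the patent's own scoring rule.

```python
import numpy as np

def pronunciation_score(measured, standard):
    """Cosine-similarity-based accuracy/proximity score, reported as a percentage."""
    measured, standard = np.asarray(measured, float), np.asarray(standard, float)
    cos = np.dot(measured, standard) / (np.linalg.norm(measured) * np.linalg.norm(standard))
    return round(100 * max(cos, 0.0), 1)

print(pronunciation_score([0.9, 0.1, 0.7], [1.0, 0.0, 1.0]))  # -> 98.8
```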
- when the data expression unit 500 visually expresses the language data 310, the physical characteristics of the speaker's articulation organs measured by the data analysis unit 200 are compared with the language data index 360 of the database unit 350, a numerical measure of one or more of accuracy, proximity (similarity), and utterance intention is provided, and the corresponding language data 310 is provided as a word through the word unit index 363, together with the corresponding image.
- the data expression unit 500 expresses the language data 310 visually and aurally; the physical characteristics of the speaker's articulation organs measured by the data interpreter 200 are compared with the language data index 360, a numerical measure of one or more of accuracy, proximity (similarity), and utterance intention, together with a narrow transcription of the actual standard pronunciation, is provided, and a speech correction image is provided so that the pronunciation can be corrected and reinforced.
- FIG. 42 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually and acoustically.
- the data expression unit 500 visualizes the language data 310 as text and presents it audibly as sound; the physical characteristics of the speaker's articulation organs measured by the data interpreter 200 are provided, together with a narrow transcription of the actual standard pronunciation related to the speaker's language data 310, as letters and sounds measuring one or more of accuracy, proximity (similarity), and utterance intention, helping the speaker to correct and reinforce the language data 310.
- FIG. 43 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
- the data expression unit 500 visualizes and provides the language data 310 as one or more of text, a picture, a photograph, and a video.
- the data analysis unit 200 compares the measured physical characteristics of the speaker's articulation organs with one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, and the sentence unit index 365 of the language data index 360 of the database unit 350.
- the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310, as letters and sounds measuring one or more of accuracy, proximity (similarity), and utterance intention, helping the speaker to correct and reinforce the language data 310.
- FIG. 44 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
- the database unit 350 compares the physical characteristics of the speaker's articulation organs measured by the data analysis unit 200 with one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, and the continuous speech index 366 of the language data index 360.
- the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310, as letters and sounds of continuous speech units measuring one or more of accuracy, proximity (similarity), and utterance intention, helping the speaker to correct and reinforce the language data 310.
- FIG. 45 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data in units of continuous speech.
- when the data expression unit 500 visualizes the language data 310 as text and presents it audibly as sound, the physical characteristics of the speaker's articulation organs measured by the data analysis unit 200 are compared with one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, the continuous speech index 366, and the pitch (intonation) index 367 of the language data index 360 of the database unit 350.
- the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310, as letters and sounds measuring one or more of accuracy, proximity (similarity), and utterance intention.
- FIG. 46 is a diagram illustrating a confusion matrix utilized by the speech intention expression system according to the fourth embodiment of the present invention.
- FIG. 47 is a diagram illustrating that confusion matrix expressed as percentages.
- the data interpreter 200 uses an algorithm that extracts one or more features from the time-domain variance, the frequency-domain cepstral coefficients, and linear predictive coding coefficients in identifying the language data 310.
- the time-domain variance is calculated by Equation 1, $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $n$ is the number of data points in the population, $\bar{x}$ is the mean of the collected articulator physical characteristic data, and $x_i$ is each collected articulator physical characteristic datum.
- the cepstral coefficients are calculated by Equation 2, $c(\tau) = \mathcal{F}^{-1}\{\log|X(f)|\}$, which formulates the strength of each frequency component, where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform and $X(f)$ is the frequency spectrum of the signal.
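- A minimal numerical sketch of the two features above, assuming NumPy and a common real-cepstrum formulation (inverse FFT of the log magnitude spectrum); the exact variant used in the patent may differ.

```python
import numpy as np

def time_domain_variance(x):
    """Equation 1 style feature: variance of one window of articulator data."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 2))

def real_cepstrum(x):
    """Equation 2 style feature: inverse Fourier transform of the log
    magnitude spectrum of the signal."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    return np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))

signal = np.sin(np.linspace(0, 8 * np.pi, 256))
print(time_domain_variance(signal), real_cepstrum(signal)[:4])
```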
- an ANN was used to classify the data by grouping the articulator physical characteristic data according to similarity and generating prediction data.
- the speaker can grasp the accuracy, proximity (similarity), and intention of his or her utterance relative to the standard utterance. Based on this, the speaker receives feedback on the content of the utterance and continuously re-utters it.
- through this repetitive articulation data input method, a large amount of articulator physical characteristic data is gathered and the accuracy of the ANN increases.
- the physical characteristics of the articulation organs used as input data were selected as 10 consonants, classified into 5 articulation positions: Bilabial, Alveolar, Palatal, Velar, and Glottal.
- the 10 consonants corresponding to the five articulation positions were each pronounced in order 100 times (1,000 utterances in total) and 50 times (500 utterances in total).
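- A minimal training sketch in that spirit: a small multilayer perceptron (one possible ANN) is fitted on placeholder feature vectors labeled with the five articulation positions, and its confusion matrix is reported as row-wise percentages as in FIG. 47. The random placeholder data, layer sizes, and scikit-learn usage are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder data: rows are articulator feature vectors, labels are the five
# articulation positions (0..4 = bilabial, alveolar, palatal, velar, glottal).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))
y = rng.integers(0, 5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

cm = confusion_matrix(y_test, clf.predict(X_test))
cm_percent = 100 * cm / cm.sum(axis=1, keepdims=True)  # row-wise percentages
print(np.round(cm_percent, 1))
```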
- FIG. 48 is a diagram illustrating a case in which a speech intention expression system according to a fourth embodiment of the present invention assists a speaker in language correction and guidance through a screen.
- the data interpreter 200 recognizes, through comparison with the standard utterance feature matrix 205, that the speaker does not utter [ ⁇ ] properly. The data expression unit 300 then provided the accuracy and similarity of the speaker's utterance, and the result was only 46%. The data expression unit 300 then helps the speaker, through the screen, to pronounce [ki ⁇ ] correctly.
- the data expression unit 300 provides Speech Guidance (an image) to show intuitively which articulation organs the speaker should manipulate.
- the Speech Guidance (Image) presented by the data expression unit 300 performs utterance correction and guidance based on the sensor unit attached to or adjacent to the articulators involved in uttering [ ⁇ ]. For example, in the case of [ki ⁇ ], [k] is produced by raising the back of the tongue (tongue body and tongue root) toward the velum and releasing it as a stop, so that it is uttered as /k/.
- the oral tongue sensor 110 also detects the tongue's height and frontness. In addition, when uttering [i], both ends of the lips are pulled toward both cheeks, which the face sensor 120 detects. In the case of [ ⁇ ], the back of the tongue (tongue body and tongue root) must be raised toward the velum and the sound released through the nose, so the oral tongue sensor 110 also identifies the tongue height and the frontness/backness of the tongue.
- at this time, the muscles around the nose and its surroundings tremble; this phenomenon can be identified by attaching the face sensor 120 around the nose.
- FIG. 49 is a diagram illustrating a case in which a speech intention expression system according to a fourth embodiment of the present invention captures and captures an image of a head and neck articulation organ.
- the imaging sensor 160 captures changes in the appearance of the speaker's head and neck articulation organs according to the utterance, and through this the data interpreter 200 identifies the change information 161 of the speaker's head and neck articulation organs and the head and neck facial expression change information 162.
- the speaker's utterance characteristics 210 identified through the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 of the sensor unit 100 are also considered together by the data analysis unit 200.
- FIG. 50 is a diagram illustrating a case in which a speech intent representation system according to a fourth embodiment combines mutual information through a standard speech feature matrix.
- the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, and the vocal cord sensor 140 of the sensor unit 100 identify the speaker's utterance characteristics 210,
- while the imaging sensor 160 identifies the change information 161 of the head and neck articulation organs and the head and neck facial expression change information 162.
- the data interpreter 200 combines the utterance information corresponding to the change information 161 of the head and neck articulation organs and the head and neck facial expression change information 162 on the basis of the standard utterance feature matrix 205.
- FIG. 51 is a diagram illustrating the speech intention expression system according to the fifth embodiment of the present invention.
- the data converter 300 generates first basis data 211 from the object head and neck data 320.
- based on the first basis data 211, the data matching unit 600 generates static basic coordinates 611 and matches the object head and neck data to one or more objects 20 among the head and neck part 21 of the image object and the head and neck part 22 of the robot object.
- the data converter 300 likewise generates second basis data 221 from the object head and neck data 320.
- based on the second basis data 221, the data matching unit 600 generates dynamic variable coordinates 621 and matches them so as to implement the dynamic movement of the head and neck part that changes as one or more objects 20 among the head and neck part 21 of the image object and the head and neck part 22 of the robot object utter.
- FIG. 52 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head and neck data to the head and neck of an image object based on the static basic coordinates.
- FIG. 53 is a diagram illustrating the static basic coordinates based on the position of the face sensor utilized by the speech intention expression system according to the fifth embodiment of the present invention.
- the static basic coordinates 611 are generated using the first basis data 211, which is the position of the face sensor 120 detected by using the potential difference.
- the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120 attached in the speaker's non-utterance state each have a reference position of (0, 0). This position becomes the static basic coordinates 611.
- FIG. 54 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head and neck data to the head and neck of an image object based on the dynamic variable coordinates.
- FIG. 55 is a diagram illustrating the dynamic variable coordinates based on the potential difference of the face sensor utilized by the speech intention expression system according to the fifth embodiment of the present invention.
- in order to match the object head and neck data 320 to the head and neck part 21 of the image object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, which is the potential difference of the face sensor 120 attached to the speaker's head and neck and produced by the action of the head and neck muscles according to the speaker's utterance.
- the face sensor 120 measures the electromyography of the head and neck moving in accordance with the utterance of the speaker to determine the physical characteristics of the head and neck articulation.
- the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120 attached in the speaker's utterance state detect the EMG of the head and neck muscles that changes according to the utterance and take variable positions such as (0, -1), (-1, -1), and (1, -1), respectively. These positions become the dynamic variable coordinates 621.
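- A minimal sketch of turning face-sensor potential differences into such coordinates: the non-utterance readings define the (0, 0) static basic coordinates, and deviations during utterance give the dynamic variable coordinates. The linear mapping, scale, and example values are assumptions chosen to reproduce the coordinates mentioned above.

```python
def face_sensor_coordinates(potentials, baseline, scale=1.0):
    """Map per-electrode potential readings to 2-D coordinates relative to the
    non-utterance baseline (illustrative linear mapping)."""
    coords = {}
    for name, (vx, vy) in potentials.items():
        bx, by = baseline[name]
        coords[name] = (round(scale * (vx - bx), 2), round(scale * (vy - by), 2))
    return coords

baseline = {"reference": (0.0, 0.0), "anode": (0.1, 0.1), "cathode": (-0.1, 0.1)}
during_utterance = {"reference": (0.0, -1.0), "anode": (-0.9, -0.9), "cathode": (0.9, -0.9)}
print(face_sensor_coordinates(during_utterance, baseline))
# -> {'reference': (0.0, -1.0), 'anode': (-1.0, -1.0), 'cathode': (1.0, -1.0)}
```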
- FIG. 56 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head and neck data to an actuator of the head and neck part of a robot object based on the static basic coordinates.
- FIG. 57 is a diagram illustrating the static basic coordinates based on the voltage difference of the face sensor utilized by the speech intention expression system according to the fifth embodiment of the present invention.
- the static basic coordinates 611 are generated using the first basis data 211, which is the position of the face sensor 120 detected by using the potential difference.
- the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120 attached in the speaker's non-utterance state each have the same reference position of (0, 0) in the actuator 30 of the head and neck part 22 of the robot object. This position becomes the static basic coordinates 611.
- FIG. 58 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head and neck data to an actuator of the head and neck part of a robot object based on the dynamic variable coordinates.
- FIG. 59 is a diagram illustrating the dynamic variable coordinates based on the voltage difference of the face sensor utilized by the speech intention expression system according to the fifth embodiment of the present invention.
- in order to match the object head and neck data 320 to the actuator 30 of the head and neck part 22 of the robot object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, which is the potential difference of the face sensor 120 attached to the speaker's head and neck and produced by the action of the head and neck muscles during utterance.
- the face sensor 120 measures the electromyography of the head and neck moving in accordance with the utterance of the speaker to determine the physical characteristics of the head and neck articulation.
- the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120 attached in the speaker's utterance state detect the EMG of the head and neck muscles that changes according to the utterance, and accordingly the actuator 30 of the head and neck part 22 of the robot object moves to variable positions such as (0, -1), (-1, -1), and (1, -1). These positions become the dynamic variable coordinates 621.
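- A minimal sketch of driving such actuators from the dynamic variable coordinates, assuming simple servo-style actuators whose angles are offset from a neutral position in proportion to each coordinate; the gain, neutral angle, and naming are illustrative assumptions.

```python
def coordinates_to_servo_angles(dynamic_coords, gain_deg=30.0, neutral_deg=90.0):
    """Convert dynamic variable coordinates into per-actuator servo angles."""
    commands = {}
    for name, (dx, dy) in dynamic_coords.items():
        commands[name + "_horizontal"] = neutral_deg + gain_deg * dx
        commands[name + "_vertical"] = neutral_deg + gain_deg * dy
    return commands

dynamic_621 = {"reference": (0, -1), "anode": (-1, -1), "cathode": (1, -1)}
print(coordinates_to_servo_angles(dynamic_621))
# e.g. {'reference_horizontal': 90.0, 'reference_vertical': 60.0, ...}
```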
- FIGS. 60 and 61 are diagrams showing the operation of the actuator of the head and neck part of the robot object of the speech intention expression system according to the fifth embodiment of the present invention.
- FIG. 62 is a diagram showing the actuator of the head and neck part of the robot object of the speech intention expression system according to the fifth embodiment of the present invention.
- one or more actuators of the head and neck part 22 of the robot object may be matched, by the data matching unit 600, to the object head and neck data 320 obtained from the data interpreter 200 and the data converter 300.
- the actuator 30 forms the artificial musculoskeletal structure of the head and neck part 22 of the robot object; it may be driven by a motor, including a DC motor, a step motor, or a servo motor, and may be operated pneumatically or hydraulically, protruding and retracting. Through this, the actuator 30 can implement various dynamic movements of one or more of the articulation, speech, and facial expression of the head and neck part 22 of the robot object.
- the actuator 30 may be driven by a motor, including a DC motor, a step motor, or a servo motor, and operated pneumatically or hydraulically, allowing it to contract or relax under tension.
- the actuator 30 may be located at the head and neck 22 of the robot object.
- the sensor unit 100 may include the following.
- Pressure sensor: MEMS sensor, piezoelectric method, piezoresistive method, capacitive method, pressure-sensitive rubber method, force sensing resistor (FSR) method, inner particle deformation method, buckling measurement method.
- FSR Force sensing resistor
- Friction sensor: micro hair array method, friction temperature measurement method.
- Electrostatic sensor: electrostatic consumption, electrostatic generation.
- Electric resistance sensor: DC resistance measurement method, AC resistance measurement method, MEMS, lateral electrode array method, layered electrode method, Field Effect Transistor (FET) method (Organic-FET, Metal-oxide-semiconductor-FET, Piezoelectric-oxide-semiconductor-FET, etc.).
- FET Field Effect Transistor
- Tunnel effect tactile sensor: quantum tunnel composites, electron tunneling, electroluminescent light.
- Thermal resistance sensor: thermal conductivity measurement method, thermoelectric method.
- Optical sensor: light intensity measurement, refractive index measurement.
- Magnetism-based sensor: Hall-effect measurement method, magnetic flux measurement method.
- Ultrasonic-based sensor: acoustic resonance frequency method, surface noise method, ultrasonic emission measurement method.
- Soft material sensor: pressure, stress, or strain measurement methods using materials such as rubber, powder, porous materials, sponges, hydrogels, aerogels, carbon fibers, nanocarbon materials, carbon nanotubes, graphene, graphite, composites, nanocomposites, metal-polymer composites, ceramic-polymer composites, and conductive polymers; stimuli-responsive method.
- Piezoelectric sensor: ceramic materials such as quartz and lead zirconate titanate (PZT); polymer materials such as PVDF, PVDF copolymers, and PVDF-TrFE; and nanomaterials such as cellulose and ZnO nanowires.
- PZT lead zirconate titanate
Abstract
The invention comprises: a sensor unit for measuring the physical characteristics of an articulator while being adjacent to one surface of a speaker's head and neck; a data analysis unit for identifying the speaker's utterance features on the basis of the position of the sensor unit and the physical characteristics of the articulator; a data conversion unit for converting the position of the sensor unit and the utterance features into language data; and a data expression unit for externally expressing the language data, the sensor unit including an oral tongue sensor corresponding to the mouth and tongue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/605,361 US20200126557A1 (en) | 2017-04-13 | 2018-04-13 | Speech intention expression system using physical characteristics of head and neck articulator |
Applications Claiming Priority (16)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20170048010 | 2017-04-13 | ||
KR10-2017-0048010 | 2017-04-13 | ||
KR10-2017-0126469 | 2017-09-28 | ||
KR10-2017-0126048 | 2017-09-28 | ||
KR1020170125765A KR20180115599A (ko) | 2017-04-13 | 2017-09-28 | 발화 개선을 위한 두경부 물리 특성 기반의 가이드 및 피드백 시스템 |
KR10-2017-0126049 | 2017-09-28 | ||
KR1020170126049A KR20180115601A (ko) | 2017-04-13 | 2017-09-28 | 영상 객체의 발화 및 표정 구현을 위한 조음기관 물리 특성 기반의 발화-표정 데이터 맵핑 시스템 |
KR1020170126470A KR20180115603A (ko) | 2017-04-13 | 2017-09-28 | 조음기관의 물리 특성과 음성 간 매칭을 통한 발화 의도 측정 및 발화 구현 시스템 |
KR1020170126048A KR20180115600A (ko) | 2017-04-13 | 2017-09-28 | 발화 의도 표현을 위한 두경부 조음기관 물리 특성 기반 시스템 |
KR10-2017-0125765 | 2017-09-28 | ||
KR1020170126469A KR20180115602A (ko) | 2017-04-13 | 2017-09-28 | 촬상센서를 포함한 두경부 조음기관의 물리특성과 기반의 발화 의도 측정 및 발화 구현 시스템 |
KR10-2017-0126470 | 2017-09-28 | ||
KR10-2017-0126770 | 2017-09-29 | ||
KR10-2017-0126769 | 2017-09-29 | ||
KR1020170126769A KR20180115604A (ko) | 2017-04-13 | 2017-09-29 | 조음기관의 물리 특성과 문자 간 매칭을 통한 발화 의도 측정 및 발화 구현 시스템 |
KR1020170126770A KR20180115605A (ko) | 2017-04-13 | 2017-09-29 | 로봇의 발화 및 안면 구현을 위한 조음기관 물리 특성 기반의 발화-표정 데이터 맵핑 시스템 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018190668A1 true WO2018190668A1 (fr) | 2018-10-18 |
Family
ID=63792694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2018/004325 WO2018190668A1 (fr) | 2017-04-13 | 2018-04-13 | Système d'expression d'intention vocale utilisant les caractéristiques physiques d'un articulateur de tête et de cou |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018190668A1 (fr) |
- 2018-04-13: WO PCT/KR2018/004325 patent/WO2018190668A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100007512A1 (en) * | 2005-10-31 | 2010-01-14 | Maysam Ghovanloo | Tongue Operated Magnetic Sensor Based Wireless Assistive Technology |
US20120259554A1 (en) * | 2011-04-08 | 2012-10-11 | Sony Computer Entertainment Inc. | Tongue tracking interface apparatus and method for controlling a computer program |
KR20140068080A (ko) * | 2011-09-09 | 2014-06-05 | 아티큘레이트 테크놀로지스, 인코포레이티드 | 스피치 및 언어 트레이닝을 위한 구강내 촉각 바이오피드백 방법들, 디바이스들 및 시스템들 |
US20140342324A1 (en) * | 2013-05-20 | 2014-11-20 | Georgia Tech Research Corporation | Wireless Real-Time Tongue Tracking for Speech Impairment Diagnosis, Speech Therapy with Audiovisual Biofeedback, and Silent Speech Interfaces |
US20160027441A1 (en) * | 2014-07-28 | 2016-01-28 | Ching-Feng Liu | Speech recognition system, speech recognizing device and method for speech recognition |
Non-Patent Citations (2)
Title |
---|
SHIN, JIN HO ET AL.: "Korean Consonant Classification Based on Physical Sensor according to the Articulation Position for the Silent Speech Recognition", THE JOURNAL OF KOREAN INSTITUTE OF NEXT GENERATION COMPUTING, 21 October 2016 (2016-10-21) * |
SHIN, JIN HO ET AL.: "Korean Consonant Recognition Based on Multiple Motion Sensor according to the Articulation Position", PROCEEDINGS OF THE 2016 KOREAN INSTITUTE OF NEXT GENERATION COMPUTING SPRING CONFERENCE, 28 May 2016 (2016-05-28) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114501212A (zh) * | 2020-10-26 | 2022-05-13 | 和硕联合科技股份有限公司 | 降噪耳机的控制装置及控制方法 |
CN114501212B (zh) * | 2020-10-26 | 2025-03-18 | 和硕联合科技股份有限公司 | 降噪耳机的控制装置及控制方法 |
JPWO2023074119A1 (fr) * | 2021-10-27 | 2023-05-04 | ||
JP7611492B2 (ja) | 2021-10-27 | 2025-01-10 | パナソニックIpマネジメント株式会社 | 推定装置、推定方法およびプログラム |
CN117752307A (zh) * | 2023-12-21 | 2024-03-26 | 新励成教育科技股份有限公司 | 一种基于多源生物信号采集的口才表达分析系统 |
CN117752307B (zh) * | 2023-12-21 | 2024-08-20 | 新励成教育科技股份有限公司 | 一种基于多源生物信号采集的口才表达分析系统 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102196099B1 (ko) | 촬상센서를 포함한 두경부 조음기관의 물리특성과 기반의 발화 의도 측정 및 발화 구현 시스템 | |
US5536171A (en) | Synthesis-based speech training system and method | |
US5340316A (en) | Synthesis-based speech training system | |
Denby et al. | Silent speech interfaces | |
WO2018190668A1 (fr) | Système d'expression d'intention vocale utilisant les caractéristiques physiques d'un articulateur de tête et de cou | |
WO2015099464A1 (fr) | Système de support d'apprentissage de prononciation utilisant un système multimédia tridimensionnel et procédé de support d'apprentissage de prononciation associé | |
WO2022080774A1 (fr) | Dispositif, procédé et programme d'évaluation de trouble de la parole | |
Perrier | Control and representations in speech production | |
WO2017082447A1 (fr) | Dispositif de lecture à voix haute et d'affichage en langue étrangère et procédé associé, dispositif d'apprentissage moteur et procédé d'apprentissage moteur basés sur un capteur de détection d'actions rythmiques de langue étrangère l'utilisant, et support électronique et ressources d'étude dans lesquels celui-ci est enregistré | |
KR102071421B1 (ko) | 청음 향상을 위한 두경부 물리 특성 기반 복합시스템 | |
Altalmas et al. | Quranic Letter Pronunciation Analysis based on Spectrogram Technique: Case Study on Qalqalah Letters. | |
KR102364032B1 (ko) | 조음기관의 물리 특성과 음성 및 문자 간 매칭을 통한 발화 의도 측정 및 발화 구현 시스템 | |
Simpson et al. | Detecting larynx movement in non-pulmonic consonants using dual-channel electroglottography | |
CN1064766C (zh) | 合成式语言训练系统 | |
Seong et al. | A study on the voice security system using sensor technology | |
Stone | A silent-speech interface using electro-optical stomatography | |
JP2908720B2 (ja) | 合成を基本とした会話訓練装置及び方法 | |
WO2015019835A1 (fr) | Dispositif de larynx artificiel électrique | |
Shahina et al. | Recognition of consonantvowel units in throat microphone speech | |
Vescovi et al. | CONTROL, OF A MODIFIED TWO-MASS MODEL, FOR ANTHROPOMORPHIC SYNTHESIS | |
Deorukhakar et al. | Speech Recognition for People with Disfluency: A | |
Tran | Silent Communication: whispered speech-to-clear speech conversion | |
Guangpu | Articulatory phonetic features for improved speech recognition | |
Lim | Artificial speech for intensive care unit (ICU) patients and laryngectomees | |
JPH034919B2 (fr) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18783668 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18783668 Country of ref document: EP Kind code of ref document: A1 |