WO1997034292A1 - Method and device at speech-to-speech translation - Google Patents
- Publication number
- WO1997034292A1 (application PCT/SE1997/000205)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- language
- fundamental tone
- translation
- translated
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1807—Speech classification or search using natural language modelling using prosody or stress
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a method and device for speech-to-speech translation. A given speech in a first language is recognized by speech recognition equipment (A). The speech recognition equipment produces a text which is transferred to a translator (B) for translation into a second language. In parallel with these procedures, fundamental tone information is assembled for the first speech. The fundamental tone information influences the prosody generation (G), which in turn influences a text-to-speech converter (C). From the text-to-speech converter a speech in the second language is obtained, the synthesis of which is essentially in accordance with the synthesis of the first speech.
Description
TITLE OF THE INVENTION:
Method and device at speech-to-speech translation.
TECHNICAL FIELD
The present invention relates to producing, from given natural speech in a first language, corresponding speech in a second language. The speech in the second language is produced artificially.
PRIOR ART
Attempts to translate between different languages have been made previously. For instance, there are devices which translate a given text between languages. A text can, however, be interpreted in different ways, which makes the translator's work more difficult.
Another kind of translation is from speech in one language to speech in another. In this case the complexity is higher, because recognition of the first language is a difficulty in itself. Further difficulties arise if the translated speech is to be reproduced with the voice and characteristics of the original speaker.
Patent document 9301596-4 describes a device for improved understanding of speech in artificial translation from one language into another. That invention includes an analyzing unit which analyses the duration and the fundamental tone of the speech in the first language. On the basis of this analysis and of information about the characteristics of the language, a prosody interpreting unit determines prosody-characteristic information for the first language, which a prosody generating unit for the second language uses to control the speech synthesis. A speech synthesis device accordingly places stresses in the speech translated into the second language which, from a linguistic point of view, correspond to the stresses in the first language.
DESCRIPTION OF THE INVENTION
TECHNICAL PROBLEM
When speech is translated between languages, it is desirable that the characteristics of the speech in the first language are transferred to the second language. These characteristics are of vital importance for identifying the speaker of the produced speech. If they are lacking, the produced speech can be difficult to understand, and its content and its characteristics can give conflicting signals. It must consequently be possible to transfer the prosodic information content of the speech with its meaning essentially maintained. Further, it is desirable that the voice of the original speaker is reproduced in a lifelike way in the second language.
Further, there is a need for methods and devices which can be used for direct translation between conversing persons. This can, for instance, concern persons communicating over a telecommunications network. Translation is also needed by, for instance, persons in authority, physicians etc. who have to communicate with immigrants in different situations. Interpretation problems may arise especially if the person being communicated with speaks a less common language, or if the language itself is well known but is spoken in a dialect which is difficult to understand. The supply of interpreters is also limited, so interpretation at a distance may sometimes be necessary. In such cases the interpreter can lose much of the information carried by ways of expression and body language, which is important for the interpretation.
It is further desirable that the translated speech has a character which corresponds to the speaker's voice and reproduces his/her state of mind. In known devices and methods, the translated speech is presented by an artificial voice whose characteristics do not correspond to those of the first speaker. When a speaker's verbal presentation is rendered by an artificial voice, it is important that the speaker's voice characteristics are, in all essentials, carried over into the second language. The presentation of a translated sentence should then correspond between the two languages. This greatly increases the possibility of actually identifying the person with whom one is talking. The present invention intends to solve said problems.
THE SOLUTION
The present invention relates to a method and device for speech-to-speech translation. A given speech in a first language is recognized by speech recognition equipment, A. The speech recognition equipment produces a text which is transferred to a translator, B, for translation into a second language. In parallel with these procedures, fundamental tone information is produced for the first speech. The fundamental tone information influences the prosody generation, G, which in turn influences a text-to-speech converter, C. From the text-to-speech converter a speech in the second language is obtained, the synthesis of which is essentially in accordance with the synthesis of the first speech.
The device relates to speech-to-speech translation where a first speech is given in a first language. The given speech is recognized and translated into a second language. The fundamental tone information of the first language is translated to the second language, so that the second speech is produced with a pitch and fundamental tone dynamics corresponding to those of the first speech. The information produced in this way conveys essentially the same message as the original information in the first speech.
The fundamental tone of the first speech is normalized and its sentence accents are extracted. This information indicates, on the one hand, the speech characteristics of the speaker and, on the other, which parts of the speech are emphasized. The accents further determine which shades of meaning in the translation can be decisive for the interpretation of the speech. Normalization means that the fundamental tone variation of the speech is divided by the fundamental tone declination of the speech. From the normalized fundamental tone curve, the dynamics of the speech can be obtained.
Further, the sentence accents in the incoming speech are classified, and the location of these sentence accents in the second language is determined. The sentence accents are thus translated into the second language, giving an accentuation corresponding to that of the first language. The sentence accent information and the fundamental tone information (fundamental tone declination and fundamental tone dynamics) are transferred to a prosody generator. In the prosody generator, a written translation of the speech is combined with this information. The combined information is then used in the text-to-speech conversion, which produces speech in the second language with a pitch and an intonation that agree well with the speech the person would have produced in the second language, so that part of the speaker's identity is transferred.
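The normalization described in this section can be written compactly. The symbols below are introduced here for illustration and do not appear in the source: $v(t)$ stands for the fundamental tone variation, $d(t)$ for the fundamental tone declination, $n(t)$ for the normalized curve and $D$ for the fundamental tone dynamics.

$$
n(t) = \frac{v(t)}{d(t)}, \qquad D = \max_{t} \, n(t)
$$

The expression for $D$ is one possible reading of claim 7, which speaks of the "maximum of the fundamental tone variation of the first speech, divided by the fundamental tone declination"; the wording could also be read as $\max_t v(t)$ divided by a declination value, and the source does not say which is intended.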
ADVANTAGES
The present invention allows a speech produced by a speaker in a first language to be presented in a second language with the voice characteristics of the speaker. A listener therefore experiences the translated speech as if it were spoken directly by the first speaker. Using the sentence accents of the first speech and translating them to the second speech further means that the characteristics of the second speech, as well as the intonation, are preserved in the translation. The present invention consequently provides an instrument by which a given speech, when translated into a second language, is given a corresponding character in the second language.
The invention makes it possible for two persons to talk to each other in their mother tongues. Such systems are of current interest in telecommunication, physician/patient communication etc.
DESCRIPTION OF FIGURES
Fig. 1 shows the invention in the form of a block diagram.
Fig. 2 shows a diagram of the fundamental tone variation over the fundamental tone declination.
Fig. 3 shows a curve of the fundamental tone variation divided by the fundamental tone declination.
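The two curves described for Fig. 2 and Fig. 3 can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the method of the patent: the source does not say how the declination is estimated, so a least-squares straight line is assumed here; the measured contour is treated as the "variation" curve; and the function names and the synthetic contour are hypothetical.

```python
import math
from statistics import fmean


def fit_declination(times, f0):
    """Assumed declination model: least-squares straight line through the F0 samples."""
    t_mean, f_mean = fmean(times), fmean(f0)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0))
    slope = sxy / sxx
    intercept = f_mean - slope * t_mean
    return [slope * t + intercept for t in times]


def normalize_f0(times, f0):
    """Divide the F0 contour by its declination; the peak of the ratio is
    taken as the fundamental tone dynamics (one reading of claim 7)."""
    declination = fit_declination(times, f0)
    normalized = [f / d for f, d in zip(f0, declination)]
    return declination, normalized, max(normalized)


# Synthetic contour (invented for illustration): a baseline falling from
# 140 Hz to 110 Hz over two seconds, with one accent peak near t = 0.8 s.
times = [i * 0.01 for i in range(201)]
f0 = [140.0 - 15.0 * t + 30.0 * math.exp(-((t - 0.8) ** 2) / (2 * 0.05 ** 2))
      for t in times]

declination, normalized, dynamics = normalize_f0(times, f0)
accent_time = times[normalized.index(dynamics)]
```

Here `declination` corresponds to the falling line of Fig. 2 and `normalized` to the curve of Fig. 3. Because the ratio is dimensionless, it can in principle be reapplied to a different pitch level, which is the property the description relies on when transferring the dynamics to the second speech.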
DETAILED EMBODIMENT
In the following the invention is described on the basis of the figures and the terms therein.
Speech recognition equipment is already well known to the expert in the speech recognition field. Its fundamental functions are described in books as well as in periodicals. A first speech, speech 1, representing speech from a person, is received by speech recognition equipment, A, which converts the speech into a text string. The speech recognition equipment evaluates the different possible interpretations of the speech. The most probable interpretation can be selected in different ways, for instance by probability calculation, by interpretation of previous sequences in the speech, by linguistic selection methods etc. The text string produced by the speech recognition equipment, A, is then transferred to a translator, B, which translates it into a text string in the second language. In the translator, B, the fundamental characteristics of the second language are added to the translated speech. These fundamental characteristics consist of the normal accents and pitches of the language.
For the translated speech to give the impression that it is produced by the person in question, the person's voice characteristics must be transferred to the second speech. Further, the intonation in the first language must be translated into the second language so that the meaning can be preserved. Information about these voice characteristics is obtained by fundamental tone extraction. In parallel with the speech recognition in A, the fundamental tone of speech 1 is extracted in a fundamental tone extractor, D. The fundamental tone is a combination of fundamental tone declination and fundamental tone variation (Fig. 2). These components are separated from each other in E. The fundamental tone is then normalized, which means that the variation of the fundamental tone is divided by the declination of the fundamental tone (Fig. 3). This information indicates the fundamental tone dynamics of the speaker in the first speech.
The sentence accents in the first speech are also determined. The information about the sentence accents is transferred to a sentence accent translator, F, which also receives information about the translation from the translator, B. The specific sentence accents identified for the first language are now translated into the second language, i.e. the sentence accents are placed in the second language with regard to the characteristics of that language. The translated sentence accents are then returned to the translator for linguistic control, in which the accentuations are adapted to the usage of the second language. The text string modified in this way is then transferred to a text-to-speech converter, C, and to a prosody generator, G. The prosody generator further receives information from the sentence accent translator, F, and fundamental tone information from E. In the prosody generator, a prosody adapted to the second language is then generated. The information from the prosody generator, G, is finally transferred to the text-to-speech converter, which generates a speech, speech 2, the synthesis of which essentially corresponds to the synthesis of the first speech.
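The data flow between the blocks A to G can be sketched as follows. This is an illustrative wiring only: every class, field and function name is hypothetical, the blocks are passed in as plain callables, the toy stand-ins at the bottom do no real recognition, translation or synthesis, and it is an assumption of the sketch that the sentence accents are determined in block E. The return of the translated accents to the translator for linguistic control is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ToneInfo:
    """Assumed output of block E: a summary of the fundamental tone of speech 1."""
    declination_hz: float   # stands in for the declination (pitch level)
    dynamics: float         # peak of variation divided by declination


@dataclass
class Prosody:
    """Assumed output of block G, used to steer the text-to-speech converter C."""
    accented_words: List[int]   # word positions in the translated text
    tone: ToneInfo


@dataclass
class SpeechToSpeech:
    recognize: Callable[[Sequence[float]], str]                        # A
    translate: Callable[[str], str]                                    # B
    synthesize: Callable[[str, Prosody], str]                          # C
    extract_f0: Callable[[Sequence[float]], List[float]]               # D
    analyse_tone: Callable[[List[float]], Tuple[ToneInfo, List[int]]]  # E
    translate_accents: Callable[[List[int], str, str], List[int]]      # F

    def run(self, speech1: Sequence[float]) -> str:
        source_text = self.recognize(speech1)          # A: speech 1 -> text string
        target_text = self.translate(source_text)      # B: text -> second language
        f0 = self.extract_f0(speech1)                  # D: conceptually parallel to A
        tone, source_accents = self.analyse_tone(f0)   # E: tone info and accents
        target_accents = self.translate_accents(       # F: place accents in language 2
            source_accents, source_text, target_text)
        prosody = Prosody(target_accents, tone)        # G: combine accent and tone info
        return self.synthesize(target_text, prosody)   # C: produce speech 2


def toy_analyse_tone(f0: List[float]) -> Tuple[ToneInfo, List[int]]:
    # Toy rule: one F0 value per word; the highest value marks the sentence accent.
    level = min(f0)
    return ToneInfo(level, max(f0) / level), [f0.index(max(f0))]


def toy_synthesize(text: str, prosody: Prosody) -> str:
    # Toy "speech": accented words are written in upper case.
    return " ".join(word.upper() if i in prosody.accented_words else word
                    for i, word in enumerate(text.split()))


demo = SpeechToSpeech(
    recognize=lambda speech: "det var bra",
    translate=lambda text: "it was good",
    synthesize=toy_synthesize,
    extract_f0=lambda speech: list(speech),
    analyse_tone=toy_analyse_tone,
    translate_accents=lambda accents, src, tgt: [
        i for i in accents if i < len(tgt.split())],
)

speech2 = demo.run([110.0, 105.0, 150.0])
```

Passing the blocks in as callables keeps the sketch independent of any particular recognizer, translator or synthesizer, which matches the description's remark that speech recognition equipment is treated as known technology.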
The invention is not restricted to the embodiment shown above as an example, or to the following patent claims, but may be subject to modifications within the scope of the inventive idea.
Claims
PATENT CLAIMS
1. Method at speech-to-speech translation, where a first speech, representing a first language, is recognized and translated into a speech in a second language, characterized in that the fundamental tone information of the first speech is translated into the second language, and the second speech is produced with a pitch and a fundamental tone dynamics which is in accordance with the first speech.
2. Method according to patent claim 1, characterized in that the fundamental tone of the first speech is normalized and that the sentence accents of the first speech are extracted.
3. Method according to patent claim 1 or 2, characterized in that the sentence accents are translated into the second language.
4. Method according to any of the previous patent claims, characterized in that information regarding the pitch and fundamental tone dynamics of the first speech is transferred to a prosody generator.
5. Method according to any of the previous patent claims, characterized in that the first speech is transformed to a first text which is translated into a second text in the second language.
6. Method according to any of the previous patent claims, characterized in that the sentence accent translation influences the prosody presentation which influences the presentation of the second speech.
7. Method according to any of the previous patent claims, characterized in that the fundamental tone dynamics of the incoming voice is given by maximum of the fundamental tone variation of the first speech, divided by the fundamental tone declination of the first speech where the fundamental tone declination indicates the pitch of the first speech.
8. Device at speech-to-speech translation, where a first speech, representing a first language, is recognized and translated into a second speech in a second language, characterized in that the fundamental tone information of the first speech is translated into the second language, at which the second speech is produced with a pitch and a fundamental tone dynamics corresponding to the first language.
9. Device according to patent claim 8, characterized in that the fundamental tone of the first speech is normalized and that the sentence accents are extracted.
10. Device according to patent claim 8 or 9, characterized in that the sentence accent information from the first speech is translated into the second language.
11. Device according to any of the patent claims 8-10, characterized in that the sentence accent information is arranged to influence the translation from the first language into the second language.
12. Device according to any of the patent claims 8-11, characterized in that the information regarding the pitch and the fundamental tone dynamics of the first speech is transferred to a prosody generator.
13. Device according to any of the patent claims 8-12, characterized in that the first speech is transformed to a text in the second language in a translator.
14. Device according to any of the patent claims 8-13, characterized in that the prosody generator is influenced by the text and the sentence accent translation.
15. Device according to any of the patent claims 8-14, characterized in that the prosody generator is arranged to influence a text-to-speech converter which is arranged to produce the second speech from the text.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE9600959-2 | 1996-03-13 | ||
SE9600959A SE9600959L (en) | 1996-03-13 | 1996-03-13 | Speech-to-speech translation method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997034292A1 (en) | 1997-09-18 |
Family
ID=20401770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SE1997/000205 WO1997034292A1 (en) | 1996-03-13 | 1997-02-11 | Method and device at speech-to-speech translation |
Country Status (2)
Country | Link |
---|---|
SE (1) | SE9600959L (en) |
WO (1) | WO1997034292A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998043235A3 (en) * | 1997-03-25 | 1998-12-23 | Telia Ab | Device and method for prosody generation at visual synthesis |
WO1998043236A3 (en) * | 1997-03-25 | 1998-12-23 | Telia Ab | Method of speech synthesis |
EP1014277A1 (en) * | 1998-12-22 | 2000-06-28 | Nortel Networks Corporation | Communication system and method employing automatic language identification |
DE10107749A1 (en) * | 2001-02-16 | 2002-08-29 | Holger Ostermann | Worldwide international communication using a modular communication arrangement with speech recognition, translation capability, etc. |
WO2002084643A1 (en) * | 2001-04-11 | 2002-10-24 | International Business Machines Corporation | Speech-to-speech generation system and method |
ES2180392A1 (en) * | 2000-09-26 | 2003-02-01 | Crouy-Chanel Pablo Grosschmid | System, device, and installation of mechanized simultaneous language interpretation |
US7805307B2 (en) | 2003-09-30 | 2010-09-28 | Sharp Laboratories Of America, Inc. | Text to speech conversion system |
EP3491642A4 (en) * | 2016-08-01 | 2020-04-08 | Speech Morphing Systems, Inc. | Method to model and transfer prosody of tags across languages |
WO2021208531A1 (en) * | 2020-04-16 | 2021-10-21 | 北京搜狗科技发展有限公司 | Speech processing method and apparatus, and electronic device |
US20220084500A1 (en) * | 2018-01-11 | 2022-03-17 | Neosapience, Inc. | Multilingual text-to-speech synthesis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0624865A1 (en) * | 1993-05-10 | 1994-11-17 | Telia Ab | Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language |
EP0664537A2 (en) * | 1993-11-03 | 1995-07-26 | Telia Ab | Method and arrangement in automatic extraction of prosodic information |
- 1996-03-13: SE application SE9600959A (patent SE9600959L), not active, Application Discontinuation
- 1997-02-11: WO application PCT/SE1997/000205 (patent WO1997034292A1), active, Application Filing
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998043236A3 (en) * | 1997-03-25 | 1998-12-23 | Telia Ab | Method of speech synthesis |
US6385580B1 (en) | 1997-03-25 | 2002-05-07 | Telia Ab | Method of speech synthesis |
US6389396B1 (en) | 1997-03-25 | 2002-05-14 | Telia Ab | Device and method for prosody generation at visual synthesis |
WO1998043235A3 (en) * | 1997-03-25 | 1998-12-23 | Telia Ab | Device and method for prosody generation at visual synthesis |
EP1014277A1 (en) * | 1998-12-22 | 2000-06-28 | Nortel Networks Corporation | Communication system and method employing automatic language identification |
ES2180392B1 (en) * | 2000-09-26 | 2004-07-16 | Pablo Grosschmid Crouy-Chanel | DEVICE SYSTEM AND INSTALLATION OF SIMULTANEOUS MECHANIZED LANGUAGE INTERPRETATION. |
ES2180392A1 (en) * | 2000-09-26 | 2003-02-01 | Crouy-Chanel Pablo Grosschmid | System, device, and installation of mechanized simultaneous language interpretation |
DE10107749A1 (en) * | 2001-02-16 | 2002-08-29 | Holger Ostermann | Worldwide international communication using a modular communication arrangement with speech recognition, translation capability, etc. |
WO2002084643A1 (en) * | 2001-04-11 | 2002-10-24 | International Business Machines Corporation | Speech-to-speech generation system and method |
US7461001B2 (en) | 2001-04-11 | 2008-12-02 | International Business Machines Corporation | Speech-to-speech generation system and method |
US7805307B2 (en) | 2003-09-30 | 2010-09-28 | Sharp Laboratories Of America, Inc. | Text to speech conversion system |
EP3491642A4 (en) * | 2016-08-01 | 2020-04-08 | Speech Morphing Systems, Inc. | Method to model and transfer prosody of tags across languages |
US20220084500A1 (en) * | 2018-01-11 | 2022-03-17 | Neosapience, Inc. | Multilingual text-to-speech synthesis |
US11769483B2 (en) * | 2018-01-11 | 2023-09-26 | Neosapience, Inc. | Multilingual text-to-speech synthesis |
WO2021208531A1 (en) * | 2020-04-16 | 2021-10-21 | 北京搜狗科技发展有限公司 | Speech processing method and apparatus, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
SE9600959D0 (en) | 1996-03-13 |
SE9600959L (en) | 1997-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112435650B (en) | | Multi-speaker and multi-language voice synthesis method and system |
EP0624865B1 (en) | | Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language |
US20040073423A1 (en) | | Phonetic speech-to-text-to-speech system and method |
EP0749109A3 (en) | | Speech recognition for tonal languages |
JP2005502102A (en) | | Speech-speech generation system and method |
US20070088547A1 (en) | | Phonetic speech-to-text-to-speech system and method |
JP3616250B2 (en) | | Synthetic voice message creation method, apparatus and recording medium recording the method |
WO1997034292A1 (en) | | Method and device at speech-to-speech translation |
EP0664537B1 (en) | | Method and arrangement in automatic extraction of prosodic information |
US11783813B1 (en) | | Methods and systems for improving word discrimination with phonologically-trained machine learning models |
US20070203703A1 (en) | | Speech Synthesizing Apparatus |
JP7406418B2 (en) | | Voice quality conversion system and voice quality conversion method |
JPH0580791A (en) | | Device and method for speech rule synthesis |
Smith et al. | | Clinical applications of speech synthesis |
KR102747987B1 (en) | | Voice synthesizer learning method using synthesized sounds for disentangling language, pronunciation/prosody, and speaker information |
Banerjee et al. | | Prosody Labelled Dataset for Hindi |
Banerjee et al. | | Prosody Labelled Dataset for Hindi using Semi-Automated Approach |
JPH0323500A (en) | | Text to speech synthesizer |
Kuo et al. | | An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS |
JP2001166787A (en) | | Speech synthesizer and natural language processing method |
JPH05313685A (en) | | Document loud reading device |
Rizk et al. | | Arabic text to speech synthesizer: Arabic letter to sound rules |
JP2578876B2 (en) | | Text-to-speech device |
KR19980065482A (en) | | Speech synthesis method to change the speaking style |
KR100194814B1 (en) | | Text-to-speech converter using multilevel input information and its method |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): JP NO US |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
NENP | Non-entry into the national phase | Ref country code: JP. Ref document number: 97532500. Format of ref document f/p: F |
122 | EP: PCT application non-entry in European phase | |