CN110767233A - Voice conversion system and method - Google Patents
Voice conversion system and method
- Publication number
- CN110767233A (application CN201911042474.1A)
- Authority
- CN
- China
- Prior art keywords
- module
- voice
- information
- file
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention discloses a voice conversion system and method in the technical field of voice conversion. The system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; the dubbing module processes the voice information into a voice file. The method comprises the following steps: S1: recording sound and storing it as a recorded audio file; S2: converting the audio file recorded in step S1 into text information; S3: converting the text information from step S2 into voice information; S4: processing the voice information from step S3 into an audio file; S5: converting the text information from step S2 into a subtitle file; S6: storing and playing the audio file from step S4 and the subtitle file from step S5. The scheme realizes conversion between Mandarin and dialects, pause control and synchronized subtitle display.
Description
Technical Field
The present invention relates to the field of voice conversion technology, and more particularly, to a voice conversion system and method.
Background
Language is humanity's most important communication tool and the main way people express themselves. The achievements of human civilization are stored and transmitted through language, and language is one of the defining characteristics of a nationality. Generally speaking, each nationality has its own language, which is the medium through which people exchange ideas and which inevitably influences politics, the economy, society, science and technology, and culture. Language, as a cultural phenomenon, develops continuously, and its current spatial distribution is the result of past development. The world's languages are divided into language families according to their shared features and common origins, such as grammar and vocabulary. Each language family includes different languages; languages and language families each occupy certain geographic areas, and many cultural characteristics are closely related to language.
Prior-art voice conversion systems mainly target conversion between Mandarin and foreign languages. China, however, has many dialects, and nearly every locality has its own. Because people mostly communicate in Mandarin, communication is difficult for those who do not speak Mandarin fluently. Conversely, in some situations it is more convenient and appropriate to communicate in a dialect, but the speaker cannot speak the local dialect well.
Disclosure of Invention
In view of the deficiencies of the prior art, the present invention provides a voice conversion system and method that realize conversion between Mandarin and dialects, pause control and synchronized subtitle display.
The purpose of the invention can be realized by the following technical scheme:
A voice conversion system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The recording module records sound and forms audio information; the voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; the dubbing module processes the voice information into a voice file; the subtitle module converts the text information into a subtitle file. The storage module stores audio files and subtitle files and is connected to a server through the Internet; the server stores the audio files. The storage module provides an upload function, which uploads audio files from the storage module to the server, and a download function, which downloads audio files from the server to the storage module.
As a preferred scheme of the present invention, the recording module further includes a recording device.
As a preferred scheme of the present invention, the voice-to-text module further includes speech recognition that recognizes Mandarin and dialects.
As a preferred scheme of the present invention, the text-to-speech module further includes a voice setting, which converts text into voice information in Mandarin or in a dialect.
As a preferred scheme of the present invention, the dubbing module processes the voice information into a voice file, and the processing includes setting pauses in the voice information, each pause being defined by its position and its duration.
As a preferred scheme of the present invention, the subtitle module sets the font of the subtitle file, including the font size, font color and font background color.
As a preferred scheme of the present invention, the storage module is connected to the server by wireless communication.
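The upload and download functions of the storage module can be pictured with a minimal sketch. Everything named here (`StorageModule`, `save`, `upload`, `download`, and the dictionary standing in for the server) is an assumption made for illustration; the patent does not specify a protocol or data layout, and a real deployment would replace the dictionary with a network transfer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageModule:
    """Hypothetical local store that can exchange audio files with a server.

    `server` is a plain dict standing in for the remote server (assumption).
    """
    server: Dict[str, bytes]
    local: Dict[str, bytes] = field(default_factory=dict)

    def save(self, name: str, data: bytes) -> None:
        # Keep an audio or subtitle file in local storage.
        self.local[name] = data

    def upload(self, name: str) -> None:
        # Upload function: copy a local file to the server.
        self.server[name] = self.local[name]

    def download(self, name: str) -> None:
        # Download function: copy a file from the server to local storage.
        self.local[name] = self.server[name]


# Two devices sharing one server: the first uploads, the second downloads.
shared_server: Dict[str, bytes] = {}
device_a = StorageModule(shared_server)
device_a.save("greeting.wav", b"audio-bytes")
device_a.upload("greeting.wav")

device_b = StorageModule(shared_server)
device_b.download("greeting.wav")
```

Keeping the server behind a small interface like this means the transport (wired or wireless) can change without touching the other modules.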
A method of speech conversion comprising the steps of:
S1: recording sound and storing it as a recorded audio file;
S2: converting the audio file recorded in step S1 into text information;
S3: converting the text information from step S2 into voice information;
S4: processing the voice information from step S3 into an audio file;
S5: converting the text information from step S2 into a subtitle file;
S6: storing and playing the audio file from step S4 and the subtitle file from step S5.
As a preferred scheme of the present invention, the subtitle file is displayed in synchronization with the audio file as it plays.
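The flow of steps S2 to S5 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the names `Converters` and `convert` are hypothetical, the stub lambdas stand in for real recognition and synthesis engines, and the sketch assumes the subtitle step consumes the text produced in S2, since S1 yields only audio.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Converters:
    """Hypothetical pluggable engines, one per conversion step."""
    recognize: Callable[[bytes], str]    # S2: recorded audio -> text
    synthesize: Callable[[str], bytes]   # S3: text -> voice information
    dub: Callable[[bytes], bytes]        # S4: voice information -> audio file
    subtitle: Callable[[str], str]       # S5: text -> subtitle file contents


def convert(recording: bytes, c: Converters) -> Tuple[bytes, str]:
    """Run S2..S5 on a recording made in S1; S6 would store and play the result."""
    text = c.recognize(recording)        # S2
    voice = c.synthesize(text)           # S3
    audio_file = c.dub(voice)            # S4
    subtitle_file = c.subtitle(text)     # S5
    return audio_file, subtitle_file


# Stub engines (assumptions): they only transform bytes and strings so the
# data flow can be followed without any speech library.
stub = Converters(
    recognize=lambda audio: audio.decode("utf-8"),
    synthesize=lambda text: text.upper().encode("utf-8"),
    dub=lambda voice: b"RIFF" + voice,
    subtitle=lambda text: f"1\n00:00:00,000 --> 00:00:02,000\n{text}\n",
)
audio, subs = convert(b"ni hao", stub)
```

Passing the engines in as callables keeps the pipeline independent of any particular recognizer or synthesizer, so a Mandarin or dialect engine could be swapped in at S2 or S3.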
The invention has the beneficial effects that:
The voice conversion system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The voice-to-text module and the text-to-speech module realize conversion between Mandarin and dialects; the dubbing module provides a pause function that helps listeners understand and learn; and the storage module stores audio files and subtitle files and enables data storage on a server. When a voice file is played after conversion, the subtitle module produces subtitles synchronized with the voice, which also helps listeners understand and learn; the font size, font color and font background color of the subtitles can be set so that different content can be emphasized.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of a voice conversion system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a voice conversion system includes a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The recording module records sound and forms audio information; the voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; the dubbing module processes the voice information into a voice file; the subtitle module converts the text information into a subtitle file. The storage module stores audio files and subtitle files and is connected to a server through the Internet; the server stores the audio files. The storage module provides an upload function, which uploads audio files from the storage module to the server, and a download function, which downloads audio files from the server to the storage module. The recording module also includes a recording device. The voice-to-text module also includes speech recognition that recognizes Mandarin and dialects. The text-to-speech module also includes a voice setting that converts text into voice information in Mandarin or in a dialect. The dubbing module processes the voice information into a voice file, and the processing includes setting pauses in the voice information, each pause being defined by its position and its duration. The subtitle module sets the font of the subtitle file, including the font size, font color and font background color. The storage module is connected to the server by wireless communication.
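The pause setting of the dubbing module, defined by a position and a duration, can be illustrated by inserting silence into a sample sequence. This is a minimal sketch, assuming mono PCM audio held as a list of integers and pause positions measured in seconds on the original timeline; the function name `insert_pauses` and these conventions are assumptions, not taken from the patent.

```python
from typing import List, Sequence, Tuple


def insert_pauses(
    samples: Sequence[int],
    sample_rate: int,
    pauses: Sequence[Tuple[float, float]],
) -> List[int]:
    """Insert silence into mono PCM samples.

    Each pause is (position_seconds, duration_seconds); the position refers
    to the original timeline, before any pause has been inserted.
    """
    out: List[int] = []
    cursor = 0  # index of the next original sample to copy
    for position, duration in sorted(pauses):
        # Clamp the cut point so it never moves backwards or past the end.
        cut = min(len(samples), max(cursor, round(position * sample_rate)))
        out.extend(samples[cursor:cut])                   # speech before the pause
        out.extend([0] * round(duration * sample_rate))   # silence
        cursor = cut
    out.extend(samples[cursor:])                          # remaining speech
    return out


# Four samples at 2 samples per second, with a 0.5 s pause after 1 s of audio.
paused = insert_pauses([1, 2, 3, 4], sample_rate=2, pauses=[(1.0, 0.5)])
```

Recording the pause positions and durations alongside the audio also makes it possible to shift subtitle timings by the same amounts, so that text and speech stay aligned.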
In this system, the voice-to-text module and the text-to-speech module realize conversion between Mandarin and dialects; the dubbing module provides a pause function that helps listeners understand and learn; and the storage module stores audio files and subtitle files and enables data storage on a server. When a voice file is played after conversion, the subtitle module produces subtitles synchronized with the voice, which also helps listeners understand and learn; the font size, font color and font background color of the subtitles can be set so that different content can be emphasized.
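The subtitle settings (font size, font color, font background color) could be carried in the subtitle file itself. The patent does not name a subtitle format, so the sketch below assumes WebVTT, whose `STYLE` block accepts CSS rules for the `::cue` pseudo-element; the `SubtitleStyle` class and `webvtt_header` function are hypothetical names, and player support for cue styling varies.

```python
from dataclasses import dataclass


@dataclass
class SubtitleStyle:
    """Hypothetical container for the three settings named in the text."""
    font_size: str = "100%"
    color: str = "white"
    background_color: str = "black"


def webvtt_header(style: SubtitleStyle) -> str:
    """Return a WebVTT file header with a STYLE block applied to every cue."""
    return (
        "WEBVTT\n\n"
        "STYLE\n"
        "::cue {\n"
        f"  font-size: {style.font_size};\n"
        f"  color: {style.color};\n"
        f"  background-color: {style.background_color};\n"
        "}\n"
    )


# Larger yellow text on a navy background, for emphasized content.
header = webvtt_header(
    SubtitleStyle(font_size="120%", color="yellow", background_color="navy")
)
```

Cues appended after this header would inherit the style; emphasizing only some content would need per-cue classes, which this sketch does not cover.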
A method of speech conversion comprising the steps of:
S1: recording sound and storing it as a recorded audio file;
S2: converting the audio file recorded in step S1 into text information;
S3: converting the text information from step S2 into voice information;
S4: processing the voice information from step S3 into an audio file;
S5: converting the text information from step S2 into a subtitle file;
S6: storing and playing the audio file from step S4 and the subtitle file from step S5.
The subtitle file is displayed in synchronization with the audio file as it plays.
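Synchronized display depends on each subtitle entry carrying the time span of the speech it belongs to. The following sketch builds SubRip (SRT) cues from text segments, assuming the duration of each spoken segment and the pause after it are already known; the SRT format, the function names, and the segment tuple layout are assumptions chosen for illustration.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(segments: List[Tuple[str, float, float]]) -> str:
    """Build SRT text from (text, speech_duration_s, pause_after_s) segments.

    Cue times accumulate speech durations and pauses, so each cue is shown
    exactly while its segment is being spoken.
    """
    blocks = []
    t = 0.0
    for index, (text, duration, pause) in enumerate(segments, start=1):
        start, end = t, t + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        t = end + pause  # the next cue starts after the pause
    return "\n".join(blocks)


srt_text = build_srt([("你好", 1.5, 0.5), ("再见", 2.0, 0.0)])
```

Because the pause lengths feed directly into the cue times, changing a pause in the dubbing step only requires regenerating the subtitle file with the new values.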
Taking dialect-to-Mandarin conversion as an example: the recording module records dialect speech with the recording device and forms dialect audio information; the voice-to-text module converts the dialect voice information into text information; the text-to-speech module converts the text information into Mandarin voice information; the dubbing module processes the Mandarin voice information into a Mandarin voice file and adds appropriate pauses so that it is easier to follow; the subtitle module converts the text information into a subtitle file and sets the font size, font color and font background color; and the storage module stores the Mandarin voice file and the matching subtitle file in the system or on a server on the Internet and plays them, so that sound and subtitles are presented synchronously and are easy for listeners to understand.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed.
Claims (9)
1. A voice conversion system, characterized in that: the system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module; the recording module is used for recording sound and forming audio information; the voice-to-text module is used for converting voice information into text information; the text-to-speech module is used for converting text information into voice information; the dubbing module is used for processing the voice information into a voice file; the subtitle module is used for converting the text information into a subtitle file; the storage module stores audio files and subtitle files and is connected to a server through the Internet, the server stores the audio files, and the storage module comprises an upload function, which uploads audio files from the storage module to the server, and a download function, which downloads audio files from the server to the storage module.
2. The voice conversion system according to claim 1, characterized in that: the recording module further comprises a recording device.
3. The voice conversion system according to claim 1, characterized in that: the voice-to-text module further comprises speech recognition that recognizes Mandarin and dialects.
4. The voice conversion system according to claim 1, characterized in that: the text-to-speech module further comprises a voice setting, and the voice setting converts text into voice information in Mandarin or in a dialect.
5. The voice conversion system according to claim 1, characterized in that: the dubbing module processes the voice information into a voice file, the processing comprises setting pauses in the voice information, and each pause comprises a pause position and a pause duration.
6. The voice conversion system according to claim 1, characterized in that: the subtitle module sets the font of the subtitle file, including the font size, font color and font background color.
7. The voice conversion system according to claim 1, characterized in that: the storage module is connected to the server by wireless communication.
8. A voice conversion method using the voice conversion system according to claim 1, characterized in that the method comprises the following steps:
S1: recording sound and storing it as a recorded audio file;
S2: converting the audio file recorded in step S1 into text information;
S3: converting the text information from step S2 into voice information;
S4: processing the voice information from step S3 into an audio file;
S5: converting the text information from step S2 into a subtitle file;
S6: storing and playing the audio file from step S4 and the subtitle file from step S5.
9. The voice conversion method according to claim 8, characterized in that: the subtitle file is displayed in synchronization with the audio file as it plays.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911042474.1A CN110767233A (en) | 2019-10-30 | 2019-10-30 | Voice conversion system and method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN110767233A (en) | 2020-02-07 |
Family
ID=69334617
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911042474.1A Pending CN110767233A (en) | 2019-10-30 | 2019-10-30 | Voice conversion system and method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110767233A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111358066A (en) * | 2020-03-10 | 2020-07-03 | 中国人民解放军陆军军医大学第一附属医院 | Protective clothing based on speech recognition |
| CN112492342A (en) * | 2020-12-01 | 2021-03-12 | 南京翰氜信息科技有限公司 | E-commerce video live broadcast platform based on cloud computing data analysis |
| CN114900724A (en) * | 2022-05-25 | 2022-08-12 | 龙宇天下(北京)文化传媒有限公司 | Intelligent television terminal based on internet |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101667424A (en) * | 2008-09-04 | 2010-03-10 | 英业达股份有限公司 | Speech translation system between Mandarin and multiple dialects and method thereof |
| US20130030789A1 (en) * | 2011-07-29 | 2013-01-31 | Reginald Dalce | Universal Language Translator |
| CN103491429A (en) * | 2013-09-04 | 2014-01-01 | 张家港保税区润桐电子技术研发有限公司 | Audio processing method and audio processing equipment |
| CN106791913A (en) * | 2016-12-30 | 2017-05-31 | 深圳市九洲电器有限公司 | Digital television program simultaneous interpretation output intent and system |
| CN107465887A (en) * | 2017-09-14 | 2017-12-12 | 潍坊学院 | Video call system and video call method |
| CN107750009A (en) * | 2017-10-27 | 2018-03-02 | 深圳市联谛信息无障碍有限责任公司 | A kind of method that the plug-in captions of video file are synchronously read aloud using Android device |
| CN109660672A (en) * | 2019-01-09 | 2019-04-19 | 浙江强脑科技有限公司 | Conversion method, equipment and the computer readable storage medium of sound-type |
- 2019-10-30: CN application CN201911042474.1A filed, published as CN110767233A (status: Pending)
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7490039B1 (en) | Text to speech system and method having interactive spelling capabilities | |
| JP5750380B2 (en) | Speech translation apparatus, speech translation method, and speech translation program | |
| JP6113302B2 (en) | Audio data transmission method and apparatus | |
| US7644000B1 (en) | Adding audio effects to spoken utterance | |
| US7124082B2 (en) | Phonetic speech-to-text-to-speech system and method | |
| KR20230165395A (en) | End-to-end speech conversion | |
| JP2005502102A (en) | Speech-speech generation system and method | |
| CN106328146A (en) | Video subtitle generating method and device | |
| KR20190005103A (en) | Electronic device-awakening method and apparatus, device and computer-readable storage medium | |
| CN110767233A (en) | Voice conversion system and method | |
| CN103020048A (en) | Method and system for language translation | |
| KR20200027331A (en) | Voice synthesis device | |
| JP2012181358A (en) | Text display time determination device, text display system, method, and program | |
| CN108628859A (en) | A kind of real-time voice translation system | |
| JP2011504624A (en) | Automatic simultaneous interpretation system | |
| CN114464180A (en) | Intelligent device and intelligent voice interaction method | |
| US6308154B1 (en) | Method of natural language communication using a mark-up language | |
| JP2000207170A (en) | Device and method for processing information | |
| CN109460548B (en) | Intelligent robot-oriented story data processing method and system | |
| CN113851140A (en) | Voice conversion correlation method, system and device | |
| Mihelič et al. | Spoken language resources at LUKS of the University of Ljubljana | |
| CN109065019A (en) | A kind of narration data processing method and system towards intelligent robot | |
| CN110851564B (en) | Voice data processing method and related device | |
| KR101920653B1 (en) | Method and program for edcating language by making comparison sound | |
| CN201585019U (en) | Mobile terminal with voice conversion function |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |