CN110767233A

CN110767233A - Voice conversion system and method

Info

Publication number: CN110767233A
Application number: CN201911042474.1A
Authority: CN
Inventors: 陈阳; 鲁永春; 王周
Original assignee: Hefei Mingyang Information Technology Co Ltd
Current assignee: Hefei Mingyang Information Technology Co Ltd
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2020-02-07

Abstract

The invention discloses a voice conversion system and method in the technical field of voice conversion. The system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; and the dubbing module processes the voice information into a voice file. The method comprises the following steps: S1: recording sound and storing it as an audio file; S2: converting the audio file recorded in step S1 into text information; S3: converting the text information from step S2 into voice information; S4: processing the voice information from step S3 into an audio file; S5: converting the text information from step S2 into a subtitle file; S6: storing and playing the audio file from step S4 and the subtitle file from step S5. The scheme enables conversion between Mandarin and dialects, pause control, and synchronized subtitle display.

Description

Voice conversion system and method

Technical Field

The present invention relates to the field of voice conversion technology, and more particularly, to a voice conversion system and method.

Background

Language is humanity's most important communication tool and the main medium through which people exchange ideas. Because human civilization is preserved and transmitted through language, language is also one of the defining characteristics of an ethnic group. Generally speaking, each ethnic group has its own language, which serves as the medium for communicating thought and inevitably influences politics, the economy, society, science and technology, and culture. Language as a cultural phenomenon develops continuously, and its present spatial distribution is the result of past development. The world's languages are grouped into language families according to shared features and common origins in their sounds, grammar, vocabulary, and so on. Each language family includes different languages, each occupying a certain geographic area, and many cultural characteristics are closely related to language.

Voice conversion systems in the prior art mainly target conversion between Mandarin and foreign languages. In China, however, dialects are widespread and nearly every locality has its own dialect. People therefore mostly communicate in Mandarin, which makes communication difficult for those who do not speak Mandarin fluently. Conversely, in some situations it is more convenient or appropriate to communicate in a dialect, yet many people cannot speak the local dialect well.

Disclosure of Invention

In view of these deficiencies of the prior art, the present invention provides a voice conversion system and method that achieve conversion between Mandarin and dialects, pause control, and synchronized subtitle display.

The purpose of the invention can be realized by the following technical scheme:

A voice conversion system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The recording module records sound and forms audio information; the voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; the dubbing module processes the voice information into a voice file; and the subtitle module converts the text information into a subtitle file. The storage module stores the audio files and subtitle files and is connected to a server through the Internet, and the server stores the audio files. The storage module provides an upload function, which uploads audio files from the storage module to the server, and a download function, which downloads audio files from the server to the storage module.
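The specification describes the upload and download functions only at the functional level. As a minimal sketch, assuming an in-memory dictionary in place of the real Internet-connected server and using hypothetical class names (`Server`, `StorageModule`), the two functions amount to copying named files between a local store and a shared remote store:

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for the Internet-connected server."""
    files: dict[str, bytes] = field(default_factory=dict)


@dataclass
class StorageModule:
    """Local store for audio and subtitle files, with upload/download to a Server."""
    server: Server
    local: dict[str, bytes] = field(default_factory=dict)

    def save(self, name: str, data: bytes) -> None:
        # Keep a file (audio or subtitle) in local storage.
        self.local[name] = data

    def upload(self, name: str) -> None:
        # Upload function: copy a locally stored file to the server.
        self.server.files[name] = self.local[name]

    def download(self, name: str) -> bytes:
        # Download function: fetch a file from the server into local storage.
        data = self.server.files[name]
        self.local[name] = data
        return data


# Two devices sharing one server: one uploads, the other downloads.
server = Server()
phone = StorageModule(server)
phone.save("lesson.wav", b"RIFF...")
phone.upload("lesson.wav")
tablet = StorageModule(server)
print(tablet.download("lesson.wav"))  # b'RIFF...'
```

A real deployment would replace `Server` with a network client talking to a file service; the source does not specify that transport.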

In a preferred embodiment of the present invention, the recording module further includes a recording device.
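The recording module's output is an audio file. Capturing samples from a physical recording device needs a platform audio API that the source does not name, so this sketch assumes the device has already delivered mono 16-bit PCM samples and only shows packaging them as a WAV file with Python's standard `wave` module. The function name `samples_to_wav` and the 16 kHz default rate are illustrative assumptions.

```python
import io
import struct
import wave


def samples_to_wav(samples: list[int], sample_rate: int = 16_000) -> bytes:
    """Pack mono 16-bit PCM samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)  # samples per second
        # "<h" = little-endian signed 16-bit, the byte order WAV uses.
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


data = samples_to_wav([0, 1000, -1000], sample_rate=8_000)
with wave.open(io.BytesIO(data), "rb") as r:
    print(r.getnchannels(), r.getframerate(), r.getnframes())  # 1 8000 3
```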

In a preferred embodiment of the present invention, the voice-to-text module further includes a speech recognition function that recognizes both Mandarin and dialects.
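The source does not say how the voice-to-text module decides between Mandarin and a dialect. One common approach, sketched here purely as an assumption, is to run one recognizer per language variety and keep the transcript with the highest confidence score. The backends below are stubs; `Hypothesis`, `recognize`, and the confidence values are hypothetical and not part of any real speech-recognition API.

```python
from typing import Callable, NamedTuple


class Hypothesis(NamedTuple):
    variety: str       # e.g. "mandarin" or a dialect label
    text: str          # recognized transcript
    confidence: float  # recognizer's score in [0, 1]


# Hypothetical recognizer backend: audio bytes -> (transcript, confidence).
Recognizer = Callable[[bytes], tuple[str, float]]


def recognize(audio: bytes, recognizers: dict[str, Recognizer]) -> Hypothesis:
    """Run every configured recognizer and keep the most confident transcript."""
    best = None
    for variety, recognizer in recognizers.items():
        text, confidence = recognizer(audio)
        if best is None or confidence > best.confidence:
            best = Hypothesis(variety, text, confidence)
    if best is None:
        raise ValueError("no recognizers configured")
    return best


# Stub backends standing in for real Mandarin and dialect models.
stubs = {
    "mandarin": lambda audio: ("transcript A", 0.42),
    "dialect": lambda audio: ("transcript B", 0.87),
}
print(recognize(b"...", stubs))
# Hypothesis(variety='dialect', text='transcript B', confidence=0.87)
```

Running every model costs more compute than a separate language-identification step followed by a single recognizer; either design is compatible with the module as described.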

In a preferred embodiment of the present invention, the text-to-speech module further includes a speech setting that determines whether text is converted into Mandarin or dialect speech.
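The speech setting can be pictured as a switch that selects which synthesizer voice the text-to-speech module uses. The following sketch assumes pluggable synthesizer backends keyed by a hypothetical `Variety` enum; the stub lambdas only tag the text so the selected output is visible and do not produce real audio.

```python
from enum import Enum
from typing import Callable


class Variety(Enum):
    MANDARIN = "mandarin"
    DIALECT = "dialect"  # placeholder for any specific local dialect


# Hypothetical synthesizer backend: text -> encoded audio bytes.
Synthesizer = Callable[[str], bytes]


class TextToSpeechModule:
    def __init__(self, backends: dict[Variety, Synthesizer]):
        self.backends = backends
        self.variety = Variety.MANDARIN  # default speech setting

    def set_variety(self, variety: Variety) -> None:
        """Speech setting: choose Mandarin or dialect output."""
        if variety not in self.backends:
            raise ValueError(f"no synthesizer configured for {variety.value}")
        self.variety = variety

    def synthesize(self, text: str) -> bytes:
        return self.backends[self.variety](text)


# Stub backends that just tag the text so the selected voice is visible.
tts = TextToSpeechModule({
    Variety.MANDARIN: lambda text: b"[mandarin] " + text.encode("utf-8"),
    Variety.DIALECT: lambda text: b"[dialect] " + text.encode("utf-8"),
})
tts.set_variety(Variety.DIALECT)
print(tts.synthesize("hello"))  # b'[dialect] hello'
```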

In a preferred embodiment of the present invention, when the dubbing module processes the voice information into a voice file, the processing includes setting pauses in the voice information, each pause being defined by its position and its duration.
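A pause defined by a position and a duration can be applied by splicing silence into the synthesized audio. This minimal sketch assumes the voice information is a list of mono PCM sample values at a known sample rate and that pause positions are measured in seconds of the unpaused audio; `Pause` and `insert_pauses` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pause:
    position_s: float  # where to insert the pause, in seconds of the original audio
    duration_s: float  # how long the pause lasts, in seconds


def insert_pauses(samples: list[int], sample_rate: int, pauses: list[Pause]) -> list[int]:
    """Return a copy of mono PCM samples with zero-valued silence inserted at each pause."""
    out: list[int] = []
    cursor = 0
    for p in sorted(pauses, key=lambda p: p.position_s):
        # Convert the pause position to a sample index, clamped to the audio length.
        idx = min(len(samples), max(0, round(p.position_s * sample_rate)))
        out.extend(samples[cursor:idx])                       # audio up to the pause
        out.extend([0] * round(p.duration_s * sample_rate))  # the silence itself
        cursor = idx
    out.extend(samples[cursor:])                              # remaining audio
    return out


print(insert_pauses([1, 2, 3, 4], sample_rate=2, pauses=[Pause(1.0, 0.5)]))  # [1, 2, 0, 3, 4]
```

Sorting the pauses first keeps the copy cursor moving forward, so pauses can be specified in any order; positions and durations are rounded to the nearest sample (Python's `round` sends exact ties to the even neighbor).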

In a preferred embodiment of the present invention, the subtitle module supports setting the font of the subtitle file, including the font size, font color, and font background color.
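The source does not name a subtitle format. WebVTT is one standard format whose `STYLE` block accepts CSS rules on the `::cue` pseudo-element, including `font-family`, `font-size`, `color`, and `background-color`, which map onto the font settings listed above. The sketch below assumes WebVTT output; `SubtitleStyle`, `Cue`, and `to_webvtt` are hypothetical names, and player support for `STYLE` blocks varies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleStyle:
    font_family: str = "sans-serif"
    font_size: str = "120%"
    color: str = "white"             # font color
    background_color: str = "black"  # font background color


@dataclass(frozen=True)
class Cue:
    start_s: float
    end_s: float
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue], style: SubtitleStyle) -> str:
    lines = [
        "WEBVTT",
        "",
        "STYLE",  # style block must precede the first cue
        "::cue {",
        f"  font-family: {style.font_family};",
        f"  font-size: {style.font_size};",
        f"  color: {style.color};",
        f"  background-color: {style.background_color};",
        "}",
        "",
    ]
    for cue in cues:
        lines.append(f"{vtt_timestamp(cue.start_s)} --> {vtt_timestamp(cue.end_s)}")
        lines.append(cue.text)
        lines.append("")  # blank line ends each cue
    return "\n".join(lines)


print(to_webvtt([Cue(0.0, 1.5, "hello")], SubtitleStyle(color="yellow")))
```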

In a preferred embodiment of the present invention, the storage module is connected to the server by wireless communication.

A method of speech conversion comprising the steps of:

S1: recording sound and storing it as an audio file;

S2: converting the audio file recorded in step S1 into text information;

S3: converting the text information from step S2 into voice information;

S4: processing the voice information from step S3 into an audio file;

S5: converting the text information from step S2 into a subtitle file;

S6: storing and playing the audio file from step S4 and the subtitle file from step S5.
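Steps S1 to S6 form a linear pipeline. The following sketch wires hypothetical stage functions together in that order; it reads S5 as subtitling the text produced in S2 and S6 as storing the outputs of S4 and S5, since S1 yields only audio and S3 only unprocessed speech. The `Stages` container and the stub lambdas in the usage lines are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Hypothetical stand-ins, one callable per module used in steps S1 to S6."""
    record: Callable[[], bytes]             # S1: recording module
    speech_to_text: Callable[[bytes], str]  # S2: voice-to-text module
    text_to_speech: Callable[[str], bytes]  # S3: text-to-speech module
    dub: Callable[[bytes], bytes]           # S4: dubbing module (e.g. pauses)
    make_subtitles: Callable[[str], str]    # S5: subtitle module
    store: Callable[[bytes, str], None]     # S6: storage module


def convert(stages: Stages) -> tuple[bytes, str]:
    recording = stages.record()              # S1: raw audio
    text = stages.speech_to_text(recording)  # S2: transcript
    speech = stages.text_to_speech(text)     # S3: synthesized speech
    audio_file = stages.dub(speech)          # S4: processed audio file
    subtitles = stages.make_subtitles(text)  # S5: subtitles from the S2 text
    stages.store(audio_file, subtitles)      # S6: persist both for playback
    return audio_file, subtitles


stored = []
stages = Stages(
    record=lambda: b"<dialect audio>",
    speech_to_text=lambda audio: "hello",
    text_to_speech=lambda text: text.encode("utf-8"),
    dub=lambda speech: speech + b"<pause>",
    make_subtitles=lambda text: "00:00:00.000 --> 00:00:01.000\n" + text,
    store=lambda audio, subs: stored.append((audio, subs)),
)
print(convert(stages))
```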

In a preferred embodiment of the present invention, the subtitle file is played in synchronization with the audio file.
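Synchronized playback reduces to looking up which subtitle cue covers the current audio position. A minimal sketch, assuming non-overlapping cues with start and end times in seconds (the `Cue` and `SubtitleTrack` names are hypothetical), uses binary search over the sorted start times via Python's standard `bisect` module:

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cue:
    start_s: float
    end_s: float
    text: str


class SubtitleTrack:
    """Finds the cue to display at a given playback position (cues must not overlap)."""

    def __init__(self, cues: list[Cue]):
        self.cues = sorted(cues, key=lambda c: c.start_s)
        self.starts = [c.start_s for c in self.cues]

    def active_text(self, t: float) -> Optional[str]:
        # Index of the last cue starting at or before t.
        i = bisect.bisect_right(self.starts, t) - 1
        if i >= 0 and t < self.cues[i].end_s:
            return self.cues[i].text
        return None  # between cues or outside the track


track = SubtitleTrack([Cue(1.0, 2.0, "second line"), Cue(0.0, 1.0, "first line")])
for t in (0.5, 1.0, 2.5):
    print(t, track.active_text(t))
# 0.5 first line
# 1.0 second line
# 2.5 None
```

A player would call `active_text` from its playback-position callback; the binary search keeps each lookup logarithmic in the number of cues.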

The beneficial effects of the invention are as follows:

The voice conversion system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The voice-to-text and text-to-speech modules together enable conversion between Mandarin and dialects. The dubbing module provides a pause function that makes the speech easier for listeners to understand and learn from. The storage module stores audio files and subtitle files and allows the data to be stored on a server. After voice conversion is complete, the subtitle module produces subtitles that are synchronized with the voice file during playback, further helping listeners understand and learn; the font size, font color, and font background color of the subtitles can also be set so that different content can be emphasized.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a block diagram of a voice conversion system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, a voice conversion system includes a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module. The recording module records sound and forms audio information; the voice-to-text module converts voice information into text information; the text-to-speech module converts text information into voice information; the dubbing module processes the voice information into a voice file; and the subtitle module converts the text information into a subtitle file. The storage module stores the audio files and subtitle files and is connected to a server through the Internet, and the server stores the audio files. The storage module provides an upload function, which uploads audio files from the storage module to the server, and a download function, which downloads audio files from the server to the storage module. The recording module further includes a recording device. The voice-to-text module further includes speech recognition that recognizes Mandarin and dialects. The text-to-speech module further includes a speech setting that converts text into Mandarin or dialect speech. When the dubbing module processes the voice information into a voice file, the processing includes setting pauses in the voice information, each defined by a pause position and a pause duration. The subtitle module supports setting the font of the subtitle file, including the font size, font color, and font background color. The storage module is connected to the server by wireless communication.

In this system, the voice-to-text module and the text-to-speech module enable conversion between Mandarin and dialects; the dubbing module provides a pause function that makes the speech easier for listeners to understand and learn from; and the storage module stores audio files and subtitle files so that the data can be kept on a server. After voice conversion is complete, the subtitle module produces subtitles synchronized with the voice file during playback, further helping listeners understand and learn, and the font size, font color, and font background color of the subtitles can be set to emphasize different content.

A method of speech conversion comprising the steps of:

S1: recording sound and storing it as an audio file;

S2: converting the audio file recorded in step S1 into text information;

S3: converting the text information from step S2 into voice information;

S4: processing the voice information from step S3 into an audio file;

S5: converting the text information from step S2 into a subtitle file;

S6: storing and playing the audio file from step S4 and the subtitle file from step S5.

The subtitle file is played in synchronization with the audio file.

In an example of conversion between Mandarin and a dialect, here from dialect to Mandarin, the recording module uses the recording device to record dialect speech and form dialect audio information; the voice-to-text module converts the dialect voice information into text information; the text-to-speech module converts the text information into Mandarin voice information; the dubbing module processes the Mandarin voice information into a Mandarin voice file and adds appropriate pauses to make it easier to follow; the subtitle module converts the text information into a subtitle file and sets its font size, font color, and font background color; and the storage module stores the Mandarin voice file and the matching subtitle file within the system or on a server on the Internet and plays them, so that sound and subtitles are displayed synchronously and are easy for listeners to understand.

In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate the principle of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the invention as claimed.

Claims

1. A speech conversion system, characterized in that: the system comprises a recording module, a voice-to-text module, a text-to-speech module, a dubbing module, a subtitle module and a storage module; the recording module is used for recording sound and forming audio information; the voice-to-text module is used for converting voice information into text information; the text-to-speech module is used for converting text information into voice information; the dubbing module is used for processing the voice information into a voice file; the subtitle module is used for converting the text information into a subtitle file; the storage module stores audio files and subtitle files and is connected to a server through the Internet, the server storing the audio files; the storage module has an upload function and a download function, the upload function uploading the audio files in the storage module to the server and the download function downloading the audio files on the server to the storage module.

2. A speech conversion system according to claim 1, characterized in that: the recording module also comprises a recording device.

3. A speech conversion system according to claim 1, characterized in that: the voice-to-text module further comprises speech recognition for recognizing Mandarin and dialects.

4. A speech conversion system according to claim 1, characterized in that: the text-to-speech module further comprises a speech setting, which converts text into Mandarin or dialect speech information.

5. A speech conversion system according to claim 1, characterized in that: the dubbing module processes the voice information into a voice file, the processing comprising setting pauses in the voice information, wherein each pause comprises a pause position and a pause duration.

6. A speech conversion system according to claim 1, characterized in that: the subtitle module comprises setting the font of the subtitle file, including the font size, font color and font background color.

7. A speech conversion system according to claim 1, characterized in that: the storage module is connected with the server in a wireless communication mode.

8. A speech conversion method using the speech conversion system according to claim 1, characterized in that the method comprises the following steps:

S1: recording sound and storing it as an audio file;

S2: converting the audio file recorded in step S1 into text information;

S3: converting the text information from step S2 into voice information;

S4: processing the voice information from step S3 into an audio file;

S5: converting the text information from step S2 into a subtitle file;

S6: storing and playing the audio file from step S4 and the subtitle file from step S5.

9. The speech conversion method according to claim 8, characterized in that: the subtitle file is played synchronously with the audio file.