CN104702856A - System device and method for real-time selfie special-effect MV synthesis applied to a karaoke machine

Info

Publication number: CN104702856A
Application number: CN201310671480.XA
Authority: CN
Inventors: 庄嘉宾
Original assignee: INYUAN TECHNOLOGY Inc
Current assignee: INYUAN TECHNOLOGY Inc
Priority date: 2013-12-10
Filing date: 2013-12-10
Publication date: 2015-06-10

Abstract

A system device and method for real-time selfie special-effect MV synthesis, applied to a karaoke machine. The device includes: an image input interface for generating original image data; a special effect image library for storing at least one piece of special effect image data; a lyric subtitle library for storing lyric subtitle data of at least one song; a dynamic image synthesis processing unit, coupled to the image input interface and the special effect image library, which reads the at least one piece of special effect image data from the special effect image library and synthesizes it with the original image data in real time to generate synthesized image data; a core processing unit, coupled to the lyric subtitle library and communicatively connected to the dynamic image synthesis processing unit, which reads the lyric subtitle data of the at least one song from the lyric subtitle library and performs superimposition processing and lyric-highlighting processing; and an image output interface, coupled to the core processing unit, for outputting the synthesized image data and the lyric subtitle data of the at least one song to a display device.

Description

System device and method for real-time selfie special-effect MV synthesis applied to a karaoke machine

Technical Field

The present invention relates to a system device and a method thereof, and in particular to a system device and method for real-time selfie special-effect MV synthesis applied to a karaoke machine.

Background

When playing a song, typical karaoke equipment uses built-in or externally stored scenery, story, or mood videos or pictures as the background and superimposes lyric subtitles on top, or directly plays the original singer's music video (MV) for the user to sing along with. Most users, however, would like to be the star of the music video while they sing. Although more advanced karaoke equipment lets the user connect an external image input device such as a video camera, it can only capture the plain, unprocessed live image of the user; it cannot generate in real time the dynamic visual effects of an edited MV, in which the user's image is combined with scenes that change with the song. Achieving this effect through computer post-production is not real-time and is also very costly.

Summary of the Invention

One object of the present invention is to provide a system device and method for real-time selfie special-effect MV (music video) synthesis applied to a karaoke machine (also called a song-request machine or karaoke), capable of displaying or recording in real time video in which the user is combined with a wide variety of scenes and visual effects.

To achieve the above, the present invention provides a system device for real-time selfie special-effect MV synthesis applied to a karaoke machine, comprising:

an image input interface for inputting an image output by an image output device, so as to generate original image data;

a special effect image library for storing at least one piece of special effect image data;

a lyric subtitle library for storing lyric subtitle data of at least one song;

a dynamic image synthesis processing unit, coupled to the image input interface and the special effect image library, for receiving and processing the original image data, and for reading the at least one piece of special effect image data from the special effect image library and synthesizing it with the original image data in real time to generate synthesized image data;

a core processing unit, coupled to the lyric subtitle library and communicatively connected to the dynamic image synthesis processing unit, for receiving and processing the synthesized image data, and for reading the lyric subtitle data of the at least one song from the lyric subtitle library and performing superimposition processing and lyric-highlighting processing on it together with the synthesized image data; and

an image output interface, coupled to the core processing unit, for outputting the synthesized image data and the lyric subtitle data of the at least one song to a display device, so that the display device displays the synthesized image data and the lyric subtitle data of the at least one song.

The image output by the image output device is a real-time image of a user.

The special effect image data is at least one of the following: snowflake special effect image data, star special effect image data, bubble special effect image data, and scene special effect image data.

The special effect image data includes at least one of the following: a dynamic object and a static object.

The system device further includes an operation unit coupled to the core processing unit. The operation unit has an operation interface through which the user inputs instructions, so that the core processing unit and the dynamic image synthesis processing unit perform the corresponding operations according to the instructions input through the operation interface.

The system device further includes an audio processing unit coupled to the core processing unit. The audio processing unit receives the voice of at least one user and mixes it with the background music of a corresponding song to generate mixed audio data.

The system device further includes a recording unit coupled to the core processing unit. The recording unit selectively records the synthesized image data, the mixed audio data, and the lyric subtitle data to form an audio-video file.

The system device further includes a storage medium comprising an internal storage medium and an external storage medium. The internal storage medium is coupled to the dynamic image synthesis processing unit and the core processing unit and stores the lyric subtitle library and the special effect image library. The external storage medium is coupled to the recording unit and the core processing unit and stores the audio-video file.

The system device further includes a network interface coupled to the core processing unit for connecting to a network, so that the audio-video file, once read out by the core processing unit, can be uploaded to the network for sharing.

To achieve the above, the present invention also provides a method for real-time selfie special-effect MV synthesis applied to a karaoke machine, for synthesizing the image of at least one user in real time. The method comprises:

receiving original image data of the user;

determining and selecting at least one piece of special effect image data;

synthesizing the original image data and the special effect image data in real time to form synthesized image data;

superimposing lyric subtitle data; and

displaying the synthesized image data and the lyric subtitle data.

Further, the method includes performing dynamic image pre-processing on the original image data.

Further, the method includes performing dynamic image post-processing on the synthesized image data.

Further, the at least one piece of special effect image data is determined and selected by picking it at random from a special effect image library according to a random-number rule.

Further, the at least one piece of special effect image data is determined and selected by picking it sequentially from a special effect image library according to a specific order.

Further, the at least one piece of special effect image data is determined and selected from a special effect image library according to an instruction input through an operation unit.

Further, the original image data and the special effect image data are synthesized in real time by performing face recognition on the user's original image data, extracting the user's face as the foreground, and compositing it with the special effect image data in real time with the background removed, to form the synthesized image data.

Further, the original image data and the special effect image data are synthesized in real time by performing human body recognition on the user's original image data, extracting the user's whole body as the foreground, and compositing it with the special effect image data in real time with the background removed, to form the synthesized image data.

For a further understanding of the features and technical content of the present invention, please refer to the following detailed description and the accompanying drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the architecture of an embodiment of the present invention;

FIG. 2 is a functional block diagram of an embodiment of the present invention;

FIG. 3 is a schematic diagram of an embodiment of the present invention;

FIG. 4 is a flowchart of a method of the present invention.

Explanation of reference signs:

User A

Karaoke machine M

Karaoke-machine real-time audio-visual special effects synthesis device 100

Image output device C

Display device D

Screen D1

Image input interface 11

Image output interface 12

Core processing unit 13

Dynamic image synthesis processing unit 14

Special effect image library 15

Lyric subtitle library 16

Internal storage medium 17

External storage medium 18

Operation unit 19

Operation interface 191

Audio processing unit 20

Recording unit 21

Network interface 22

Detailed Description

Please refer to FIG. 1. The present invention mainly provides a system device for real-time selfie special-effect MV synthesis applied to a karaoke machine. As shown in FIG. 1, the system device includes a karaoke-machine real-time audio-visual special effects synthesis device 100, an image output device C, and a display device D. The synthesis device 100 may be coupled between the image output device C and the display device D. The karaoke machine M referred to in the present invention is a device that can process and reproduce multimedia data and has a sing-along function; it may also be called a song-request machine or karaoke. It should be emphasized that the term "audio-visual" in the present invention may cover at least one of image and sound, and is not limited to including both.

Please refer to FIG. 2 and FIG. 3. FIG. 2 is a functional block diagram of a preferred embodiment of the present invention. As shown in FIG. 2, the synthesis device 100 basically includes an image input interface 11, an image output interface 12, a core processing unit 13, a dynamic image synthesis processing unit 14, and a special effect image library 15. The synthesis device 100 may further include a lyric subtitle library 16, an internal storage medium 17, an external storage medium 18, an operation unit 19, an audio processing unit 20, a recording unit 21, and a network interface 22.

The image input interface 11 can be connected to the image output device C to input the image that the image output device C outputs. The image output device C may include, but is not limited to, a charge-coupled device (CCD), a video camera, a PC or web camera (PC-cam, web-cam), a still camera, or another device capable of capturing images; it may also be any multimedia player (media player). The image output interface 12 can be connected to the display device D to output images to it. The display device D may include, but is not limited to, a CRT, a liquid crystal display (LED, LCD), a plasma display, or a projector. The image input interface 11 and the image output interface 12 may include, but are not limited to, an HDMI interface, a CVBS interface, a component video interface, an AV terminal interface, a VGA interface, a DVI interface, or a wireless signal transmission interface.

The core processing unit 13 may be a central processing unit (CPU) or a system on a chip (SoC), and may be the main control component of the karaoke machine M.

The dynamic image synthesis processing unit 14 is coupled to the image input interface 11 and communicatively connected to the core processing unit 13. It may be a standalone digital signal processor (DSP), may be integrated into the core processing unit 13, or may be provided in a set-top box. An executable program can be written into the dynamic image synthesis processing unit 14. Through the image input interface 11, the dynamic image synthesis processing unit 14 receives the original image data generated from the image captured by the image output device C. The image captured by the image output device C can be presented on the screen D1 of the display device D through the image output interface 12. In this embodiment, the image output device C is a video camera that captures a continuous moving image. On the screen D1 of the display device D, this continuous moving image shows in real time at least one user A (who may also be called the singer) in front of the image output device C; that is, the continuous moving image captured by the image output device C is a real-time image of the user. In other words, the image output device C captures the real-time image of user A, which after analog-to-digital conversion becomes the original image data received by the dynamic image synthesis processing unit 14.

The special effect image library 15 may be built in the internal storage medium 17 or placed directly in the dynamic image synthesis processing unit 14. The internal storage medium 17 may include, but is not limited to, a flash memory, a temporary (buffer) memory, or a random access memory. The special effect image library 15 stores in advance multiple pieces of special effect image data for selection and use. It contains at least the following: snowflake special effect image data, star special effect image data, bubble special effect image data, scene special effect image data, face special effect image data, and other special effect image data with the scene and visual effects commonly seen in MVs. The special effect image data may include a dynamic object or a static object.

After the dynamic image synthesis processing unit 14 receives, through the image input interface 11, the original image data generated from the image captured by the image output device C, it reads one or more pieces of special effect image data from the special effect image library 15 according to an instruction and combines them. The original image data and the special effect image data are synthesized in real time to generate synthesized image data, which is output to the core processing unit 13 and displayed in real time on the screen D1 of the display device D through the image output interface 12. In detail, the dynamic image synthesis processing unit 14 can, according to the program written into it, automatically apply various predetermined dynamic image processing and dynamic synthesis operations to the original image data, for example brightness, hue, contrast, and saturation adjustment, and screen splitting, layering, and rotation. It can also perform face recognition with background-removal compositing, face duplication with background-removal compositing, or human body recognition with background-removal compositing, so that the original image data takes on an MV-like look. The processed original image data is then synthesized with the special effect image data in real time to generate the synthesized image data.
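The real-time synthesis of original image data with special effect image data can be illustrated with a per-pixel blend. The following is only a minimal sketch and is not taken from the patent: it assumes frames are small lists of RGB tuples and that the effect layer comes with a per-pixel opacity map; the names `composite`, `Frame`, and `alpha` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Frame = List[List[Pixel]]      # rows of pixels


def composite(frame: Frame, effect: Frame, alpha: List[List[float]]) -> Frame:
    """Blend an effect layer over a camera frame, pixel by pixel.

    alpha[y][x] is the assumed opacity of the effect at that pixel:
    0.0 keeps the camera pixel, 1.0 shows only the effect pixel.
    """
    out: Frame = []
    for frame_row, effect_row, alpha_row in zip(frame, effect, alpha):
        out_row = []
        for f, e, a in zip(frame_row, effect_row, alpha_row):
            # Standard "over" blend applied to each colour channel.
            out_row.append(tuple(round(a * ec + (1.0 - a) * fc)
                                 for ec, fc in zip(e, f)))
        out.append(out_row)
    return out


if __name__ == "__main__":
    camera = [[(100, 100, 100), (100, 100, 100)]]
    snow = [[(255, 255, 255), (255, 255, 255)]]
    opacity = [[0.0, 1.0]]  # a snowflake covers only the second pixel
    print(composite(camera, snow, opacity))
```

A real device would run an equivalent operation on full video frames in a DSP or SoC; the pure-Python loop only shows the arithmetic.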

The lyric subtitle library 16 may be built in the internal storage medium 17 and stores in advance the lyric subtitle data of multiple songs. The core processing unit 13 is coupled to the lyric subtitle library 16. After the core processing unit 13 receives the synthesized image data output by the dynamic image synthesis processing unit 14, it reads the lyric subtitle data of the corresponding song from the lyric subtitle library 16 and performs superimposition processing and lyric-highlighting processing on that data together with the synthesized image data. Lyric highlighting means that the color of the displayed lyrics is progressively filled in, for example from white to blue, as the song proceeds. The fill color could equally be red or green; no limitation is placed on it here. The current progress of the song can also be marked clearly by other visual effects (for example, a dot effect that follows the song's progress). The synthesized image data and the lyric subtitle data are then presented in real time on the screen D1 of the display device D through the image output interface 12, giving a picture with an MV-like effect and highlighted lyrics.
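The lyric-highlighting step can be sketched as a function that decides how much of the current lyric line should already be filled in. This is an illustrative assumption rather than the patent's implementation: it supposes each lyric line has a start and end time and that the fill advances linearly across the characters; `split_highlight` and its parameters are hypothetical names.

```python
from typing import Tuple


def split_highlight(text: str, start_s: float, end_s: float,
                    now_s: float) -> Tuple[str, str]:
    """Split a lyric line into (already highlighted, not yet highlighted).

    Assumes the fill advances linearly between start_s and end_s
    (both in seconds of song time).
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    progress = (now_s - start_s) / (end_s - start_s)
    progress = min(1.0, max(0.0, progress))  # clamp before and after the line
    n = int(progress * len(text))            # number of characters filled in
    return text[:n], text[n:]


if __name__ == "__main__":
    done, todo = split_highlight("abcdefgh", 10.0, 14.0, 12.0)
    # A renderer could draw `done` in blue and `todo` in white.
    print(repr(done), repr(todo))
```

Per-character or per-syllable timing data, if the lyric subtitle data carries it, would replace the linear assumption.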

The operation unit 19 is coupled to the core processing unit 13. It may include, but is not limited to, a remote control, a tablet, a smartphone, or a sensor. The operation unit 19 has an operation interface 191, which may be a set of function keys, a graphical user interface, a body-motion sensing interface, a voice control interface, or the like. The user can therefore give instructions by body motion or voice, or by pressing or touching function keys or function icons on the operation interface 191, and the core processing unit 13 and the dynamic image synthesis processing unit 14 perform the corresponding operations according to those instructions. Once the screen D1 of the display device D shows user A in front of the image output device C in real time, user A can issue an instruction through the operation interface 191 to select one or more pieces of special effect image data from the special effect image library 15. The selected special effect image data is synthesized with the original image data in real time and presented on the screen D1 of the display device D for user A to watch as it happens. For example, if user A wants to appear in a snowy scene on the screen D1, user A can issue an instruction through the operation interface 191 to select the snowflake special effect image data from the special effect image library 15 for real-time synthesis, and can additionally combine it with special effect image data that changes the color tone, so as to appear in a scene where snow falls and the colors shift. Other special effect image data can also be selected from the special effect image library 15; selecting face special effect image data, for example, produces a facial visual effect. In addition, user A can use the function settings of the operation interface 191 to add artistic effect processing such as oil painting, watercolor, or sketch effects. While singing, user A can thus select preferred special effect image data through the operation interface 191 and have it synthesized in real time with the original image data containing user A, so that a wide variety of scene and visual effects appear in real time on the screen D1 of the display device D and the user acts much like the director of an MV.

The audio processing unit 20 may be built into the core processing unit 13, or may be a separate audio chip or sound card coupled to the core processing unit 13. The audio processing unit 20 receives the voice (that is, the singing) of one or more users A, selects the background music of the corresponding song from the internal storage medium 17, and mixes the two to generate mixed audio data.
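The mixing of the user's voice with the background music can be pictured as a sample-by-sample weighted sum. The sketch below is an assumption for illustration only: it treats both signals as lists of signed 16-bit PCM samples at the same sample rate, and the function name `mix` and the gain parameters are hypothetical.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(vocal: List[int], music: List[int],
        vocal_gain: float = 1.0, music_gain: float = 1.0) -> List[int]:
    """Mix two 16-bit PCM sample lists into one.

    The shorter input is padded with silence, and sums outside the
    16-bit range are clipped rather than wrapped.
    """
    length = max(len(vocal), len(music))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0
        m = music[i] if i < len(music) else 0
        sample = round(vocal_gain * v + music_gain * m)
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample)))
    return mixed


if __name__ == "__main__":
    print(mix([1000, 30000], [2000, 30000, -5]))
```

Lowering `music_gain` relative to `vocal_gain` is one simple way to keep the singer audible over the accompaniment.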

The recording unit 21 may be built into the core processing unit 13, or may be a separate recording chip coupled to the core processing unit 13. The recording unit 21 selectively records the synthesized image data, the mixed audio data, and the song's subtitle data to form an audio-video file. User A can set the recording options through the operation interface 191 of the operation unit 19, choosing video only or video together with audio, so that the synthesized real-time image of user A's selfie and the singing are recorded and then stored in the external storage medium 18. The external storage medium 18 may include a hard disk, an optical disc, a USB flash drive, or a memory card.

The network interface 22 may be an Ethernet adapter or a wireless network adapter coupled to the core processing unit 13. The network interface 22 provides a wired or wireless connection to a network, which may be a wireless network, a local area network, or the Internet. User A can set the network options through the operation interface 191 of the operation unit 19 to establish the connection, so that the audio-video file stored in the external storage medium 18 is read out by the core processing unit 13 and uploaded to the network for sharing. The file can also be transmitted wirelessly through the network interface 22 directly to a tablet or smartphone within range of the wireless signal. In this way, the audio-video file recorded from user A's synthesized real-time selfie video and sound can be shared immediately with friends and family, specific people, or the public.

Please refer to Fig. 4 in conjunction with Fig. 2. Fig. 4 shows the main process steps of a method of the present invention for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player. The method synthesizes the image of at least one user A in real time; the user's image may be captured in real time by any image output device C and subjected to processing such as encoding, decoding and compression to produce the user's original image data. First, the dynamic image synthesis processing unit 14 receives the user's original image data (S201). At this point the original image data may, as needed, undergo dynamic image pre-processing (for example: brightness, hue, contrast, saturation, picture splitting, stacking, rotation, and the like) (S203). The dynamic image synthesis processing unit 14 may then determine and select at least one piece of special effect image data from the special effect image library 15 (S205) (for example: snowflake special effect image data, rain special effect image data, star special effect image data, or scene special effect image data). The special effect image data may be selected in any of the following ways: randomly from the special effect image library 15 according to a random number rule; sequentially from the special effect image library 15 according to a specific order; from the special effect image library 15 according to an instruction that the user enters through the operation unit 19 to suit the user's preference; or automatically, with the system device detecting the type or tempo of the music and selecting the corresponding special effect image data. Next, the dynamic image synthesis processing unit 14 dynamically synthesizes the processed original image data with the special effect image data in real time to form synthesized image data (S207). This real-time synthesis may use face recognition to extract the user's face as the foreground, which is then composited with the special effect image data by real-time background removal to form the synthesized image data; alternatively, it may use human body recognition to extract the user's whole body as the foreground, which is then composited with the special effect image data by real-time background removal to form the synthesized image data. The synthesized image data may then, as needed, undergo dynamic image post-processing (S209) (for example: artistic effects such as an oil painting effect, a watercolor effect, or a sketch effect); in other words, the synthesized image data may be computed on according to an instruction so that it presents a picture with an artistic effect such as oil painting, watercolor or sketch. Finally, the lyric caption data (including the word-coating effect) is superimposed (S211), and the processed synthesized image data and lyric caption data are displayed on the screen D1 of a display device D (S211), so as to present in real time a picture that resembles an MV and includes the word-coating effect. In addition, mixed audio data containing the user's singing voice and the background music of a song may be received, and the user may choose to record only the synthesized image data, the lyric caption data, or the mixed audio data to form an audio-video file, which is then stored or transmitted for sharing.
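The selection step (S205), the foreground compositing step (S207) and the caption step (S211) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `EFFECT_LIBRARY`, `GENRE_TO_EFFECT`, `make_selector`, `composite` and `process_frame` are hypothetical, frames are tiny grayscale grids rather than video, the foreground mask is assumed to come from a face or body detector that is not shown, and the word-coating effect is reduced to splitting a lyric line into sung and unsung parts. The optional pre-processing (S203) and post-processing (S209) steps are omitted.

```python
import random
from itertools import cycle

# Hypothetical effect library: effect name -> one tiny grayscale frame.
EFFECT_LIBRARY = {
    "rain": [[90, 90], [90, 90]],
    "snow": [[255, 255], [255, 255]],
    "stars": [[200, 200], [200, 200]],
}

# Hypothetical mapping used for music-based selection.
GENRE_TO_EFFECT = {"ballad": "snow", "rock": "stars"}


def make_selector(mode, rng=None, user_choice=None, genre=None):
    """Return a zero-argument callable that names the next effect (S205)."""
    names = sorted(EFFECT_LIBRARY)
    if mode == "random":          # random number rule
        rng = rng or random.Random()
        return lambda: rng.choice(names)
    if mode == "sequential":      # fixed order, repeating
        order = cycle(names)
        return lambda: next(order)
    if mode == "user":            # instruction from the operation unit
        if user_choice not in EFFECT_LIBRARY:
            raise KeyError(f"unknown effect: {user_choice!r}")
        return lambda: user_choice
    if mode == "music":           # detected music type (detection not shown)
        chosen = GENRE_TO_EFFECT.get(genre, names[0])
        return lambda: chosen
    raise ValueError(f"unknown selection mode: {mode!r}")


def composite(frame, mask, effect):
    """Keep frame pixels where mask is truthy (foreground); else use the effect pixel (S207)."""
    return [
        [f if m else e for f, m, e in zip(f_row, m_row, e_row)]
        for f_row, m_row, e_row in zip(frame, mask, effect)
    ]


def process_frame(frame, mask, select, lyric_line, sung_chars):
    """Select an effect, composite one frame, and split the caption (S211)."""
    effect = EFFECT_LIBRARY[select()]
    image = composite(frame, mask, effect)
    caption = {"sung": lyric_line[:sung_chars], "unsung": lyric_line[sung_chars:]}
    return image, caption


if __name__ == "__main__":
    frame = [[1, 2], [3, 4]]   # stand-in for the user's original image data
    mask = [[1, 0], [0, 1]]    # 1 = foreground (face or body), 0 = background
    select = make_selector("sequential")
    print(process_frame(frame, mask, select, "la la la", 2))
```

Returning a callable from `make_selector` keeps the per-frame loop identical whichever of the four selection rules is in use, which mirrors how the description treats them as interchangeable alternatives.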

The above describes only preferred feasible embodiments of the present invention and does not thereby limit the patent scope of the present invention; all equivalent technical changes made using the contents of the description and drawings of the present invention are therefore included within the scope of the present invention.

Claims

1. A system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, characterized by comprising:

an image input interface for receiving an image output by an image output device, so as to produce original image data;

a special effect image library for storing at least one piece of special effect image data;

a dynamic image synthesis processing unit, coupled to the image input interface and the special effect image library, for receiving and processing the original image data, and for reading the at least one piece of special effect image data from the special effect image library and synthesizing it with the original image data in real time, so as to produce synthesized image data;

a lyric caption library for storing lyric caption data of at least one song;

a core processing unit, coupled to the lyric caption library and communicatively connected to the dynamic image synthesis processing unit, for receiving and processing the synthesized image data, and for reading the lyric caption data of the at least one song from the lyric caption library and performing superimposition processing and word-coating processing with the synthesized image data; and

an image output interface, coupled to the core processing unit, for outputting the synthesized image data and the lyric caption data of the at least one song to a display device, so that the display device displays the synthesized image data and the lyric caption data of the at least one song.

2. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 1, characterized in that the image output by the image output device is a real-time image of a user.

3. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 1, characterized in that the special effect image data is at least one of the following: snowflake special effect image data, star special effect image data, bubble special effect image data, and scene special effect image data.

4. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 1, characterized in that the special effect image data comprises at least one of the following: a dynamic object and a static object.

5. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 1, characterized by further comprising an operation unit coupled to the core processing unit, the operation unit having an operation interface through which a user inputs instructions, so that the core processing unit and the dynamic image synthesis processing unit perform corresponding operations according to the instructions input through the operation interface.

6. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 1, characterized by further comprising an audio processing unit coupled to the core processing unit, the audio processing unit being configured to receive the voice of at least one user and the background music of a corresponding song and to mix them by superimposition, so as to produce mixed audio data.

7. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 6, characterized by further comprising a recording unit coupled to the core processing unit, the recording unit being configured to selectively record the synthesized image data, the mixed audio data and the lyric caption data so as to form an audio-video file.

8. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 7, characterized by further comprising a storage medium, the storage medium comprising an internal storage medium and an external storage medium, wherein the internal storage medium is coupled to the dynamic image synthesis processing unit and the core processing unit and stores the lyric caption library and the special effect image library, and the external storage medium is coupled to the recording unit and the core processing unit and stores the audio-video file.

9. The system device for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 8, characterized by further comprising a network interface coupled to the core processing unit for connecting to a network, so that the audio-video file, after being read by the core processing unit, is uploaded to the network for sharing.

10. A method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, for synthesizing the image of at least one user in real time, characterized in that the method comprises:

receiving original image data of the user;

determining and selecting at least one piece of special effect image data;

synthesizing the original image data and the special effect image data in real time to form synthesized image data;

superimposing lyric caption data; and

displaying the synthesized image data and the lyric caption data.

11. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized by further comprising performing dynamic image pre-processing on the original image data.

12. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized by further comprising performing dynamic image post-processing on the synthesized image data.

13. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized in that the determining and selecting of at least one piece of special effect image data consists of randomly selecting the at least one piece of special effect image data from a special effect image library according to a random number rule.

14. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized in that the determining and selecting of at least one piece of special effect image data consists of sequentially selecting the at least one piece of special effect image data from a special effect image library according to a specific order.

15. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized in that the determining and selecting of at least one piece of special effect image data consists of selecting the at least one piece of special effect image data from a special effect image library according to an instruction input through an operation unit.

16. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized in that the real-time synthesis of the original image data and the special effect image data consists of performing face recognition on the original image data of the user, extracting the face of the user as the foreground, and compositing it with the special effect image data by background removal, so as to form the synthesized image data in real time.

17. The method for synthesizing an MV with real-time selfie special effects, applied to an accompaniment player, according to claim 10, characterized in that the real-time synthesis of the original image data and the special effect image data consists of performing human body recognition on the original image data of the user, extracting the whole body of the user as the foreground, and compositing it with the special effect image data by background removal, so as to form the synthesized image data in real time.