JP2007310204A

JP2007310204A - Musical piece practice support device, control method, and program

Info

Publication number: JP2007310204A
Application number: JP2006140054A
Authority: JP
Inventors: Naohiro Emoto; 直博江本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-05-19
Filing date: 2006-05-19
Publication date: 2007-11-29

Abstract

PROBLEM TO BE SOLVED: To provide a person who practices singing a song or playing a musical instrument with model sounds imitating the singing voice or performance sound of a desired model person.

SOLUTION: When a practicing person selects a musical piece ID and a model person ID, the control section 21 of a karaoke device 2 transmits the selected musical piece ID and model person ID to a server device 3 through a communication section 28. The control section 31 of the server device 3 reads guide voice data corresponding to the selected musical piece ID from a voice data storage area 32a, reads characteristic data corresponding to the selected model person ID from a characteristic data storage area 32d, and processes the part of the read guide voice data that corresponds to the read characteristic data on the basis of that characteristic data. The control section 31 then transmits the processed guide voice data to the karaoke device 2, which plays back the received guide voice data.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a music practice support device, a control method, and a program.

People who practice singing often do so with a karaoke device that displays lyrics on a screen and plays back the accompaniment of a song. Some karaoke devices provide a so-called “guide vocal function” that plays back a model singing voice. With this function, the practitioner can listen to how the song should be sung.

A skilled vocalist such as a professional singer rarely sings exactly as the score is written. Most express emotion in a song by intentionally shifting the start or end of a phrase, by varying voice quality and volume, or by using singing techniques such as vibrato and kobushi. Singers express such emotion in different ways: some always apply vibrato at the end of a phrase, some always delay the start of a phrase (shift its timing), and some sing high notes in falsetto, so each singer often has a characteristic way of singing. People who practice singing often want to imitate the distinctive singing style of their favorite singer.

However, the singing voice played back by the guide vocal function described above is often not that of the singer the practitioner wants to emulate, and the singing techniques used in the guide vocal often differ considerably from those of the favorite singer. When the guide vocal's way of singing is far from the way the practitioner wants to sing, the practitioner's motivation to improve is often reduced. This applies not only to singing but also to playing musical instruments.

As a system for supporting speech practice, Patent Document 1, for example, proposes a system that converts the speech of a model speaker (teacher) into the voice of the practitioner (student) and plays it back. Patent Document 2 proposes a system that applies voice processing, such as modifying the spectral envelope, to the practitioner's own speech and plays it back.
Patent Document 1: JP 2002-244547 A
Patent Document 2: JP 2004-133409 A

However, in the system described in Patent Document 1, the practitioner must select which model voice to practice with, which is cumbersome. In the system described in Patent Document 2, voice processing is applied to the practitioner's own speech, so the resulting pronunciation may sound unnatural.
The present invention has been made against this background, and its object is to provide a person who practices singing or playing with a singing voice or performance sound that imitates the singing voice or performance sound of a preferred model person.

A music practice support device according to a preferred aspect of the present invention comprises: feature data storage means for storing a plurality of feature data blocks, each including one or more items of feature data indicating features of a specific part in the progression of a singing voice or instrument performance sound, together with a plurality of items of identification information each identifying one of the feature data blocks; guide voice data storage means for storing a plurality of sets in which music identification information identifying a piece of music is associated with guide voice data representing the singing voice or performance sound of that piece; input receiving means for receiving input of the identification information and the music identification information; guide voice data reading means for reading, from the guide voice data storage means, the guide voice data corresponding to the music identification information received by the input receiving means; feature data reading means for reading, from the feature data storage means, the feature data included in the feature data block corresponding to the identification information received by the input receiving means; guide voice data processing means for processing, on the basis of the feature data read by the feature data reading means, the part of the guide voice data read by the guide voice data reading means that corresponds to that feature data; and output means for outputting the guide voice data processed by the guide voice data processing means.
In this aspect, the specific part may be a part in which a singing technique or a performance technique is used.
Also in this aspect, the specific part may be a part in which at least one of the techniques of vibrato, shakuri, kobushi, falsetto, tsukkomi, tame, and breathing is used.

According to the present invention, a person who practices singing or playing can be provided with a singing voice or performance sound that imitates the singing voice or performance sound of a preferred model person.

Next, an embodiment of the present invention will be described.
In the following description, a person who practices singing is called a “practitioner”, and a person whose singing serves as a model for the practitioner is called a “model person”.
<A: Configuration>
FIG. 1 is a block diagram showing the overall configuration of a music practice support system 1 according to this embodiment. The music practice support system 1 includes a plurality of karaoke devices 2a, 2b, and 2c, a server device 3, and a network 4 that connects them. The karaoke devices 2a, 2b, and 2c are installed in ordinary homes and in various establishments such as karaoke boxes and restaurants, and function as communication devices that communicate via the network 4. The server device 3 functions as a music practice support device that provides the practitioner with a singing voice imitating that of the practitioner's preferred model person. The network 4 is, for example, an ISDN (Integrated Services Digital Network) or the Internet, and includes wired or wireless sections. Although FIG. 1 shows three karaoke devices 2a, 2b, and 2c, the number of karaoke devices in the music practice support system 1 is not limited to three and may be larger or smaller. In the following description, the karaoke devices 2a, 2b, and 2c are referred to simply as “karaoke device 2” when there is no need to distinguish them.

FIG. 2 is a block diagram showing the configuration of the karaoke device 2. In FIG. 2, the control unit 21 is, for example, a CPU, and controls each unit of the karaoke device 2 by reading and executing a computer program stored in the storage unit 22. The display unit 23 is, for example, a liquid crystal display and, under the control of the control unit 21, displays various screens such as a menu screen for operating the karaoke device 2 and a karaoke screen in which lyric captions are superimposed on a background image. The operation unit 24 has various keys and outputs a signal corresponding to the pressed key to the control unit 21. The microphone 25 picks up the voice of the practitioner. The audio processing unit 26 converts the sound picked up by the microphone 25 from analog to digital data and outputs it to the control unit 21. The speaker 27 emits the sound output from the audio processing unit 26. The communication unit 28 performs data communication with the server device 3 via the network 4 under the control of the control unit 21.

The storage unit 22 is a large-capacity storage means such as a hard disk. In addition to storing the computer program described above, it has an accompaniment/lyrics data storage area 22a. This area stores, in association with the music ID assigned to each piece of music, accompaniment data in which the performance sounds of the instruments accompanying the piece are described along the progression of the piece, and lyrics data representing the lyrics of the piece. The accompaniment data is in a data format such as MIDI (Musical Instrument Digital Interface) and is played back when the practitioner sings karaoke. The lyrics data is displayed on the display unit 23 as lyric captions during the karaoke singing.

Next, FIG. 3 is a block diagram showing the configuration of the server device 3. In FIG. 3, the control unit 31 is, for example, a CPU, and controls each unit of the server device 3 by reading and executing a computer program stored in the storage unit 32. The storage unit 32 is a large-capacity storage means such as a hard disk. The communication unit 33 performs data communication with the karaoke device 2 via the network 4 under the control of the control unit 31.

In addition to storing the computer program described above, the storage unit 32 has, as shown in the figure, a guide voice data storage area 32a, a score sound data storage area 32b, a technique insertion location designation data storage area 32c, and a feature data storage area 32d.
The guide voice data storage area 32a stores guide voice data representing a singing voice sung by a certain singer along with the accompaniment of a piece of music, in association with the music ID assigned to that piece. The guide voice data is in a data format such as WAVE or MP3 (MPEG Audio Layer-3). The score sound data storage area 32b stores score sound data representing the singing sounds specified by the score of the piece, in association with the music ID. The score sound data is in a data format such as MIDI and includes the pitch of each singing sound and its sounding timing.

The technique insertion location designation data storage area 32c stores technique insertion location designation data, which is used in the guide voice data processing described later, in association with the “music ID”.
FIG. 4 is a diagram showing an example of the contents of the technique insertion location designation data. As shown in the figure, this data associates the items “technique type”, “time (timing)”, and “condition” with one another. The “technique type” item stores identification information identifying a singing technique, such as “vibrato”, “shakuri”, “kobushi”, “falsetto”, “tsukkomi”, “tame”, or “breathing”. “Vibrato” is a technique of producing a trembling tone by continuously raising and lowering the pitch very slightly. “Shakuri” is a technique of starting from a pitch lower than the target note and smoothly gliding up to it. “Kobushi” is a technique of adding an ornamental, undulating melodic turn. “Falsetto” is a technique of singing in the so-called falsetto voice (uragoe). “Tsukkomi” is a technique of starting to sing earlier than the original timing. “Tame” is a technique of starting to sing later than the original timing. “Breathing” means the timing at which the practitioner takes a breath.
Next, the “timing” item stores time information indicating the timing (part) at which the technique is to be inserted in the guide voice data. The “condition” item stores information indicating the condition used to determine whether or not to insert the technique into the guide voice data.
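The three items of FIG. 4 can be pictured as one record per insertion point, grouped by music ID. The following is only a minimal sketch under stated assumptions: the class name `InsertionPoint`, the function `points_for`, the use of seconds for the timing, the sample music ID, and the string encoding of the condition are all hypothetical, since the text does not define the concrete layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InsertionPoint:
    technique: str   # "technique type", e.g. "vibrato", "shakuri", "tame"
    time_sec: float  # "timing": position in the guide voice data (seconds assumed)
    condition: str   # "condition": hypothetical encoding, not defined in the text

# Hypothetical contents of storage area 32c, keyed by music ID.
INSERTION_TABLE = {
    "song-001": [
        InsertionPoint("tame", 12.0, "always"),
        InsertionPoint("vibrato", 31.5, "note_longer_than_1s"),
        InsertionPoint("vibrato", 18.25, "note_longer_than_1s"),
    ],
}

def points_for(table, music_id, technique):
    """Return one song's insertion points for one technique, earliest first."""
    rows = [p for p in table.get(music_id, []) if p.technique == technique]
    return sorted(rows, key=lambda p: p.time_sec)
```

Keying the table by music ID mirrors the statement that the designation data is stored in association with the music ID.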

Next, the feature data storage area 32d of the storage unit 32 stores feature data blocks in association with the “model person ID” assigned to each model person. Each feature data block contains feature data indicating the features of the parts of the model person's singing voice in which singing techniques are used.
FIG. 5 is a diagram showing an example of the contents of a feature data block stored in the feature data storage area 32d. As shown in the figure, the feature data block contains a plurality of items of feature data, in each of which the items “technique type” and “parameter” are associated with each other. Although FIG. 5 illustrates a block containing several items of feature data, a feature data block may contain one item or several.
Among the items of the feature data, the “technique type” item stores identification information identifying a technique, in the same way as the “technique type” item of the technique insertion location designation data described above.
The “parameter” item stores information indicating how the technique is performed. In the example shown in FIG. 5, numerical values expressing the degree of each aspect of the technique on a 10-level scale are stored. For “vibrato”, values indicating the “depth”, “speed”, “length”, and “frequency” of the vibrato are stored. For “shakuri”, values indicating the “gradient”, “pitch difference”, and “frequency” are stored. For “tsukkomi” and “tame”, values indicating the “timing difference” and “frequency” are stored. For “breathing” and “kobushi”, a value indicating the “length”, that is, the duration for which the technique was used, is stored. For “falsetto”, information indicating the pitch at which the voice becomes falsetto is stored. In the example shown in FIG. 5, the stored data describes a “vibrato” technique used with a “depth” of “3”, a “speed” of “6”, a “length” of “4”, and a “frequency” of “5”.
In the example shown in FIG. 5, the aspects of the technique are expressed as 10-level numerical values, but this is not a limitation; any information may be used as long as it indicates how the technique is performed.
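As an illustration of the block layout described for FIG. 5, the sketch below models a feature data block as a list of technique/parameter records attached to a model person ID. The class names, the ID `"model-A"`, the `tame` values, and the assumption that the 10 levels are numbered 1 to 10 are hypothetical; only the vibrato values (3, 6, 4, 5) come from the text. Falsetto, which stores pitch information rather than a level, is left out of this sketch.

```python
from dataclasses import dataclass, field

LEVELS = range(1, 11)  # assumption: the 10 levels are numbered 1 to 10

@dataclass
class FeatureData:
    technique: str
    params: dict  # parameter name -> level on the 10-level scale

    def __post_init__(self):
        # Reject values that do not fit the assumed 10-level scale.
        for name, level in self.params.items():
            if level not in LEVELS:
                raise ValueError(f"{self.technique}.{name}={level} is outside 1..10")

@dataclass
class FeatureDataBlock:
    model_person_id: str
    features: list = field(default_factory=list)

    def get(self, technique):
        """Return the feature data for a technique, or None if the block has none."""
        return next((f for f in self.features if f.technique == technique), None)

block = FeatureDataBlock("model-A", [
    FeatureData("vibrato", {"depth": 3, "speed": 6, "length": 4, "frequency": 5}),
    FeatureData("tame", {"timing_difference": 2, "frequency": 7}),  # made-up values
])
```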

<B: Operation>
Next, the operation of this embodiment will be described.
<B-1: Authoring operation>
First, an authoring operation will be described in which a plurality of model persons each sing a certain piece of music (hereinafter, the “reference song”) and the control unit 31 of the server device 3 generates a feature data block from each model person's singing voice.
In the sequence diagram of FIG. 6, a model person operates the operation unit 24 of the karaoke device 2 to instruct playback of the karaoke accompaniment of the reference song. At this time, either the model person enters his or her own model person ID via the operation unit 24, or the control unit 21 generates a model person ID. The control unit 21 then starts the karaoke accompaniment (step S1). That is, the control unit 21 reads the accompaniment data from the accompaniment/lyrics data storage area 22a and supplies it to the audio processing unit 26, which converts the accompaniment data into an analog signal and supplies it to the speaker 27 for output. The control unit 21 also reads the lyrics data from the accompaniment/lyrics data storage area 22a and displays lyric captions on the display unit 23. The model person sings along with the accompaniment emitted from the speaker 27. The model person's voice is picked up by the microphone 25, converted into an audio signal, and output to the audio processing unit 26. The voice data obtained by A/D conversion in the audio processing unit 26 (hereinafter, “model voice data”) is stored (recorded) in the storage unit 22 together with information indicating the elapsed time from the start of the accompaniment (step S2).

When playback of the accompaniment data ends, the control unit 21 ends the recording of the model person's singing voice. The control unit 21 then transmits the model voice data stored in the storage unit 22, together with the model person ID, from the communication unit 28 to the server device 3 (step S3). On detecting that the communication unit 33 has received the model voice data and the model person ID, the control unit 31 of the server device 3 stores them in the storage unit 32. Next, the control unit 31 divides the model voice data stored in the storage unit 32 into frames of a predetermined length and calculates the pitch, sounding timing, power, and spectrum frame by frame (step S4). The sounding timing can be taken as the timing at which one pitch changes to the next. The spectrum can be calculated using, for example, an FFT (Fast Fourier Transform).
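The frame-by-frame analysis of step S4 could look roughly like the following. This is a sketch, not the patented implementation: a naive DFT stands in for the FFT to keep the example dependency-free, the strongest spectral bin is used as a very crude pitch estimate, and all function names are hypothetical.

```python
import cmath
import math

def split_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames of frame_len samples."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def peak_frequency(frame, sample_rate):
    """Frequency of the strongest non-DC bin, used here as a crude pitch estimate."""
    spectrum = magnitude_spectrum(frame)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / len(frame)

# A 1 kHz test tone sampled at 8 kHz, analysed in 64-sample frames.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(128)]
frames = split_frames(tone, 64)
```

A practical pitch tracker would need something more robust than peak picking (for example autocorrelation), because the strongest bin of a sung note is not always its fundamental.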

Next, the control unit 31 extracts techniques from the model voice data (step S5). The control unit 31 first identifies (detects) the sections in which each technique is used. For example, “vibrato” and “shakuri” can be detected from the pitch of the model voice data. “Kobushi” and “falsetto” can be detected from the spectrum of the model voice data. “Tame” and “tsukkomi” can be detected from the pitch of the model voice data together with the score sound data stored in the score sound data storage area 32b. “Breathing” can be detected from the power of the model voice data together with the score sound data stored in the score sound data storage area 32b.

The specific detection methods are as follows.
On the basis of the correspondence between the model voice data and the score sound data stored in the score sound data storage area 32b, and of the pitch calculated from the model voice data, the control unit 31 identifies sections in which the start time of a note in the model voice data differs from the start time of the corresponding note in the score sound data. Where the pitch change timing of the model voice data appears earlier than that of the score sound data, that is, where a note in the model voice data starts earlier than the corresponding note in the score sound data, the control unit 31 identifies the section as one in which the “tsukkomi” technique is used. The control unit 31 associates the section information of the section identified in this way with identification information indicating “tsukkomi”. The control unit 31 also calculates the time difference between the pitch change timing of the model voice data and that of the score sound data, and converts the calculated value into a 10-level numerical value as the “timing difference”.

Conversely, on the basis of the correspondence between the model voice data and the score sound data, and of the pitch calculated from the model voice data, the control unit 31 detects sections in which the pitch change timing of the model voice data appears later than that of the score sound data, that is, sections in which a note in the model voice data starts later than the corresponding note in the score sound data, and identifies each detected section as one in which the “tame” technique is used. The control unit 31 also calculates the time difference between the pitch change timing of the model voice data and that of the score sound data, and converts the calculated value into a 10-level numerical value as the “timing difference”.
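The tsukkomi/tame decision reduces to the sign of the onset difference between the sung note and the score note. The sketch below illustrates this under assumptions that are not in the text: onsets are given in seconds, differences within a small tolerance count as "on time", and the 10-level conversion is a linear scale capped at `max_diff` seconds. The function names are hypothetical.

```python
import math

def to_level(value, max_value, levels=10):
    """Map a non-negative value linearly onto levels 1..levels (assumed scale)."""
    return min(levels, max(1, math.ceil(levels * value / max_value)))

def classify_onset(score_onset, sung_onset, tolerance=0.02, max_diff=0.5):
    """Return (technique, timing-difference level), or None if the note is on time.

    A note sung earlier than the score is "tsukkomi"; later is "tame".
    tolerance and max_diff (seconds) are illustrative assumptions.
    """
    diff = sung_onset - score_onset
    if abs(diff) <= tolerance:
        return None
    technique = "tsukkomi" if diff < 0 else "tame"
    return technique, to_level(abs(diff), max_diff)
```

For example, a note scored at 1.0 s but sung at 0.88 s is classified as tsukkomi, while one scored at 2.0 s and sung at 2.3 s is classified as tame.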

The control unit 31 also analyzes the pattern of temporal change in the pitch calculated from the model voice data, detects sections in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and identifies each detected section as one in which the “vibrato” technique is used. The control unit 31 calculates the range of pitch fluctuation in the detected section and converts the calculated value into a 10-level numerical value as the “depth” of the technique. It converts the duration of the detected section into a 10-level numerical value as the “length” of the technique, and converts the rate of pitch fluctuation in the detected section into a 10-level numerical value as the “speed”.
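One simple way to obtain the raw depth, speed, and length of a candidate vibrato section is to measure how far the pitch swings around its mean and how often it crosses the mean upward. The following sketch assumes a per-frame pitch track in Hz for a section that has already been cut out; the function names and the thresholds in `looks_like_vibrato` are hypothetical, and the conversion to 10 levels is omitted.

```python
import math

def vibrato_stats(pitches_hz, frame_sec):
    """Measure raw depth, rate and length of a candidate vibrato section."""
    center = sum(pitches_hz) / len(pitches_hz)
    dev = [p - center for p in pitches_hz]
    # One upward crossing of the center pitch per oscillation cycle.
    cycles = sum(1 for a, b in zip(dev, dev[1:]) if a < 0 <= b)
    duration = len(pitches_hz) * frame_sec
    return {
        "depth_hz": (max(dev) - min(dev)) / 2,  # half the peak-to-peak swing
        "rate_hz": cycles / duration,           # oscillations per second
        "length_sec": duration,
    }

def looks_like_vibrato(stats, max_depth_hz=30.0, min_rate_hz=3.0):
    """Assumed test: bounded swing around the center and a fast enough oscillation."""
    return stats["depth_hz"] <= max_depth_hz and stats["rate_hz"] >= min_rate_hz

# Synthetic 1-second section: 440 Hz center, +/-5 Hz swing, 6 cycles per second.
section = [440 + 5 * math.sin(2 * math.pi * 6 * (i * 0.01) + 1.0) for i in range(100)]
stats = vibrato_stats(section, 0.01)
```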

The control unit 31 also analyzes the pattern of temporal change in the pitch calculated from the model voice data, detects sections in which the pitch changes continuously from a low pitch to a high pitch, and identifies each detected section as one in which the “shakuri” technique is used. This processing may instead be based on the correspondence with the score sound data. That is, on the basis of the correspondence between the model voice data and the score sound data, the control unit 31 may detect sections in which the pitch of the model voice data continuously approaches the pitch of the score sound data from a lower pitch. The control unit 31 calculates the gradient by dividing the amount of pitch change in the detected section by the duration of the section, and converts the calculated value into a 10-level numerical value as the “gradient” of the technique. It also converts the detected amount of pitch change into a 10-level numerical value as the “pitch difference”.
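The first variant (a continuous rise in the pitch track, without reference to the score) can be sketched as a scan for runs of strictly increasing pitch, with the gradient taken as pitch change divided by duration. The function name and the `min_frames` threshold are hypothetical, and the pitch is assumed to be given in Hz per frame.

```python
def rising_sections(pitches, frame_sec, min_frames=3):
    """Find runs where the pitch rises from frame to frame.

    Returns (start_index, end_index, pitch_difference, gradient) tuples, where
    gradient = pitch_difference / duration. min_frames is an assumed threshold
    on the number of rising steps.
    """
    sections, start = [], 0
    for i in range(1, len(pitches) + 1):
        rising = i < len(pitches) and pitches[i] > pitches[i - 1]
        if not rising:
            steps = i - 1 - start
            if steps >= min_frames:
                diff = pitches[i - 1] - pitches[start]
                sections.append((start, i - 1, diff, diff / (steps * frame_sec)))
            start = i
    return sections

track = [200, 200, 180, 190, 200, 210, 220, 220, 220]
found = rising_sections(track, 0.01)
```

In this example the run from index 2 to index 6 rises by 40 Hz over 0.04 s, giving a gradient of about 1000 Hz per second.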

On the basis of the correspondence between the model voice data and the score sound data, and of the power calculated from the model voice data, the control unit 31 also detects sections in which the score sound data is voiced but the power of the model voice data is below a predetermined threshold, and identifies each detected section as a “breathing” section. The control unit 31 converts the duration of the detected section into a 10-level numerical value as the “length” of the technique.
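The breathing rule combines two per-frame signals: whether the score is voiced, and whether the sung power is below a threshold. A minimal sketch, assuming both signals are already aligned frame by frame and of equal length; the function name and threshold value are hypothetical.

```python
def breath_sections(powers, score_voiced, threshold, frame_sec):
    """Find runs of frames where the score is voiced but the sung power is low.

    Returns (start_index, end_index_exclusive, duration_sec) tuples.
    """
    sections, start = [], None
    for i, (power, voiced) in enumerate(zip(powers, score_voiced)):
        quiet = voiced and power < threshold
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            sections.append((start, i, (i - start) * frame_sec))
            start = None
    if start is not None:  # a run that lasts until the final frame
        end = len(powers)
        sections.append((start, end, (end - start) * frame_sec))
    return sections

powers = [0.5, 0.4, 0.01, 0.02, 0.6, 0.01]
voiced = [True, True, True, True, True, False]
breaths = breath_sections(powers, voiced, 0.05, 0.1)
```

The last frame is quiet but falls in a rest of the score, so it is not reported as a breath.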

The control unit 31 also analyzes the temporal change pattern of the spectrum calculated from the model voice data, detects a section in which the spectral characteristics transition abruptly to a predetermined change state, and identifies the detected section as a section in which the "falsetto" technique is used. Here, the predetermined change state is a state in which the harmonic components of the spectral characteristics become extremely small. For example, the natural (chest) voice contains many harmonic components, whereas in falsetto the magnitude of the harmonic components becomes extremely small. In this case, the control unit 31 may also refer to whether the pitch has changed significantly upward. This is because, although falsetto is sometimes used even when producing the same pitch as the natural voice, it is generally a technique used to produce high notes that cannot be produced in the natural voice. Accordingly, the apparatus may be configured to detect "falsetto" only when the pitch of the model voice data is at or above a predetermined pitch. Furthermore, since the pitch range in which falsetto is used generally differs between male and female voices, gender may be detected from the vocal range of the model voice data or from formants detected from the model voice data, and the pitch range for falsetto detection may be set based on the result.
The control unit 31 also uses information indicating the pitch of the detected section as the information on the mode of this technique.
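As a rough illustration of the falsetto detection described above, the following sketch assumes that a per-frame "harmonic ratio" (the share of spectral energy carried by the upper harmonics) and a per-frame pitch have already been computed. The function name `detect_falsetto`, the thresholds `low` and `drop`, and the choice of the onset pitch as the recorded mode are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple


def detect_falsetto(
    harmonic_ratio: Sequence[float],
    pitch_hz: Sequence[float],
    low: float = 0.2,
    drop: float = 0.3,
    min_pitch_hz: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame, onset_pitch_hz) for each section
    that begins with an abrupt fall in harmonic content.

    A section starts when the ratio falls by at least `drop` between two
    consecutive frames and lands below `low`, and only if the pitch at
    that frame is at or above `min_pitch_hz`. It ends when the ratio
    returns to `low` or higher. end_frame is exclusive.
    """
    sections: List[Tuple[int, int, float]] = []
    start = None
    for i in range(1, len(harmonic_ratio)):
        cur, prev = harmonic_ratio[i], harmonic_ratio[i - 1]
        if start is None:
            abrupt = (prev - cur) >= drop and cur < low
            if abrupt and pitch_hz[i] >= min_pitch_hz:
                start = i
        elif cur >= low:
            sections.append((start, i, pitch_hz[start]))
            start = None
    if start is not None:
        sections.append((start, len(harmonic_ratio), pitch_hz[start]))
    return sections


if __name__ == "__main__":
    ratio = [0.6, 0.6, 0.1, 0.1, 0.6, 0.1]
    pitch = [220.0, 220.0, 440.0, 440.0, 220.0, 220.0]
    print(detect_falsetto(ratio, pitch, min_pitch_hz=300.0))
```

Setting `min_pitch_hz` corresponds to restricting detection to a predetermined pitch or higher; a gender-dependent value could be passed in once the vocal range has been estimated.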

The control unit 31 also detects a section in which the manner of change of the spectral characteristics switches in various ways within a short time, and identifies the detected section as a section in which the "kobushi" technique is used. Kobushi is a technique that adds a growl-like flavor by varying the tone color and vocalization within a short section, so the spectral characteristics change in various ways in a section where this technique is used. The control unit 31 also converts the time length of the detected section into a value on a 10-level scale representing the degree of "length" of this technique.

In the manner described above, the control unit 31 detects, from the model voice data, the sections in which techniques are used and the modes of those techniques, and generates feature extraction data in which section information indicating each detected section is associated with type information indicating the technique and information indicating the mode of the technique (step S6).
FIG. 7 is a diagram showing an example of the content of the feature extraction data generated in step S6. As illustrated, this data is configured by associating the items "technique type", "section", and "mode" with one another.

Next, the control unit 31 averages the parameters of the generated feature extraction data to generate the feature data block shown in FIG. 5, and stores the generated feature data block in the feature data storage area 32d in association with the model person ID received from the karaoke device 2 (step S7). Specifically, for example, the control unit 31 calculates the average of each of the "depth", "speed", and "length" values of the "vibrato" technique, and sets the calculated averages to the "depth", "speed", and "length" parameters of the feature data for the "vibrato" technique. It also counts the total number of sections in which the "vibrato" technique was used, converts that count into a value on a 10-level scale, and sets the converted value to the "frequency" parameter. This conversion may be performed, for example, by comparing the counted total with a predetermined value, or by comparing it with the average of the per-technique totals calculated from a plurality of sets of model feature extraction data. For the other techniques as well, the average and total of each parameter are calculated in the same way, and the calculated values are set to the corresponding parameters of the feature data.
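The averaging of step S7 might look like the following sketch. The record layout (a dictionary with "technique", "section", and "mode" keys), the names `build_feature_block` and `count_to_level`, and the rule that a reference count corresponds to level 10 are assumptions introduced for illustration; the description only states that the count is compared with a predetermined value.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def count_to_level(count: int, reference_count: int) -> int:
    """Convert a section count to 1..10, assuming reference_count
    (a predetermined value) corresponds to level 10."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    level = -(-10 * count // reference_count)  # ceiling division
    return max(1, min(10, level))


def build_feature_block(
    records: List[dict],
    reference_counts: Dict[str, int],
    default_reference: int = 10,
) -> Dict[str, dict]:
    """Average each mode parameter per technique and add a
    'frequency' level derived from the number of sections."""
    modes_by_technique = defaultdict(list)
    for record in records:
        modes_by_technique[record["technique"]].append(record["mode"])

    block: Dict[str, dict] = {}
    for technique, modes in modes_by_technique.items():
        names = sorted({name for mode in modes for name in mode})
        params = {
            name: mean(mode[name] for mode in modes if name in mode)
            for name in names
        }
        reference = reference_counts.get(technique, default_reference)
        params["frequency"] = count_to_level(len(modes), reference)
        block[technique] = params
    return block


if __name__ == "__main__":
    records = [
        {"technique": "vibrato", "section": (1.0, 2.0),
         "mode": {"depth": 4, "speed": 6, "length": 2}},
        {"technique": "vibrato", "section": (5.0, 6.5),
         "mode": {"depth": 6, "speed": 6, "length": 4}},
    ]
    print(build_feature_block(records, {"vibrato": 10}))
```

The resulting dictionary would then be stored under the model person ID, which plays the role of the key of the feature data storage area 32d.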

By performing the series of processes shown in FIG. 6 for a plurality of model persons, feature data blocks, each containing feature data indicating the portions in the progression of a model person's singing voice where techniques are used, are stored in the feature data storage area of the server device 3 in association with the model person IDs.

<B-2: Guide voice data processing operation>
Next, the operation of processing the guide voice data will be described.
In the sequence diagram shown in FIG. 8, the practitioner operates the operation unit 24 of the karaoke device 2 to select the music ID of the song to be sung and the model person ID of a preferred model person, and instructs playback of the song. The operation unit 24 outputs a signal corresponding to the operation to the control unit 21, and the control unit 21 selects, in accordance with the signal supplied from the operation unit 24, the music ID assigned to the song and the model person ID assigned to the model person (step S11). The control unit 21 then transmits the selected model person ID and music ID to the server device 3 via the communication unit 28 (step S12).

The server device 3 receives the model person ID and the music ID from the karaoke device 2. The control unit 31 of the server device 3 accepts the input of the model person ID and the music ID (step S13). Upon accepting this input, the control unit 31 reads the guide voice data corresponding to the accepted music ID from the guide voice data storage area 32a, reads the feature data contained in the feature data block corresponding to the accepted model person ID from the feature data storage area 32d, and processes the portions of the read guide voice data that correspond to the read feature data on the basis of that feature data (step S14). More specifically, for the guide voice data corresponding to the music ID, the control unit 31 applies the processing corresponding to each technique at the locations indicated by the time information stored in the "timing" item of the technique insertion location designation data corresponding to that music ID. At this time, the control unit 31 determines whether the value of the "frequency" parameter for each technique contained in the feature data satisfies the "condition" for that technique contained in the technique insertion location designation data; if the condition is satisfied, it applies the processing corresponding to the technique at the location indicated by "timing". If the condition is not satisfied, the control unit 31 does not apply the processing.
Specifically, at the locations indicated by "timing" in the guide voice data corresponding to the music ID, the control unit 31 transforms the waveform of the guide voice data in accordance with parameters of the feature data such as depth, speed, and length. For shakuri portions, it likewise applies pitch conversion reflecting the steepness and start pitch in the feature data; for falsetto, it applies spectrum conversion from the pitch at which the voice changes to falsetto. The control unit 31 processes the guide voice data using the feature data in the same way for the other techniques.
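The condition check of step S14 can be sketched as below. The actual form of the "condition" item is not given in this passage, so the sketch assumes, purely for illustration, that it is a minimum "frequency" level (`min_frequency`); the function name `select_insertions` and the dictionary layouts are likewise hypothetical. The waveform processing itself is outside the scope of the sketch.

```python
from typing import Dict, List, Tuple


def select_insertions(
    designation: List[dict],
    feature_block: Dict[str, dict],
) -> List[Tuple[float, str, dict]]:
    """Return (timing, technique, parameters) for every designated
    location whose condition is met by the model person's feature data.

    Each designation entry is assumed to look like
    {"timing": seconds, "technique": name, "min_frequency": level}.
    """
    selected = []
    for entry in designation:
        params = feature_block.get(entry["technique"])
        if params is None:
            continue  # the model person has no data for this technique
        if params["frequency"] >= entry["min_frequency"]:
            selected.append((entry["timing"], entry["technique"], params))
    return selected


if __name__ == "__main__":
    designation = [
        {"timing": 12.5, "technique": "vibrato", "min_frequency": 3},
        {"timing": 20.0, "technique": "vibrato", "min_frequency": 9},
        {"timing": 31.0, "technique": "falsetto", "min_frequency": 1},
    ]
    feature_block = {
        "vibrato": {"depth": 5, "speed": 6, "length": 3, "frequency": 7},
    }
    for timing, technique, params in select_insertions(designation, feature_block):
        print(timing, technique, params)
```

With this layout, a location that demands a high frequency level is processed only for model persons who use the technique often, which matches the behavior described above.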

Since the feature data corresponding to a model person ID represents the characteristics of the singing voice of the practitioner's preferred model person, the control unit 31, by processing the guide voice data using the feature data, generates data representing a singing voice that imitates that model person's singing voice. Specifically, for example, if the model person always applies vibrato at the end of a phrase, the processing by the control unit 31 generates data representing a singing voice with vibrato applied at the timings specified by the technique insertion location designation data. Likewise, if the model person uses falsetto at or above a certain pitch, the processing generates data representing a singing voice that becomes falsetto at or above that same pitch.

The control unit 31 outputs the guide voice data by transmitting the processed guide voice data in a data format that can be played back by the karaoke device 2 (step S15). The control unit 21 of the karaoke device 2 plays back the received guide voice data (step S16). That is, the control unit 21 supplies the guide voice data to the audio processing unit 26, which converts it into an analog signal and emits it from the speaker 27. The practitioner can thus listen to a singing voice imitating that of the singer he or she wishes to imitate and, by using it as a model, work to improve his or her own singing.

As described above, in this embodiment the control unit 31 processes the guide voice data corresponding to the selected music ID using the feature data corresponding to the selected model person ID, so that a singing voice imitating that of the model person the practitioner wishes to imitate can be provided to the practitioner as a model. That is, even when the model person a practitioner wishes to imitate does not have the song in his or her repertoire and has never sung it, the practitioner can hear a singing voice as if that model person had sung the song. By listening to that singing as a model and following its style, the practitioner's motivation to improve is likely to increase.

<C: Modifications>
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below.
(1) In the embodiment described above, data representing a singing voice sung by a certain singer was used as the guide voice data. Instead, guide vocal data may be generated from accompaniment data using a singing synthesizer or the like, and the generated guide vocal data may be used as the guide voice data. In this case, if data indicating the characteristics of each model person's voice quality is included and held in the feature data for that model person, guide voice data that more closely resembles the model person's singing voice can be created.

(2) In the embodiment described above, feature data was generated from the singing voice of a model person. In addition, the practitioner's singing voice may be recorded, the same processing may be applied to the resulting voice data to generate feature data, and the feature data of the model person may be compared with that of the practitioner, with the comparison result reported to the practitioner. Specifically, for example, the microphone 25 of the karaoke device 2 picks up the practitioner's singing voice, the audio processing unit 26 generates practitioner voice data representing it, and the control unit 21 calculates pitch, timing, spectrum, and power from the practitioner voice data and generates the practitioner's feature data from the results. The karaoke device 2 or the server device 3 then compares the practitioner's feature data with the model person's feature data parameter by parameter, and the control unit 21 of the karaoke device 2 reports the comparison result, for example by displaying it on the display unit 23 or outputting it as a voice message.
In this way, by referring to the reported result, the practitioner can grasp how close, or how similar, his or her own singing technique (use of techniques) has become to that of the model person.
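A parameter-by-parameter comparison of this kind could be sketched as follows. The nested-dictionary layout, the function name `compare_feature_blocks`, and the convention of treating a parameter the practitioner lacks as 0 are assumptions made for this example and are not taken from the description.

```python
from typing import Dict


def compare_feature_blocks(
    model: Dict[str, Dict[str, float]],
    practitioner: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """For every technique and parameter in the model person's data,
    return practitioner value minus model value.

    A negative value means the practitioner is below the model person
    on that parameter. Missing practitioner values count as 0.
    """
    report: Dict[str, Dict[str, float]] = {}
    for technique, model_params in model.items():
        own_params = practitioner.get(technique, {})
        report[technique] = {
            name: own_params.get(name, 0) - value
            for name, value in model_params.items()
        }
    return report


if __name__ == "__main__":
    model = {"vibrato": {"depth": 5, "frequency": 8}}
    practitioner = {"vibrato": {"depth": 3, "frequency": 8}}
    for technique, diffs in compare_feature_blocks(model, practitioner).items():
        for name, diff in diffs.items():
            print(f"{technique} {name}: {diff:+}")
```

The signed differences could be shown on the display unit 23 as they are, or converted into a message such as an indication that vibrato depth is shallower than the model person's.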

(3) In the embodiment described above, the feature data of a plurality of model persons was extracted using a single reference song. However, the model voice data used to extract feature data need not be voice data of the same reference song; the model persons may each sing a different song, and feature data may be extracted from each of those singing voices.

(4) In the embodiment described above, the technique insertion location designation data for specifying the portions of the guide voice data to be processed was stored in advance in the technique insertion location designation data storage area 32c of the server device 3. Instead, the control unit 31 of the server device 3 may generate the data for specifying the portions of the guide voice data to be processed. Specifically, for example, the control unit 31 may generate feature extraction data from a plurality of sets of model voice data for a given song, compile statistics for each technique over the generated feature extraction data, and, for each technique, generate data indicating the sections in which that technique is used at a rate equal to or greater than a predetermined threshold as the data for specifying the portions of the guide voice data to be processed.
In the embodiment described above, techniques were inserted into (that is, the guide voice data was processed at) the timings (portions) indicated by the technique insertion location designation data. The method of specifying the portions of the guide voice data to be processed, that is, the portions corresponding to the feature data, is not limited to one based on the technique insertion location designation data; for example, the control unit 31 of the server device 3 may specify them based on the score sound data or the guide voice data. Specifically, they may be specified according to a predetermined algorithm, for example by determining that the vibrato technique is to be inserted on notes of half-note length or longer.
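The statistical generation of insertion locations mentioned in this modification might be sketched as follows for a single technique. The frame-based representation, the half-open (start, end) frame pairs, and the name `common_sections` are assumptions for illustration; the description only requires that sections used at or above a threshold rate be extracted.

```python
from typing import List, Sequence, Tuple

Section = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def common_sections(
    per_model_sections: Sequence[Sequence[Section]],
    n_frames: int,
    min_ratio: float,
) -> List[Section]:
    """Return the frame ranges in which at least min_ratio of the
    model recordings use the technique."""
    if not per_model_sections:
        return []
    counts = [0] * n_frames
    for sections in per_model_sections:
        used = set()
        for start, end in sections:
            used.update(range(max(start, 0), min(end, n_frames)))
        for frame in used:  # each model counts once per frame
            counts[frame] += 1

    needed = min_ratio * len(per_model_sections)
    result: List[Section] = []
    run_start = None
    for frame, count in enumerate(counts):
        hit = count >= needed
        if hit and run_start is None:
            run_start = frame
        elif not hit and run_start is not None:
            result.append((run_start, frame))
            run_start = None
    if run_start is not None:
        result.append((run_start, n_frames))
    return result


if __name__ == "__main__":
    vibrato_by_model = [[(0, 4)], [(2, 6)], [(3, 5), (8, 9)]]
    print(common_sections(vibrato_by_model, n_frames=10, min_ratio=0.5))
```

Running the same function once per technique would yield data playing the role of the "timing" item of the technique insertion location designation data.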

In the embodiment described above, whether to insert a technique was determined according to the "condition" contained in the technique insertion location designation data, but the determination method is not limited to this. The control unit 31 of the server device 3 may determine whether to insert a technique at a given timing according to a predetermined algorithm, for example by randomly selecting a number of insertion locations (timings) corresponding to the frequency of each technique.
Alternatively, the technique may be inserted at all of the portions indicated by the technique insertion location designation data, without determining whether to insert it.
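The random selection described in this modification can be sketched as below. The rule that a frequency level of N out of 10 selects N tenths of the candidate locations (rounded down) is an assumption for this example, as is the function name `pick_random_insertions`; passing frequency level 10 reproduces the alternative of inserting at every designated location.

```python
import random
from typing import List, Sequence


def pick_random_insertions(
    candidate_timings: Sequence[float],
    frequency_level: int,
    rng: random.Random,
) -> List[float]:
    """Randomly choose insertion timings, with the number chosen
    proportional to the 1..10 frequency level of the technique."""
    if not 1 <= frequency_level <= 10:
        raise ValueError("frequency_level must be in 1..10")
    n_pick = len(candidate_timings) * frequency_level // 10
    return sorted(rng.sample(list(candidate_timings), n_pick))


if __name__ == "__main__":
    timings = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0, 11.5, 13.0, 14.5]
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    print(pick_random_insertions(timings, 3, rng))
```

A seeded generator is passed in so that the same guide voice data can be regenerated for the same song and model person if that is desired.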

(5) In the embodiment described above, the case of processing guide voice data representing a singing voice was described, but the invention is not limited to this, and guide voice data representing the performance sound of a musical instrument may be processed. In this case, guide voice data representing a performer's performance sound is used in place of the singing voice of a singer described above, and feature data indicating the features of specific portions in the progression of the instrument performance sound is used. In addition, the accompaniment/lyrics data storage area 22a stores score sound data defined as performance sounds in the musical score. Based on these data, the control unit 31 of the server device 3 processes the guide voice data through the same processing as described above.

(6) In the embodiment described above, the feature data was generated by the control unit 31 of the server device 3; instead, it may be generated by the control unit 21 of the karaoke device 2. Alternatively, the control unit 31 of the server device 3 may prompt for input of feature data, and feature data prepared in advance may be input.
In the embodiment described above, the server device 3 accepted the input of the model person ID and the music ID by receiving them from the karaoke device 2. Instead, the server device 3 may be provided with an operation unit, an operator may input the model person ID and the music ID by operating it, and the control unit 31 of the server device 3 may accept that input. Alternatively, the control unit 31 may accept the input of the model person ID and the music ID via an interface such as USB (Universal Serial Bus). In short, it suffices that the control unit 31 accepts input of the model person ID and the music ID.

(7) In the embodiment described above, data indicating techniques such as "vibrato", "tsukkomi", and "tame" was used as the feature data, but it is not necessary to use all of the techniques used in that embodiment; any one of them may be used, or several may be used. The feature data may also be data indicating the features of portions in which a singing or performance technique other than those shown in the embodiment is used. For example, it may be data indicating the features of portions in which the so-called harmonics technique, in which overtones are sounded on a stringed instrument or the like, is used.
Furthermore, the feature data is not limited to data indicating the features of portions in which a technique is used, and may relate, for example, to the portion at which singing starts. In short, it may be any data indicating the features of a specific portion in the progression of a singing voice or instrument performance sound.

In the embodiment described above, the model person ID assigned to a model person was used as the identification information identifying a feature data block, but this identification information is not limited to that and may be other information. In short, it suffices to store a plurality of feature data blocks, each containing one or more items of feature data, together with a plurality of items of identification information, each identifying one of the feature data blocks.

(8) The guide voice data and the model voice data were described as data in WAVE or MP3 format, but the data format is not limited to these; any format may be used as long as the data represents a singing voice or performance sound.

(9) In the embodiment described above, the music practice support system 1, in which the karaoke device 2 and the server device 3 are connected via the network 4, implements all of the functions of the embodiment. Alternatively, three or more devices connected via a network may share these functions, so that a system comprising those devices implements the system of the embodiment. Or a single device may implement all of the functions.

(10) The programs executed by the control unit 21 of the karaoke device 2 or the control unit 31 of the server device 3 in the embodiment described above can be provided stored on a recording medium such as a magnetic tape, magnetic disk, flexible disk, optical recording medium, magneto-optical recording medium, CD (Compact Disk)-ROM, DVD (Digital Versatile Disk), or RAM. They can also be downloaded to the karaoke device 2 or the server device 3 via a network such as the Internet.

FIG. 1 is a block diagram showing the overall configuration of the music practice support system.
FIG. 2 is a block diagram showing the configuration of the karaoke device.
FIG. 3 is a block diagram showing the configuration of the server device.
FIG. 4 is a diagram showing an example of the technique insertion location designation data stored by the server device.
FIG. 5 is a diagram showing an example of a feature data block stored by the server device.
FIG. 6 is a sequence diagram showing the operation of the embodiment.
FIG. 7 is a diagram showing an example of the feature extraction data.
FIG. 8 is a sequence diagram showing the operation of the embodiment.

Explanation of symbols

1: music practice support system; 2, 2a, 2b, 2c: karaoke device; 3: server device; 4: network; 21: control unit; 22: storage unit; 23: display unit; 24: operation unit; 25: microphone; 26: audio processing unit; 27: speaker; 28: communication unit; 31: control unit; 32: storage unit; 33: communication unit.

Claims

1. A music practice support apparatus comprising:
feature data storage means for storing a plurality of feature data blocks, each including one or more items of feature data indicating a feature of a specific portion in the progression of a singing voice or musical instrument performance sound, and for storing a plurality of items of identification information each identifying one of the feature data blocks;
guide voice data storage means for storing a plurality of sets each associating music identification information identifying a song with guide voice data representing the singing voice or performance sound of that song;
input receiving means for receiving input of the identification information and the music identification information;
guide voice data reading means for reading, from the guide voice data storage means, the guide voice data corresponding to the music identification information received by the input receiving means;
feature data reading means for reading, from the feature data storage means, the feature data included in the feature data block corresponding to the identification information received by the input receiving means;
guide voice data processing means for processing, on the basis of the feature data read by the feature data reading means, the portion of the guide voice data read by the guide voice data reading means that corresponds to that feature data; and
output means for outputting the guide voice data processed by the guide voice data processing means.

2. The music practice support apparatus according to claim 1, wherein the specific portion is a portion in which a singing technique or a performance technique is used.

3. The music practice support apparatus according to claim 2, wherein the specific portion is a portion in which at least one of the techniques of vibrato, shakuri, kobushi, falsetto, tsukkomi, and breath is used.

4. A control method for a music practice support apparatus comprising feature data storage means for storing a plurality of feature data blocks, each including one or more items of feature data indicating a feature of a specific portion in the progression of a singing voice or musical instrument performance sound, and for storing a plurality of items of identification information each identifying one of the feature data blocks, guide voice data storage means for storing a plurality of sets each associating music identification information identifying a song with guide voice data representing the singing voice or performance sound of that song, and control means, the method comprising the steps of:
the control means accepting input of the identification information and the music identification information;
reading, from the guide voice data storage means, the guide voice data corresponding to the accepted music identification information;
reading, from the feature data storage means, the feature data included in the feature data block corresponding to the accepted identification information;
processing, on the basis of the read feature data, the portion of the read guide voice data that corresponds to that feature data; and
outputting the processed guide voice data.

5. A program for causing a computer comprising feature data storage means for storing a plurality of feature data blocks, each including one or more items of feature data indicating a feature of a specific portion in the progression of a singing voice or musical instrument performance sound, and for storing a plurality of items of identification information each identifying one of the feature data blocks, and guide voice data storage means for storing a plurality of sets each associating music identification information identifying a song with guide voice data representing the singing voice or performance sound of that song, to realize:
an input receiving function for receiving input of the identification information and the music identification information;
a guide voice data reading function for reading, from the guide voice data storage means, the guide voice data corresponding to the music identification information received by the input receiving function;
a feature data reading function for reading, from the feature data storage means, the feature data included in the feature data block corresponding to the identification information received by the input receiving function;
a guide voice data processing function for processing, on the basis of the feature data read by the feature data reading function, the portion of the guide voice data read by the guide voice data reading function that corresponds to that feature data; and
an output function for outputting the guide voice data processed by the guide voice data processing function.