JP7608242B2

JP7608242B2 - Content playback control system and program

Info

Publication number: JP7608242B2
Application number: JP2021061424A
Authority: JP
Inventors: 誠史 髙橋
Original assignee: Kabushiki Kaisha Bandai Namco Entertainment (also trading as Bandai Namco Entertainment Inc.); Namco Ltd
Current assignee: Kabushiki Kaisha Bandai Namco Entertainment (also trading as Bandai Namco Entertainment Inc.); Namco Ltd
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2025-01-06
Anticipated expiration: 2041-03-31
Also published as: JP2022157292A

Description

The present invention relates to a content playback control system, a program, and the like.

Content playback systems have been developed that assign voice data to text, not only to plain text but also to the text contained in content such as manga, so that the text can be read aloud in a variety of voices.

In particular, a system has recently become known that accepts the selection of voice data for outputting the lines of characters in manga data as speech. When the manga data is displayed, the system outputs the lines of the characters in that manga data as speech, based on the selected voice data (see, for example, Patent Document 1).

Patent Document 1: JP 2018-169691 A

However, the system described in Patent Document 1 merely assigns voice data to text. With that system it is difficult to achieve two goals at once. The first is to secure revenue for content providers. The second is to let users listen to or view content purely in the voices they prefer, without being restricted to the character voices predetermined by conventional content providers.

The present invention has been made to solve the above problem. Its object is to provide a content playback control system and the like that removes various limitations, including practical circumstances such as the production cost of the content itself. It also provides a more realistic casting experience that matches the preferences of listeners and viewers, and thereby increases the user's interest in the content.

(1) To solve the above problems, the present invention has a configuration comprising:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and is composed of phoneme data collected from a speaker;
a content management means for managing content information composed of content data that includes at least text data, obtained by converting into data a text to be voiced in the speaker's voice, and character data relating to a character who speaks the text;
a generation processing means for executing, based on a given instruction, a generation process that assigns the phoneme data to the character data to generate speech language data for voicing the text of the content;
a provision control means for executing a provision control process that provides the generated speech language data to a playback output means that plays back and outputs the character's voice in accordance with the text of the content data;
a cost management means for managing a cost parameter that defines a cost relating to the use of any one of the phoneme data, the character data, and the text data used in the generation process;
a cost calculation means for executing a calculation process that calculates, based on the cost parameter, an execution cost for executing the generation process; and
a permission determination processing means for executing an execution permission determination process that determines whether to permit execution of at least one of the generation process and the provision control process, based on whether the user has paid the execution cost calculated by the calculation process.

With this configuration, if the user has not paid the execution cost, the present invention does not execute the generation of speech language data, its provision, or both. If the user has paid the execution cost, it executes the generation, the provision, or both.
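The calculation process and the payment-based permission check can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes that the execution cost is the sum of per-item costs drawn from the cost parameters, which the patent does not specify. It also assumes that "paid" means the amount paid is at least that cost. `CostTable`, `execution_cost`, and `permit_execution` are hypothetical names, as are all IDs and amounts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CostTable:
    """Hypothetical cost parameters, in in-system currency units."""
    phoneme: dict[str, int]                                  # phoneme data ID -> usage cost
    character: dict[str, int] = field(default_factory=dict)  # character ID -> surcharge
    text: dict[str, int] = field(default_factory=dict)       # text (e.g. episode) ID -> surcharge


def execution_cost(table: CostTable, assignments: dict[str, str], text_id: str) -> int:
    """Calculation process (assumed additive): text cost plus, per character,
    the assigned phoneme data's cost and the character's surcharge."""
    total = table.text.get(text_id, 0)
    for character_id, phoneme_id in assignments.items():
        total += table.phoneme[phoneme_id] + table.character.get(character_id, 0)
    return total


def permit_execution(execution_cost_due: int, amount_paid: int) -> bool:
    """Execution permission determination: allow generation/provision only once the cost is paid."""
    return amount_paid >= execution_cost_due


if __name__ == "__main__":
    table = CostTable(phoneme={"voiceA": 100, "voiceB": 150},
                      character={"hero": 20}, text={"ep1": 30})
    cost = execution_cost(table, {"hero": "voiceA", "rival": "voiceB"}, "ep1")
    print(cost, permit_execution(cost, 300), permit_execution(cost, 250))  # 300 True False
```

Keeping the cost lookup separate from the permission check mirrors the split between the cost calculation means and the permission determination processing means.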

In other words, the present invention can restrict the use of speech language data according to whether the user pays the cost. This lets it secure revenue for businesses that provide content and phoneme data. It also lets users listen to and view content purely in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

The present invention therefore removes various limitations, including practical circumstances such as the production cost of the content itself. It provides a more realistic casting experience that matches the preferences of listeners and viewers, and increases the user's interest in the content.

Note that a "speaker" is a person who actually produces the voice, such as a voice actor, actor, or announcer.

"Phoneme data" is data that serves as the building blocks of sound. It defines the following:
- segmental phonemes such as consonants, vowels, and semivowels;
- pitch, including the tone and intonation that express the relationships between those phonemes;
- stress and accent;
- the dialect type;
- the language type (e.g., Japanese or English);
- the consonant-vowel connections between characters (i.e., concatenation elements).

"Character data" is data on a character that utters (i.e., speaks) text as dialogue. An example is data on a character appearing in content such as a movie, manga, game, animation, or novel.

"Content data" is stored data composed of three parts:
- images (including still images and moving images) of, for example, a game, animation, movie, or manga;
- character data on the characters appearing in the content;
- text data, such as each character's lines matched to the images.

The "content data" may, however, contain no images and consist only of text data and character data.
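To make the structure of content data concrete, here is a minimal Python sketch. The class and field names are hypothetical, not taken from the patent. Images are optional, so text-only content such as a novel fits the same model. The `validate` check is an added assumption: every line of text must be spoken by a character defined in the same content.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Character:
    character_id: str
    name: str


@dataclass
class Line:
    character_id: str  # the character who speaks this line
    text: str          # the line to be voiced


@dataclass
class ContentData:
    content_id: str
    characters: list[Character]
    lines: list[Line]
    images: list[str] = field(default_factory=list)  # optional; text-only content leaves this empty

    def validate(self) -> None:
        """Check that every line is spoken by a character defined in this content."""
        known = {c.character_id for c in self.characters}
        for line in self.lines:
            if line.character_id not in known:
                raise ValueError(f"line spoken by unknown character {line.character_id!r}")
```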

"Management of content information" means that the content information is stored in a database in a readable form, or is acquired from an external source such as a network.

A "cost parameter" is a parameter that defines the cost of the data required to generate speech language data, such as character data, text data, or phoneme data, or the consumption of items and the like required for that data.

In particular, the "cost parameter" defines one of two things: a consumption amount, specified in terms of in-system currency or items used in the system (for example, item type or quantity), or a billing amount.

The "provision control process" includes two kinds of process:
- providing various data to be downloaded to the playback output means (e.g., a user's terminal device), to control playback on that terminal device;
- providing content information, including the speech language data, by streaming. Here the content is played back and the resulting playback output data is delivered to the user's terminal device in real time.

The playback output means may, however, be built into the system. In that case, a control process that controls the playback output means is executed as the provision control process.
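The two provision modes just described, download and streaming, can be sketched in Python as below. This is illustrative only. `provide_speech` and the byte-chunking scheme are hypothetical, and a real system would also handle transport and playback control. Download returns the whole payload at once; streaming returns an iterator that yields the data piece by piece.

```python
from __future__ import annotations

from typing import Iterator, Union


def _stream_chunks(segments: list[bytes], chunk_size: int) -> Iterator[bytes]:
    """Yield the speech data piece by piece, as a streaming server would send it."""
    for segment in segments:
        for start in range(0, len(segment), chunk_size):
            yield segment[start:start + chunk_size]


def provide_speech(segments: list[bytes], mode: str,
                   chunk_size: int = 1024) -> Union[bytes, Iterator[bytes]]:
    """Hypothetical provision control: whole payload for download, chunk iterator for streaming."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if mode == "download":
        return b"".join(segments)
    if mode == "stream":
        return _stream_chunks(segments, chunk_size)
    raise ValueError(f"unknown provision mode: {mode!r}")
```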

(2) To solve the above problems, the present invention also has a configuration comprising:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and is composed of phoneme data collected from a speaker;
a content management means for managing content information composed of content data that includes at least text data, obtained by converting into data a text to be voiced in the speaker's voice, and character data relating to a character who speaks the text;
a generation processing means for executing, based on a given instruction, a generation process that assigns the phoneme data to the character data to generate speech language data for voicing the text of the content;
a provision control means for executing a provision control process that provides the generated speech language data to a playback output means that plays back and outputs the character's voice in accordance with the text of the content data;
a cost management means for managing a cost parameter that defines a cost relating to the use of any one of the phoneme data, the character data, and the text data used in the generation process; and
a permission determination processing means for executing an execution permission determination process that permits execution of at least one of the generation process and the provision control process when the cost parameter and a preset cost limit value satisfy a given relational condition.

With this configuration, the present invention does not generate or provide speech language data (or both) when the relational condition is not satisfied. For example, this happens when the cost of the phoneme data used in the generation process exceeds a cost limit held by the user or a cost limit set for the content. The invention generates or provides the data (or both) when the relational condition is satisfied, for example when that cost is within the limit.
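The limit-based permission check can be sketched in Python as follows. The patent leaves the "given relational condition" open, so this sketch assumes a specific condition. The user's prior usage plus the cost of the requested phoneme data must not exceed the tighter of the user's limit and the content's limit, whichever are set. The function name and its parameters are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def permit_within_limit(cost_params: Sequence[int],
                        user_limit: Optional[int] = None,
                        content_limit: Optional[int] = None,
                        already_used: int = 0) -> bool:
    """Assumed relational condition: used + requested cost must not exceed the tightest limit."""
    limits = [limit for limit in (user_limit, content_limit) if limit is not None]
    if not limits:
        raise ValueError("at least one cost limit must be configured")
    return already_used + sum(cost_params) <= min(limits)
```

For example, under a subscription allowance of 300, assigning phoneme data costing 100 and 150 is permitted. If 100 of the allowance has already been used, the same request is refused.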

In other words, the present invention can restrict the use of speech language data according to, for example, whether the use falls within what the user has already paid in advance (such as a predetermined payment amount under a subscription). This lets it secure revenue for businesses that provide content and phoneme data. It also lets users listen to and view content purely in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

Therefore, the present invention can remove various limitations, including practical circumstances. It can provide a more realistic casting experience that matches the preferences of listeners and viewers, and increase the user's interest in the content.

The "preset cost limit value" is, for example, an upper limit of the cost the user has paid in advance, or a value corresponding to it. It may be a value set in association with the content (for example, a value set for each piece of content) or a value associated with the user.

(3) The present invention may further comprise a user status detection means for detecting a given status of the user relating to the content. In that case, the cost management means executes a variation control process that controls variation of the cost parameter based on the detected user status.

With this configuration, the present invention can execute the execution permission determination process in either of two ways. It can base the process on an execution cost calculated from the varied cost parameter. Alternatively, it can compare the varied cost parameter with the cost limit value.

In other words, the present invention can change the execution cost, or the cost of using phoneme data, according to predetermined conditions. One such condition is the total amount the user has been charged. Another, where content playback is controlled by logging in to a service for listening to or viewing the content, is the user's login status.

The present invention can thus vary, according to the user of such a service, the cost the user pays, the number of phoneme data items that can be assigned, or the number of available pieces of content. This lets it enrich the services offered to users and improve their satisfaction with using content.

As a result, the present invention can encourage long-term use of the service, secure the operator's profitability, and create a sound business environment.

The "user status" may include, for example:
(A1) the total amount charged to the user to date;
(A2) where content playback is controlled by logging in to a service for listening to or viewing content, the login status (login frequency, total login time, or the number, type, and amount of benefits acquired by logging in);
(A3) the time spent using the content to date (listening or viewing time), or the points earned by using it; and
(A4) a degree of superiority over other users, such as the user's rank or level.

"Controlling the variation of the cost parameter based on the detected user status" means, for example, determining a variation value by referring to table data in which variation values are associated with cost parameters. It can also mean calculating the variation value from the user status using a given arithmetic expression.
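Both approaches, a table lookup and an arithmetic expression, can be sketched in Python as follows. All of the following are hypothetical illustrations: the thresholds, multipliers, the per-login-day rate, and the rule that the more favorable of the two multipliers applies. The table keys on the user's total amount charged (status A1), and the formula uses login days (status A2).

```python
import bisect

# Hypothetical table data: total amount charged to date (status A1) -> cost multiplier.
CHARGE_THRESHOLDS = [0, 1000, 5000, 20000]
CHARGE_MULTIPLIERS = [1.00, 0.95, 0.90, 0.80]


def multiplier_from_table(total_charged: int) -> float:
    """Look up the multiplier for the highest threshold not exceeding total_charged."""
    if total_charged < 0:
        raise ValueError("total_charged must be non-negative")
    index = bisect.bisect_right(CHARGE_THRESHOLDS, total_charged) - 1
    return CHARGE_MULTIPLIERS[index]


def multiplier_from_formula(login_days: int, per_day: float = 0.005, floor: float = 0.7) -> float:
    """Hypothetical arithmetic expression on login status (A2): 0.5% off per login day, floored."""
    return max(floor, 1.0 - per_day * login_days)


def varied_cost(base_cost: int, total_charged: int, login_days: int) -> int:
    """Apply whichever of the two multipliers is more favorable to the user (an assumption)."""
    multiplier = min(multiplier_from_table(total_charged), multiplier_from_formula(login_days))
    return round(base_cost * multiplier)
```

`bisect_right` finds the tier in logarithmic time. This keeps the table easy to extend without changing the lookup code.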

(4) In the present invention, the cost management means may execute a variation control process that controls variation of the cost parameter. The control is based on related information indicating information on any one of the content data, the character data, and the text data.

With this configuration, the present invention can change the execution cost according to attributes such as the number of times a character speaks or its popularity. This increases user interest while securing revenue for businesses that provide content and the like.

For content data, the "related information" includes, for example, genre information indicating the genre of the content. For text data or character data, it includes, for example, attribute information indicating the attributes of the text or character.

For phoneme data, the "related information" includes attribute information on the speaker whose voice was used to generate the phoneme data. Examples are the speaker's genre (e.g., voice actor or announcer), gender, age or age group, and popularity.

Text attributes include, for example, the type of text (such as a novel, manga, non-fiction, or newspaper) and the attributes of the character to which the text belongs (i.e., the character who speaks it).

Character attributes include the following:
- the type (e.g., animal, robot, or human);
- gender and age;
- dialect;
- the language of the text (including other languages);
- popularity.
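The attribute-based variation can be sketched in Python as follows, taking the patent's examples of line count and popularity. The specific rule is purely hypothetical. Cost scales with the number of lines a character speaks, a popularity surcharge adds up to 50%, and a flat surcharge depends on the character type. `KIND_SURCHARGE`, `CharacterAttributes`, and `attribute_cost` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical flat surcharge by character type.
KIND_SURCHARGE = {"human": 0, "animal": 5, "robot": 10}


@dataclass(frozen=True)
class CharacterAttributes:
    kind: str        # e.g. "human", "animal", "robot"
    popularity: int  # hypothetical popularity score, 0-100
    line_count: int  # number of lines the character speaks in the content


def attribute_cost(base_per_line: int, attrs: CharacterAttributes) -> int:
    """Scale cost with lines spoken, add up to +50% for popularity, then a type surcharge."""
    if not 0 <= attrs.popularity <= 100:
        raise ValueError("popularity must be between 0 and 100")
    popularity_factor = 1.0 + 0.5 * (attrs.popularity / 100)
    scaled = round(base_per_line * attrs.line_count * popularity_factor)
    return scaled + KIND_SURCHARGE.get(attrs.kind, 0)
```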

(5) The present invention may further comprise a combination detection means for detecting the combination of phoneme data assigned by the generation process. In that case, the cost management means executes a variation control process that controls variation of the cost parameter based on information on the detected combination of phoneme data.

With this configuration, the present invention can reduce the execution cost for certain combinations of phoneme data. Examples are phoneme data from speakers who share the same attribute (such as the same genre, or membership in the same organization or group), or data collected from the same speaker. This provides an environment that is easy for users to use.

The "combination information" includes, for example, whether the phoneme data share the same attribute (such as speakers of the same genre, or speakers belonging to the same organization or group). It also includes whether the data were collected from the same speaker.
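A combination discount along these lines can be sketched in Python as follows. The discount rates and their precedence are hypothetical assumptions. A reused speaker earns 20% off. Otherwise, two or more speakers from the same group earn 10% off. The patent only says that the cost parameter varies with the detected combination. `collections.Counter` counts how often each speaker and group appears among the assigned phoneme data.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class PhonemeRef(NamedTuple):
    phoneme_id: str
    speaker_id: str
    group_id: str  # e.g. the agency or unit the speaker belongs to
    cost: int


def combined_cost(refs: list[PhonemeRef]) -> int:
    """Sum phoneme costs, then apply a hypothetical discount based on the detected combination."""
    speakers = Counter(r.speaker_id for r in refs)
    groups = Counter(r.group_id for r in refs)
    base = sum(r.cost for r in refs)
    if any(n > 1 for n in speakers.values()):
        rate = 0.20   # same speaker cast for several characters
    elif any(n > 1 for n in groups.values()):
        rate = 0.10   # different speakers from the same group
    else:
        rate = 0.0
    return round(base * (1 - rate))
```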

(6) In the present invention, the generation processing means may detect, based on the user's instruction serving as the given instruction, a character to which no phoneme data has been assigned, treating it as a specific character. In that case, the generation processing means sets predetermined phoneme data for that specific character.

With this configuration, the user does not need to assign phoneme data to every character set in the content data, which improves operability. Even when the user cannot assign phoneme data to a character, the user can still listen to or view the content. In addition, if no cost is incurred for phoneme data assigned to characters the user did not specify, the burden on the user is also reduced.
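The default-assignment behavior can be sketched in Python as follows. Characters the user leaves unassigned become "specific characters". Each receives predetermined phoneme data: a per-character default if one exists, otherwise a generic fallback voice. The billing rule is an assumption following the conditional in the text: only the phoneme data the user actually chose is charged. `resolve_cast`, `billable_cost`, and the voice IDs are hypothetical.

```python
from __future__ import annotations


def resolve_cast(characters: list[str], user_choice: dict[str, str],
                 defaults: dict[str, str],
                 fallback: str = "standard_voice") -> tuple[dict[str, str], set[str]]:
    """Return the full cast and the set of characters that received predetermined phoneme data."""
    cast: dict[str, str] = {}
    unassigned: set[str] = set()
    for character in characters:
        if character in user_choice:
            cast[character] = user_choice[character]
        else:
            cast[character] = defaults.get(character, fallback)
            unassigned.add(character)
    return cast, unassigned


def billable_cost(cast: dict[str, str], unassigned: set[str], phoneme_cost: dict[str, int]) -> int:
    """Charge only for phoneme data the user assigned; defaults are assumed free."""
    return sum(phoneme_cost[p] for c, p in cast.items() if c not in unassigned)
```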

Therefore, the present invention can provide users with an easy-to-use environment.

(7) In the present invention, the generation processing means may:
execute the generation process so as to generate the speech language data in accordance with model information indicating a data model of phoneme data, where the model is generated based on at least one of the attributes of the character and the attributes of the text; and
execute a learning process that trains the model information based on the generated speech language data.
The cost management means may then execute a variation control process that controls variation of the cost parameter based on the status of the learning process for the model information.

In other words, the present invention can lower or raise the cost the more speech language data is generated (that is, the more it is used). It can therefore either pass the benefit back to users or offer higher-quality data at a premium.

The "model information indicating a data model of phoneme data generated based on attribute information" is model information built with artificial intelligence (AI) techniques. For example, speech data may be generated by assigning phoneme data to text to construct utterance sounds (i.e., sounds spoken as voice). During that process, features such as the amount and manner of change of each phoneme are extracted in association with the attribute information of each text or piece of content. Machine learning is then performed on the extracted features, and the resulting model of the phoneme data is the model information.

The "learning process" is, for example, one of two kinds of process. The first is supervised machine learning, including deep learning, using support vector machines or neural networks (e.g., recurrent neural networks), with evaluated speech language data (i.e., uttered sounds) as training data. The second is unsupervised machine learning, including deep learning, such as a GAN (generative adversarial network) or association analysis.

The "learning status" includes, for example:
- the number of learning iterations;
- the learning progress (the number of learning iterations within a given period);
- evaluation values of the learned speech language data (including, for example, usage counts that reflect popularity).
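Varying the cost with learning status can be sketched in Python as follows, using the learning count as the status measure. The step size, block size, and cap are hypothetical. The `mode` switch reflects the two options the text describes: passing the benefit back to users as a discount, or charging a premium for better-trained data.

```python
def learning_adjusted_cost(base_cost: int, learning_count: int, mode: str = "discount",
                           step: float = 0.05, block: int = 100, cap: float = 0.5) -> int:
    """Adjust cost by `step` for every `block` learning iterations, up to `cap` (all hypothetical)."""
    if learning_count < 0:
        raise ValueError("learning_count must be non-negative")
    adjustment = min(cap, step * (learning_count // block))
    if mode == "discount":
        factor = 1.0 - adjustment
    elif mode == "premium":
        factor = 1.0 + adjustment
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return round(base_cost * factor)
```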

(8) To solve the above problems, the present invention also provides a program that causes a computer to function as:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and is composed of phoneme data collected from a speaker;
a content management means for managing content information composed of content data that includes at least text data, obtained by converting into data a text to be voiced in the speaker's voice, and character data relating to a character who speaks the text;
a generation processing means for executing, based on a given instruction, a generation process that assigns the phoneme data to the character data to generate speech language data for voicing the text of the content;
a provision control means for executing a provision control process that provides the generated speech language data to a playback output means that plays back and outputs the character's voice in accordance with the text of the content data;
a cost management means for managing a cost parameter that defines a cost relating to the use of any one of the phoneme data, the character data, and the text data used in the generation process;
a cost calculation means for executing a calculation process that calculates, based on the cost parameter, an execution cost for executing the generation process; and
a permission determination processing means for executing an execution permission determination process that determines whether to permit execution of at least one of the generation process and the provision control process, based on whether the user has paid the execution cost calculated by the calculation process.

With this configuration, the present invention can restrict the use of speech language data according to whether the user pays the cost. This lets it secure revenue for businesses that provide content and phoneme data. It also lets users listen to and view content purely in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

(9) To solve the above problems, the present invention also provides a program that causes a computer to function as:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and is composed of phoneme data collected from a speaker;
a content management means for managing content information composed of content data that includes at least text data, obtained by converting into data a text to be voiced in the speaker's voice, and character data relating to a character who speaks the text;
a generation processing means for executing, based on a given instruction, a generation process that assigns the phoneme data to the character data to generate speech language data for voicing the text of the content;
a provision control means for executing a provision control process that provides the generated speech language data to a playback output means that plays back and outputs the character's voice in accordance with the text of the content data;
a cost management means for managing a cost parameter that defines a cost relating to the use of any one of the phoneme data, the character data, and the text data used in the generation process; and
a permission determination processing means for executing an execution permission determination process that permits execution of at least one of the generation process and the provision control process when the cost parameter and a preset cost limit value satisfy a given relational condition.

With this configuration, the present invention can restrict the use of speech language data according to, for example, whether the use falls within what the user has already paid in advance (such as a predetermined payment amount under a subscription). This lets it secure revenue for businesses that provide content and phoneme data. It also lets users listen to and view content purely in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

Therefore, the present invention can remove various limitations, including practical circumstances. It can provide a more realistic casting experience that matches the preferences of listeners and viewers, and increase the user's interest in the content.

FIG. 1 is a diagram illustrating an example of the system configuration of a game system according to an embodiment.
FIG. 2 is a diagram illustrating functional blocks of a server device according to an embodiment.
FIG. 3 is a diagram illustrating functional blocks of a terminal device according to an embodiment.
FIG. 4 is a diagram (part 1) for explaining the execution permission determination process according to an embodiment.
FIG. 5 is a diagram (part 2) for explaining the execution permission determination process according to an embodiment.
FIG. 6 is a diagram illustrating examples of the content information and the phoneme information stored in the content information storage unit and the phoneme information storage unit, respectively, according to an embodiment.
FIG. 7 is a diagram illustrating an example of the user information stored in the user information storage unit according to an embodiment.
FIG. 8 is a diagram (part 1) for explaining the execution permission determination process based on the execution cost, including the execution cost calculation process, according to an embodiment.
FIG. 9 is a diagram (part 2) for explaining the execution permission determination process based on the execution cost, including the execution cost calculation process, according to an embodiment.
FIG. 10 is a diagram (part 1) for explaining the execution permission determination process based on the execution cost and the limit value according to an embodiment.
FIG. 11 is a diagram (part 2) for explaining the execution permission determination process based on the execution cost and the limit value according to an embodiment.
FIG. 12 is a flowchart (part 1) showing the operation of the content preview start process executed by the server device of an embodiment when viewing of content starts. This process includes the speech language data generation process and the execution permission determination process based on the execution cost.
FIG. 13 is a flowchart (part 2) showing the operation of the content preview start process executed by the server device of an embodiment when viewing of content starts. This process includes the speech language data generation process and the execution permission determination process based on the execution cost.

The present embodiment is described below. The embodiment described below does not unduly limit the content of the invention recited in the claims, and not all of the configurations described in the embodiment are necessarily essential components of the invention.

[1] Content Providing System
First, an overview and the general configuration of the content providing system 1 of this embodiment will be described with reference to FIG. 1, which shows an example of the system configuration of the content providing system 1.

As shown in FIG. 1, the content providing system 1 of this embodiment includes a server device 10 that provides a service allowing users to view content such as manga (animation), movies, or games (hereinafter, "viewing" also covers listening only), and terminal devices 20 (e.g., terminal devices 20A, 20B, and 20C) that users use to view the content. These devices can connect to the Internet (an example of a network).

By accessing the server device 10 from a terminal device 20, a user can receive various data transmitted from the server device 10 via the Internet and view content.

By accessing the server device 10 from a terminal device 20, the user can also communicate with other users.

The server device 10 is an information processing device capable of providing a service that lets users view content (hereinafter also referred to as the "content viewing service") through terminal devices 20 communicatively connected via the Internet.

The server device 10 may also function as an SNS server that provides communication-type services.

An SNS server is an information processing device that provides a service enabling multiple users to communicate with one another.

In particular, when the server device 10 functions as an SNS server, it can provide content (specifically, content data configured for viewing the content) by using the operating environment (API (application programming interface), platform, etc.) of the SNS it provides.

Specifically, the server device 10 can provide content via the web browser of the terminal device 20 (for example, browser content built with various languages and technologies such as HTML, FLASH (registered trademark), CGI, PHP, Shockwave, Java (registered trademark) applets, and JavaScript (registered trademark)) or via a dedicated application.

The server device 10 may consist of a single device or processor, or of multiple devices or processors.

The billing information, login information, and content-related information stored in the storage area of the server device 10 (the storage unit 140 described later) may instead be stored in a database (broadly, a storage device or memory) connected via a network (an intranet or the Internet). Likewise, when the server device 10 functions as an SNS server, information held in its storage area, such as the user information storage unit 148, may be stored in such a networked database.

Specifically, the server device 10 of this embodiment receives input information based on operations by the user of a terminal device 20 (that is, a user who wishes to view content) and, based on the received input information, performs various processes related to providing and viewing the content.

The server device 10 then transmits to the terminal device 20 the data for viewing the content selected by the user (that is, content data), and the terminal device 20 performs various processes to present the received content data so that the user can view it.

When letting a user view content via a terminal device 20, the server device 10 may provide the content data by streaming or as a download.

Each terminal device 20 is an information processing device such as a smartphone, mobile phone, PHS, computer, game device, PDA, or image generating device, and can connect to the server device 10 via a network such as the Internet (WAN) or a LAN. The communication line between the terminal device 20 and the server device 10 may be wired or wireless.

In particular, the terminal device 20 has a web browser capable of displaying web pages (data in HTML format). That is, the terminal device 20 has a communication control function for communicating with the server device 10, and a web browser function that controls display using data received from the server device 10 (web data, data written in HTML, etc.) and transmits data on user operations to the server device 10.

Using the web browser function, the terminal device 20 acquires the content data and control information provided by the server device 10 for viewing the content, executes predetermined processing, and lets the user view the content.

Specifically, when the terminal device 20 sends the server device 10 a request to view a specific piece of content, the terminal device 20 is connected to the content-providing site of the server device 10, and viewing of the content begins.

The terminal device 20 can also use the API as necessary to have the server device 10, functioning as an SNS server, perform predetermined processing, or to acquire information from the user information storage unit 148 managed by that server, so that content is provided in conjunction with various SNSs.

[2] Server Device
Next, the server device 10 of this embodiment will be described with reference to FIG. 2, which shows its functional blocks. The server device 10 of this embodiment may omit some of the components (units) shown in FIG. 2.

The server device 10 includes an input unit 120 used for input by an administrator and others, a display unit 130 that performs predetermined display, an information storage medium 180 that stores predetermined information, a communication unit 196 that communicates with the terminal devices 20 and other devices, a processing unit 100 that mainly executes processing related to the content to be provided, and a storage unit 140 that mainly stores various data used for the content.

The input unit 120 is used by a system administrator or the like to enter content-related settings, other necessary settings, and data. In this embodiment, for example, the input unit 120 consists of a mouse, a keyboard, and the like.

The display unit 130 displays an operation screen for the system administrator. In this embodiment, for example, the display unit 130 is a liquid crystal display or the like.

The information storage medium 180 (a computer-readable medium) stores programs, data, and the like, and can be implemented as an optical disc (CD, DVD), magneto-optical disc (MO), magnetic disk, hard disk, magnetic tape, or memory (ROM).

The communication unit 196 performs various controls for communicating with external entities (e.g., terminals, other servers, or other network systems), and its functions are implemented by hardware such as processors or communication ASICs, by programs, and so on.

The storage unit 140 serves as a work area for the processing unit 100, the communication unit 196, and so on, and its functions are implemented by RAM (VRAM) or the like.

The information stored in the storage unit 140 may be managed in a database. The storage unit 140 of this embodiment constitutes the storage means of the present invention.

In addition to the main storage unit 142, the storage unit 140 of this embodiment includes:
(A1) a content information storage unit 144 that stores content information, which contains the data of each piece of content (hereinafter, "content data") and information on the cost of viewing the content based on that content data (hereinafter, "content cost information");
(A2) a phoneme information storage unit 146 that stores phoneme information, which contains phoneme data (e.g., a data model described later) collected in advance from a speaker and assigned to a character appearing in the content when the content is played back, together with information on the cost of using that phoneme data (hereinafter, "phoneme data cost information");
(A3) a user information storage unit 148 that stores, in association with each user, user information and information on managing the costs the user pays to view content (hereinafter also referred to as "user cost information"), where the user information contains information on the content the user owns (including content viewable regardless of ownership), information on the phoneme data the user can use (including phoneme data usable regardless of ownership), and other information about the user; and
(A4) an application information storage unit 149 that stores the data (e.g., table data) needed to run content viewing, such as applications for executing various processes, including a process for generating speech language data used to convert the text of the content into spoken language (hereinafter, the "speech language data generation process").
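As a rough illustration of how the records in items (A1) to (A3) might be organized, the following Python sketch models them as dataclasses. The class and field names (`ContentInfo`, `PhonemeInfo`, `UserInfo`, `cost_balance`, and the "open" sets for items usable regardless of ownership) are hypothetical, and representing the user cost information as a single balance is an assumption; the document does not specify a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass
class ContentInfo:
    """Hypothetical record in the content information storage unit 144 (A1)."""
    content_id: str
    content_data: dict  # e.g. scenes and character lines; format not specified
    content_cost: int   # content cost information


@dataclass
class PhonemeInfo:
    """Hypothetical record in the phoneme information storage unit 146 (A2)."""
    phoneme_id: str
    speaker: str        # speaker from whom the phoneme data was collected
    data_model: bytes   # placeholder for the voice data model
    phoneme_cost: int   # phoneme data cost information


@dataclass
class UserInfo:
    """Hypothetical record in the user information storage unit 148 (A3)."""
    user_id: str
    owned_content: set = field(default_factory=set)
    owned_phonemes: set = field(default_factory=set)
    cost_balance: int = 0  # user cost information, assumed to be a balance

    def can_view(self, content_id: str, open_content: set) -> bool:
        # Viewable if owned, or if the content is viewable regardless of ownership.
        return content_id in self.owned_content or content_id in open_content

    def can_use(self, phoneme_id: str, open_phonemes: set) -> bool:
        # Same rule for phoneme data: owned, or usable regardless of ownership.
        return phoneme_id in self.owned_phonemes or phoneme_id in open_phonemes
```

Keeping "owned" and "usable regardless of ownership" as separate sets mirrors the distinction drawn in (A3) without duplicating freely available items into every user record.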

The processing unit 100 performs various processes using the main storage unit 142 in the storage unit 140 as a work area. The functions of the processing unit 100 can be implemented by hardware such as processors (CPU, DSP, etc.) and ASICs (gate arrays, etc.), or by programs.

The processing unit 100 performs the various processes of this embodiment based on programs (data) stored in the information storage medium 180. That is, the information storage medium 180 stores programs for causing a computer to function as each unit of this embodiment (programs for causing a computer to execute the processing of each unit).

For example, the processing unit 100 (processor) controls the entire server device 10 based on a program stored in the information storage medium and performs various processes such as controlling the transfer of data between units. It also performs processing to provide various services in response to requests from the terminal devices 20.

Specifically, the processing unit 100 of this embodiment includes at least a communication control unit 101, a web processing unit 102, a login management unit 103, a user management unit 104, a content management unit 105, a speech generation processing unit 106, a cost management unit 107, a timer management unit 109, and an information providing unit 110.

In this embodiment, for example, the user management unit 104 constitutes the user information management means and the user status detection means of the present invention; the content management unit 105 constitutes the content management means and the provision control processing means; the speech generation processing unit 106 constitutes the generation processing means and the combination detection means; and the cost management unit 107 constitutes the cost calculation means, the cost management means, and the permission determination processing means.

The communication control unit 101 transmits and receives data to and from the terminal devices 20 via the network. That is, the server device 10 performs various processes based on information that the communication control unit 101 receives from the terminal devices 20 and others.

In particular, in response to a request from a user's terminal device 20, the communication control unit 101 of this embodiment transmits content data, along with the data and information used to play it back, to that terminal device 20.

The communication control unit 101 also performs various processes for accepting user instructions entered on the terminal device 20.

The web processing unit 102 functions as a web server. For example, via a communication protocol such as HTTP (Hypertext Transfer Protocol), the web processing unit 102 transmits data in response to requests from the web browser 211 installed on the terminal device 20 and receives data sent by that web browser 211.

The server device 10 of this embodiment may be implemented as separate servers for content and for the SNS, or as a single server. In addition, the various processes for providing content to users and letting them view it may be executed partly or wholly by the server device 10, or partly by each user's terminal device 20.

The login management unit 103 manages each user's logins to the content viewing service.

Specifically, for each user, the login management unit 103 registers in the user information storage unit 148 the number of logins (in total and within a predetermined period), the number of consecutive login days, the total viewing time of content, and the viewing time within a predetermined period (e.g., the past week or month), and manages them for each player as player-related information.
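The statistics listed above can be derived from simple login and viewing logs. The following is a minimal Python sketch, assuming logins are recorded as one `date` per login and viewing as `(date, minutes)` pairs; the function names and log formats are hypothetical and not taken from the document.

```python
from datetime import date, timedelta


def consecutive_login_days(login_dates, today):
    """Number of consecutive days, ending at `today`, with at least one login."""
    days = set(login_dates)
    streak = 0
    current = today
    while current in days:
        streak += 1
        current -= timedelta(days=1)
    return streak


def logins_within(login_dates, today, period_days):
    """Count logins in the `period_days` days ending at `today` (inclusive)."""
    start = today - timedelta(days=period_days - 1)
    return sum(1 for d in login_dates if start <= d <= today)


def viewing_time_within(sessions, today, period_days):
    """Sum viewing minutes from (date, minutes) sessions in the same window."""
    start = today - timedelta(days=period_days - 1)
    return sum(minutes for d, minutes in sessions if start <= d <= today)
```

The total login count and total viewing time are then just the length of the login log and the sum over all sessions.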

The user management unit 104 records and manages user information and user cost information for each user in the user information storage unit 148.

The content management unit 105 manages the content information of each piece of content stored in the content information storage unit 144 and each item of speech phoneme information stored in the phoneme information storage unit 146.

In particular, the content management unit 105 provides content information, including content data, to the terminal devices 20 and performs processing related to providing content data to each user and controlling its playback.

The speech generation processing unit 106 executes the speech language data generation process: based on a player's instructions or according to a program, it assigns phoneme data contained in the speech phoneme information to a character in the content data, and generates speech language data for voicing the text spoken by that character.
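The assignment step of the generation process can be sketched as pairing each line of text with the phoneme data chosen for its speaker. This Python sketch is illustrative only: `Line`, `SpeechEntry`, and the fallback `default_phoneme` are hypothetical, and the actual synthesis of audio from the phoneme data model is not described here and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    character_id: str  # character that speaks the line
    text: str          # text to be voiced


@dataclass(frozen=True)
class SpeechEntry:
    character_id: str
    text: str
    phoneme_id: str    # phoneme data (voice) used to voice the text


def generate_speech_language_data(lines, assignment, default_phoneme=None):
    """Pair each line of text with the phoneme data assigned to its character.

    `assignment` maps character_id -> phoneme_id, chosen by the player or by a
    program. Characters without an entry fall back to `default_phoneme`;
    if that is None too, a KeyError is raised.
    """
    entries = []
    for line in lines:
        phoneme_id = assignment.get(line.character_id, default_phoneme)
        if phoneme_id is None:
            raise KeyError(f"no phoneme data assigned to {line.character_id!r}")
        entries.append(SpeechEntry(line.character_id, line.text, phoneme_id))
    return entries
```

For example, assigning `"voice_A"` to a hero and falling back to `"voice_B"` for everyone else yields one entry per line, each tagged with the voice that a downstream synthesizer would use.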

The cost management unit 107 manages parameters (hereinafter also referred to as "cost parameters") that define the costs of using the phoneme data, character data, and text data used in the generation process.

In particular, the cost management unit 107 executes a calculation process that calculates the cost of executing processes, such as the generation process, that let a user view content (hereinafter, the "execution cost").
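The document does not give a formula for the execution cost at this point, so the following Python sketch assumes a simple linear model: the content cost, plus the cost of each distinct phoneme data item used, plus a per-character (role) cost, plus a per-character-of-text cost for voiced lines. It also shows one plausible permission check that compares the execution cost with the user's available cost. `CostParameters`, `execution_cost`, and `is_execution_permitted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParameters:
    """Hypothetical cost parameters; the real ones are not specified here."""
    phoneme_cost: dict       # phoneme_id -> cost of using that phoneme data
    per_character_cost: int  # cost per character (role) that is voiced
    per_text_char_cost: int  # cost per character of text converted to speech


def execution_cost(content_cost, assignment, lines, params):
    """Assumed linear execution cost.

    assignment: character_id -> phoneme_id
    lines: iterable of (character_id, text); unassigned characters are not voiced.
    """
    cost = content_cost
    # Each distinct phoneme data item is charged once.
    cost += sum(params.phoneme_cost[p] for p in set(assignment.values()))
    cost += params.per_character_cost * len(assignment)
    cost += params.per_text_char_cost * sum(
        len(text) for character_id, text in lines if character_id in assignment
    )
    return cost


def is_execution_permitted(cost, available):
    """Permit execution only if the user can cover the cost."""
    return cost <= available
```

With a content cost of 100, two voices costing 50 and 30, a role cost of 10, and a text cost of 1 per character, voicing "Go" and "Stop" gives 100 + 80 + 20 + 6 = 206.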

The timer management unit 109 has a timer function and is used to manage the playback status of content when the content is provided to the terminal device 20 by streaming or the like. In particular, the timer management unit 109 works in conjunction with the content management unit 105 to output the current time and preset times to each unit. The timer management unit 109 is also used to synchronize with each terminal device 20.
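The document does not say how synchronization with each terminal device 20 is achieved. One common approach, swapped in here purely for illustration, is Cristian's algorithm: the terminal estimates the offset between the server clock and its own clock from a single request/response round trip, and then computes the playback position against a shared start time. The function names below are hypothetical.

```python
from typing import Optional


def clock_offset(server_time, request_sent, response_received):
    """Estimate server-minus-terminal clock offset (Cristian's algorithm).

    Assumes the server's timestamp was taken halfway through the round trip.
    All values are in seconds; the two local times come from the terminal clock.
    """
    return server_time - (request_sent + response_received) / 2


def playback_position(start_time, now, paused_total=0.0,
                      duration: Optional[float] = None):
    """Elapsed playback time, given a start time on the same clock as `now`."""
    position = max(0.0, now - start_time - paused_total)
    if duration is not None:
        position = min(position, duration)
    return position
```

A terminal would convert a server-side start time into local time by subtracting the estimated offset before calling `playback_position`.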

The information providing unit 110 generates the various content information that the terminal device 20 needs to play back content and provides it to the corresponding terminal device 20.

[3] Terminal Device
Next, the terminal device 20 of this embodiment will be described with reference to FIG. 3, which shows its functional blocks. The terminal device 20 of this embodiment may omit some of the components (units) shown in FIG. 3.

The input unit 260 is a device through which the user enters input information, and it outputs that information to the processing unit 200. The input unit 260 of this embodiment includes a detection unit 262 that detects the user's input information (input signals). Examples of the input unit 260 include a lever, buttons, a steering wheel, a microphone, a touch panel display, a keyboard, and a mouse.

The storage unit 270 serves as a work area for the processing unit 200, the communication unit 296, and so on, and its functions can be implemented by RAM (VRAM) or the like. The storage unit 270 of this embodiment includes a main storage unit 271 used as a work area and an image buffer 272 that stores the final display image and the like. Some of these may be omitted.

The information storage medium 280 (a computer-readable medium) stores programs, data, and the like, and can be implemented as an optical disc (CD, DVD), magneto-optical disc (MO), magnetic disk, hard disk, magnetic tape, or memory (ROM).

The processing unit 200 performs the various processes of this embodiment based on programs (data) stored in the information storage medium 280. The information storage medium 280 can store programs for causing a computer to function as each unit of this embodiment (programs for causing a computer to execute the processing of each unit).

In this embodiment, the terminal device 20 receives via the network the programs for causing a computer to function as each unit of this embodiment and the content information including content data, both stored in the information storage medium 180 and the storage unit 140 of the server device 10, and stores the received programs and data in the information storage medium 280.

The storage unit 270 stores the programs and data received from the server device 10. Cases in which the network system functions by receiving programs and data in this way are also within the scope of the present invention.

The display unit 290 outputs the images generated in this embodiment and can be implemented as a CRT, LCD, touch panel display, HMD (head-mounted display), or the like.

The sound output unit 292 outputs the sound generated in this embodiment and can be implemented as speakers, headphones, or the like.

The communication unit 296 performs various controls for communicating with external entities (e.g., other terminals and servers), and its functions can be implemented by hardware such as processors or communication ASICs, by programs, and so on.

The processing unit 200 (processor) performs processing such as content processing, display control, image generation, and sound generation based on content-related information (including content data acquired from the server device 10 via the communication unit 296), on input information acquired from the input unit 260, or on programs.

The processing unit 200 performs various processes using the main storage unit 271 in the storage unit 270 as a work area. The functions of the processing unit 200 can be implemented by hardware such as processors (CPU, DSP, etc.) and ASICs (gate arrays, etc.), or by programs.

The processing unit 200 includes a communication control unit 210, a web browser 211, a content processing unit 212, a display control unit 213, a drawing unit 220, and a sound processing unit 230. Some of these may be omitted.

The communication control unit 210 transmits and receives data to and from the server device 10. It also stores data received from the server device 10 in the storage unit 270, analyzes the received data, and performs other control related to sending and receiving data.

The communication control unit 210 may store and manage the server's destination information (IP address and port number) in the information storage medium 280, and may communicate with the server device 10 when it receives input from the user to start communication.

In particular, the communication control unit 210 transmits the user's identification information and operation information to the server device 10, and receives from the server device 10 content-related information (user information and content information, that is, content data including speech language data, text data, etc.) and the user's web page.

The communication control unit 210 may exchange data with the server device 10 at a predetermined interval, or whenever it receives input information from the input unit 260.

The web browser 211 is an application program for viewing web pages (content display screens). It downloads HTML files, image files, and the like from the web server (server device 10), analyzes their layout, and controls their display. The web browser 211 also sends data to the web server (server device 10) through input forms (links, buttons, text boxes, etc.).

The web browser 211 of this embodiment can run browser content. For example, it may execute programs written in JavaScript (registered trademark), FLASH (registered trademark), Java (registered trademark), or the like that are received from the web server (server device 10).

Using the web browser 211, the terminal device 20 can display information from a web server specified by a URL via the Internet. For example, the terminal device 20 can display content (HTML and other data) received from the server device 10 in the web browser 211.

The content processing unit 212 executes various processes for presenting content, such as starting the content when a content start condition is satisfied, controlling playback of the content based on the content data and the speech language data, and ending playback when a playback end condition is satisfied.

In particular, the content processing unit 212 generates images based on the content data, plays back the speech language data in accordance with the text indicated by the text data, and controls the characters' speech so that it follows the playback of the images.
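The start, playback, and end steps described above can be sketched as a function that returns the ordered playback events for a piece of content. This Python sketch is a simplification under stated assumptions: content is a list of scenes, each scene is shown as one image followed by its voiced lines, `speech_data` maps `(character_id, text)` to the voice received from the server, and the end condition is modeled as a hypothetical scene limit. Names and event tuples are hypothetical.

```python
def play_content(scenes, speech_data, start_condition_met, end_after_scene=None):
    """Return the ordered playback events for a piece of content.

    scenes: list of scenes, each a list of (character_id, text) lines.
    speech_data: (character_id, text) -> phoneme/voice id from the server.
    end_after_scene: hypothetical end condition (stop after this many scenes).
    """
    if not start_condition_met:
        return []  # start condition not satisfied: nothing is played
    events = []
    for index, scene in enumerate(scenes):
        if end_after_scene is not None and index >= end_after_scene:
            break  # playback end condition satisfied
        events.append(("show_image", index))
        for character_id, text in scene:
            # Voice each line with its speech language data; None means unvoiced.
            voice = speech_data.get((character_id, text))
            events.append(("speak", character_id, text, voice))
    events.append(("end",))
    return events
```

Emitting an event list rather than driving audio directly keeps the ordering logic testable; a real player would dispatch each event to the drawing unit 220 or the sound processing unit 230.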

The display control unit 213 performs processing for display on the display unit 290; for example, it may display content using the web browser 211.

The drawing unit 220 performs drawing processing based on the various processes (e.g., content processing) performed by the processing unit 200, thereby generating an image, which the display control unit 213 outputs to the display unit 290. The image generated by the drawing unit 220 may be a two-dimensional image or a three-dimensional image.

The sound processing unit 230 performs sound processing based on the results of the various processes performed by the processing unit 200, generates content sounds such as background music, sound effects, or voice, and outputs them to the sound output unit 292.

[4] Method of this embodiment
[4.1] Overview
Next, an overview of the method of this embodiment (the execution permission determination process) will be described with reference to FIGS. 4 and 5.

FIGS. 4 and 5 are diagrams for explaining the method of this embodiment (the execution permission determination process).

The server device 10 of this embodiment works in conjunction with the terminal device 20 to provide each user with the content that user desires, letting the user view the content while the characters appearing in it speak in the voice the user has chosen.

In other words, the server device 10 of this embodiment is configured to assign phoneme data to characters (specifically, character data) appearing in the content under the user's instruction; to execute a speech language data generation process that generates speech language data with which each character speaks its text using the assigned phoneme data; and to provide content information including the generated speech language data and the content data to the corresponding terminal device 20 so that the user can view the content.

The server device 10 of this embodiment is further configured so that, when providing content that includes speech language data, it can restrict the use of that speech language data depending on whether the user has paid the cost, or on whether the cost falls within the amount the user has already paid.

Specifically, as shown in FIG. 4, for example, the server device 10 is configured to manage:
(A1) user information including information on the phoneme data available to the user;
(A2) phoneme information including the phoneme data to be assigned to the content;
(A3) content information consisting of content data that includes text data, in which the text to be voiced has been digitized, and character data on the character who speaks that text; and
(A4) cost parameters indicating the cost of the phoneme data used in the speech language data generation process, of the character data of the character or the text data of the text to which that phoneme data is assigned in the process, or of a combination of two or more of these.

The server device 10 is then configured to execute, for example as shown in FIG. 4:
(B1) a speech language data generation process that, based on a given instruction such as an instruction from the player or from a program, assigns phoneme data to character data (or text data) and generates speech language data for voicing the text of the content;
(B2) a process of providing the generated speech language data to the corresponding terminal device 20, which plays back the character's voice along the text of the content (hereinafter referred to as the "content provision control process");
(B3) an execution cost calculation process that calculates, based on the cost parameters, the execution cost of executing the speech language data generation process; and
(B4) an execution permission determination process that determines, based on whether the user has paid the execution cost calculated by the calculation process, whether to permit execution of at least one of the speech language data generation process and the content provision control process.
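The interplay of (B3) and (B4) in the FIG. 4 flow can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the `CostTable` container, the function names, and the rule that the cost counts as paid once `amount_paid` covers it are all hypothetical, and text-data costs (also allowed by (A4)) could be added in the same way as character costs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CostTable:
    """Hypothetical container for the (A4) cost parameters, keyed by data ID."""
    phoneme: dict[str, int] = field(default_factory=dict)
    character: dict[str, int] = field(default_factory=dict)


def calculate_execution_cost(costs: CostTable, assignment: dict[str, str]) -> int:
    """(B3) Sum the cost parameters of each character/phoneme pair in the assignment.

    `assignment` maps character ID -> phoneme data ID. A character with no cost
    parameter contributes 0, which reproduces FIG. 4 (phoneme costs only).
    """
    total = 0
    for character_id, phoneme_id in assignment.items():
        total += costs.phoneme[phoneme_id]
        total += costs.character.get(character_id, 0)
    return total


def is_execution_permitted(execution_cost: int, amount_paid: int) -> bool:
    """(B4) Assumption: the cost counts as paid once the payment covers it in full."""
    return amount_paid >= execution_cost


def run_fig4_flow(
    costs: CostTable,
    assignment: dict[str, str],
    amount_paid: int,
    generate: Callable[[dict[str, str]], dict],
) -> tuple[int, Optional[dict]]:
    cost = calculate_execution_cost(costs, assignment)   # [2] execution cost calculation
    if not is_execution_permitted(cost, amount_paid):     # [3]/[4] payment check
        return cost, None                                 # unpaid: nothing is generated
    return cost, generate(assignment)                     # [5]/[6] generate and provide
```

With hypothetical phoneme costs of 100, 50, and 30 for phoneme data A, C, and E, the FIG. 4 assignment costs 180, and generation runs only when the recorded payment is at least that amount.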

FIG. 4 shows that the server device 10 manages user information including phoneme data IDs as information on the phoneme data available to the user, phoneme information consisting of the phoneme data and the cost parameter associated with each piece of phoneme data, and content information having character data and text data (see [1] Data management in FIG. 4). Note that FIG. 4 shows an example in which only the cost information included in the phoneme information is used.

FIG. 4 also shows that phoneme data A, C, and E to be assigned to characters 1, 2, and 3 have been selected, and that the execution cost has been calculated from the cost information (i.e., the cost parameters) of the selected phoneme data A, C, and E (see [2] Execution cost calculation process in FIG. 4).

FIG. 4 further shows that the calculated execution cost is notified to the terminal device 20, that whether the user has paid the execution cost (paid or unpaid) is detected and used to determine whether the speech language data generation process may be executed (see processes [3] and [4] in FIG. 4), and that, when payment by the user is detected, phoneme data A, C, and E are assigned to characters 1, 2, and 3, generation of speech language data is instructed and executed, and the generated speech language data is provided to the terminal device 20 (see processes [5] and [6] in FIG. 4).

In particular, in the speech language data generation process of FIG. 4, phoneme data A, C, and E are assigned to character 1 (character data 1 and text data 1), character 2 (character data 2 and text data 2), and character 3 (character data 3 and text data 3), respectively, generating speech language data A, C, and E, respectively.

Alternatively, instead of process (B4) above, the server device 10 may, for example as shown in FIG. 5, execute an execution permission determination process that compares the execution cost of the phoneme data calculated by the calculation process with a predetermined limit value (i.e., an upper limit on the cost) and, when a predetermined relationship between the two is satisfied, determines whether to permit execution of at least one of the speech language data generation process and the content provision control process.

As in the example of FIG. 4, FIG. 5 shows that the server device 10 manages user information including phoneme data IDs as information on the phoneme data available to the user, phoneme information consisting of the phoneme data and the cost parameter associated with each piece of phoneme data, and content information including character data and text data (see [1] Data management in FIG. 5).

Also as in the example of FIG. 4, FIG. 5 shows that phoneme data A, C, and E to be assigned to characters 1, 2, and 3 have been selected and that the execution cost has been calculated from the cost information (cost parameters) of the selected phoneme data A, C, and E (see [2] Data management in FIG. 5).

FIG. 5 further shows that the execution cost is compared with the limit value to determine whether the speech language data generation process may be executed (see process [4] in FIG. 5), and that, if the execution cost is less than the limit value, phoneme data A, C, and E are assigned to characters 1, 2, and 3, generation of speech language data is instructed and executed, and the generated speech language data is provided to the terminal device 20 (see processes [5] and [6] in FIG. 5).
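The FIG. 5 variant replaces the payment check with a comparison against a limit value. The sketch below assumes, hypothetically, that a limit may come from the content's limit value information ((A2) of the content information) or the user's limit value information ((C4) of the user information) and that the stricter one applies; the strict `<` follows FIG. 5's wording that execution proceeds when the cost is less than the limit. Function names are illustrative only.

```python
from typing import Optional


def effective_limit(content_limit: Optional[int], user_limit: Optional[int]) -> Optional[int]:
    """Stricter of the content-level and user-level limit values.

    Taking the minimum is an assumption; None means no limit is defined.
    """
    limits = [v for v in (content_limit, user_limit) if v is not None]
    return min(limits) if limits else None


def is_within_limit(execution_cost: int, limit: Optional[int]) -> bool:
    """FIG. 5 variant of (B4): permit execution only when cost < limit."""
    if limit is None:
        # Hypothetical policy: with no limit configured, this check grants nothing,
        # and a payment-based check like that of FIG. 4 would have to apply instead.
        return False
    return execution_cost < limit
```

A subscription plan, as mentioned below for this embodiment, could be modeled by setting the user-level limit to the remaining prepaid allowance.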

In particular, in the speech language data generation process of FIG. 5, as in the example of FIG. 4, phoneme data A, C, and E are assigned to character 1 (character data 1 and text data 1), character 2 (character data 2 and text data 2), and character 3 (character data 3 and text data 3), respectively, generating speech language data A, C, and E, respectively.

With this configuration, this embodiment can restrict the use of speech language data and content depending on whether the user pays the cost. This secures revenue for the businesses that provide content and phoneme data, and lets users listen to and watch content in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

In this embodiment, the use of speech language data can also be restricted depending on, for example, whether the use falls within the amount the user has already paid (such as a predetermined subscription amount). This likewise secures revenue for the businesses that provide content and phoneme data while letting users listen to and watch content in the voices they prefer, without being limited to the character voices (such as those of voice actors or actors) predetermined by conventional content providers.

Therefore, this embodiment removes various constraints, including practical circumstances such as the production cost (i.e., budget) of the content itself, and provides a more realistic casting experience that matches the preferences of listeners and viewers, thereby increasing the user's interest in the content.

[4.2] Content information, etc.
Next, the content information, phoneme information, and user information of this embodiment will be described with reference to FIGS. 6 and 7.

FIG. 6 is a diagram showing an example of the content information stored in the content information storage unit 144 or the phoneme information stored in the phoneme information storage unit 146 of this embodiment, and FIG. 7 is a diagram showing an example of the user information stored in the user information storage unit 148 of this embodiment.

(Content information)
Each piece of content information concerns content that users view, such as a movie, manga, game, animation, or novel. It contains the various data and information that can be viewed on the terminal device 20, is stored in the content information storage unit 144, and is managed by the content management unit 105.

For example, as shown in FIG. 6(A), each piece of content information includes, in association with a content ID:
(A1) content data including image data, character data, and text data in which the text has been digitized;
(A2) cost-related information, including content cost information that specifies the cost of viewing the content (including consumption of items, points, etc.) and limit value information indicating the limit on the execution cost set for each content for the speech language data generation process and similar processes;
(A3) playback control information consisting of playback control data for assigning speech language data to the content data and playing back the content data, including its text data, on the terminal device 20; and
(A4) supplementary bibliographic information such as the title, synopsis, and trailer or advertising content (hereinafter referred to as "additional information");
and so on.
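One way to picture a content information record is as nested data classes. The Python sketch below is illustrative only: the class and field names are hypothetical, the playback control information (A3) is reduced to a mapping from text ID to speech language data ID, and `images` may be empty because text-only content is also allowed in this embodiment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Character:
    character_id: str
    attributes: dict[str, str] = field(default_factory=dict)  # type, gender, age, dialect...


@dataclass
class TextItem:
    text_id: str
    body: str
    speaker_id: Optional[str] = None  # None for narration or stage directions


@dataclass
class ContentData:  # (A1)
    characters: list[Character]
    texts: list[TextItem]
    images: list[bytes] = field(default_factory=list)  # may be empty: text-only content


@dataclass
class CostRelatedInfo:  # (A2)
    content_cost: int                  # cost of viewing the content
    limit_value: Optional[int] = None  # execution-cost limit for speech generation


@dataclass
class ContentInfo:
    content_id: str
    data: ContentData
    cost: CostRelatedInfo
    playback_control: dict[str, str] = field(default_factory=dict)  # (A3) text ID -> speech data ID
    additional: dict[str, str] = field(default_factory=dict)        # (A4) title, synopsis, ...

    def lines_of(self, character_id: str) -> list[TextItem]:
        """Texts spoken by one character, i.e. what its phoneme data will voice."""
        return [t for t in self.data.texts if t.speaker_id == character_id]
```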

In particular, character data is data on a character who speaks text as lines; it specifies information on the attributes (i.e., attribute information) of a character appearing in content such as a movie, manga, game, animation, or novel.

The attributes include, for example, the character's type (such as animal, robot, or human) and role, the character's gender and age, the character's traits (personality), the dialect the character uses and the language of its text (including other languages), and the character's popularity.

The text data is data on the lines of each character appearing in the content, sentences describing each scene (e.g., stage directions), one or more sentences, or the text of each division such as a chapter, page, paragraph, or section.

The text data may already have undergone natural language processing such as morphological, syntactic, semantic, and contextual analysis, and may specify information on the analysis results (hereinafter referred to as "text analysis information") as well as information indicating the attributes of the character who speaks the text (i.e., attribute information).

The text analysis information includes, for example, information on the part of speech of each text unit (word, character, character string, sentence, etc.), information on dependencies, information indicating meaning, and information on the referents of inferred pronouns and omitted nouns.

In other words, the text data may include, for each text, information such as its parts of speech, dependencies, meaning, and the referents of pronouns and omitted elements.

In the content cost information, the cost information for character data may consist of parameters (i.e., cost parameters) defined for all characters appearing in the content, or of cost parameters defined only for the main character or for characters that are important to the content.

Likewise, in the content cost information, the cost information for text data may be defined not only for the content data as a whole but also for each predetermined part, such as each scene, each line, or each character.
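Because text costs may be set for the whole content or per scene, line, or character, the text portion of a cost could be looked up as in the hypothetical sketch below. The `(granularity, id)` key format, the rule that a whole-content cost overrides part costs, and treating parts without a defined cost as free are all assumptions made for illustration.

```python
from typing import Optional

PartKey = tuple[str, str]  # hypothetical key: (granularity, id), e.g. ("scene", "s1")


def text_cost(
    selected: list[PartKey],
    part_costs: dict[PartKey, int],
    whole_cost: Optional[int] = None,
) -> int:
    """Cost of the text to be voiced.

    If a single cost is defined for the whole content, it applies regardless of
    which parts are voiced; otherwise the costs of the selected parts are summed.
    Overlapping parts (e.g. a line inside a selected scene) are not deduplicated.
    """
    if whole_cost is not None:
        return whole_cost
    return sum(part_costs.get(part, 0) for part in selected)
```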

The content cost information is a parameter (hereinafter also referred to as a "content cost parameter") that specifies a cost, such as a consumption amount defined in terms of the in-service currency used in the content viewing service or the items used in that service (e.g., their number, or their type and number), or the amount charged corresponding to that cost.

In this embodiment, the content data also includes data that contains no image data and consists only of text data and character data.

Although the content information in this embodiment is stored in the content information storage unit 144, it may instead be obtained from another database (not shown).

(Phoneme information)
Each piece of phoneme information is information about phonemes collected and generated in advance from a speaker such as a voice actor, actor, or announcer, and is used to generate the speech language data with which the characters in the content are made to speak their text.

For example, as shown in FIG. 6(B), each piece of phoneme information includes, in association with a phoneme data ID:
(B1) phoneme data specifying segmental phonemes such as consonants, vowels, and semivowels; pitch, including the tone and intonation that express the relationships between those segmental phonemes; stress and accent; dialect type; language type (e.g., Japanese or English); and the connections between consonants and vowels across characters (i.e., connecting elements);
(B2) phoneme cost information specifying the cost (item consumption or amount charged) incurred when the phoneme data is used in the speech language data generation process; and
(B3) supplementary information, such as bibliographic information, that describes the phoneme information to the user (hereinafter also referred to as "additional information").
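A phoneme information record and a filter for the phoneme data a user may assign might look like the following sketch. The field names are hypothetical, and restricting candidates to the language of the text and sorting them by cost are illustrative choices rather than anything the source specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhonemeInfo:
    phoneme_id: str
    language: str                  # (B1) language type, e.g. "ja" or "en"
    dialect: Optional[str] = None  # (B1) dialect type, if any
    cost: int = 0                  # (B2) phoneme cost parameter
    description: str = ""          # (B3) additional information shown to the user


def assignable_phonemes(
    catalog: list[PhonemeInfo], available_ids: set[str], language: str
) -> list[PhonemeInfo]:
    """Phoneme data the user may assign: usable by the user and matching the text language.

    Sorting by cost is only a presentation choice for this sketch.
    """
    matches = [p for p in catalog if p.phoneme_id in available_ids and p.language == language]
    return sorted(matches, key=lambda p: p.cost)
```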

Like the content cost information, the phoneme cost information is a parameter (hereinafter also referred to as a "phoneme cost parameter") that specifies a cost, such as a consumption amount defined in terms of the in-service currency used in the content viewing service or the items used in that service (e.g., their number, or their type and number), or the amount charged corresponding to that cost.

(User information)
As shown in FIG. 7, the user information stores, for each user:
(C1) the user's nickname and user ID;
(C2) information on attributes such as current rank, points, experience points, and energy parameter values (e.g., life energy, stamina, or power values when the content viewing service includes game-like elements) (hereinafter also referred to as "attribute information");
(C3) information indicating that content data, character data, text data, and phoneme information are available for use (hereinafter referred to as "availability information"; for example, information associated with a phoneme data ID or content ID that indicates a numerical restriction such as the number of views, a time restriction such as a viewing period, or an individual per-user restriction such as user level);
(C4) billing information for the content viewing service, such as the payment status, billing history, and amount charged (user cost information), and information indicating limit values, such as payment restrictions, applied during the speech language data generation process and similar processes (hereinafter also referred to as "limit value information");
(C5) information on the login history, such as the number, times, and frequency of logins to the content viewing service (hereinafter referred to as "access history information"); and
(C6) information on other users with whom the user has a certain relationship, such as registered friends or followers (hereinafter also referred to as "related users") (hereinafter referred to as "related user information");
and so on.
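The restrictions listed under (C3) can be checked as in the minimal sketch below. The `Availability` fields, the default of "no restriction" for each field, and the inclusive treatment of the expiry date are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Availability:
    max_views: Optional[int] = None     # numerical restriction (None = unlimited)
    views_used: int = 0
    valid_until: Optional[date] = None  # time restriction, inclusive (None = no expiry)
    min_user_level: int = 0             # individual per-user restriction


def is_available(av: Availability, user_level: int, today: date) -> bool:
    """True if the content or phoneme data may still be used by this user."""
    if av.max_views is not None and av.views_used >= av.max_views:
        return False
    if av.valid_until is not None and today > av.valid_until:
        return False
    return user_level >= av.min_user_level
```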

[4.3] Speech language data generation process
Next, the speech language data generation process of this embodiment will be described.

(Outline of the speech language data generation process)
Subject to the result of the execution permission determination process, the speech generation processing unit 106 executes the speech language data generation process, in which, based on the player's instructions or according to a program, it assigns phoneme data included in the phoneme information to character data (i.e., characters) included in the content data and generates speech language data for voicing the text of the content.

Specifically, when content information is selected by the player or automatically according to a program, the speech generation processing unit 106 extracts, from the selected content information, information on all characters appearing in the content or on the characters to which phoneme data can be assigned (i.e., character information).

Then, based on each piece of extracted character information, the speech generation processing unit 106 generates character selection information, including the type of each character to which the player can assign phoneme data and information about that character, and transmits it so that the selectable characters are presented to the player for selection.

The speech generation processing unit 106 also acquires speech phoneme information on the phoneme data the player can assign, generates assignable phoneme selection information including information indicating the type and characteristics of that phoneme data, and transmits it so that the assignable phoneme data is presented to the player for selection.

Then, upon acquiring combination information indicating the combination of a character selected by the player and the phoneme data the player wishes to assign to that character, the speech generation processing unit 106 executes the speech language data generation process, generating speech language data for voicing the selected character's text based on the phoneme data identified by the combination information and the character data indicating each text of that character.
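The generation step driven by the combination information can be sketched as below. `synthesize` is a hypothetical stand-in for the phoneme-based, model-driven synthesis described in this section; the sketch only shows how each selected character's lines are routed to the phoneme data assigned to that character.

```python
from typing import Callable


def generate_speech_language_data(
    lines_by_character: dict[str, list[str]],
    combination: dict[str, str],
    synthesize: Callable[[str, str], bytes],
) -> dict[str, list[bytes]]:
    """Voice each selected character's lines with the phoneme data assigned to it.

    `combination` is the combination information: character ID -> phoneme data ID.
    Characters absent from `combination` are left out of the result.
    """
    unknown = set(combination) - set(lines_by_character)
    if unknown:
        raise KeyError(f"unknown characters: {sorted(unknown)}")
    return {
        character_id: [synthesize(phoneme_id, line) for line in lines_by_character[character_id]]
        for character_id, phoneme_id in combination.items()
    }
```

In a real system this call would sit behind the execution permission determination, running only once payment or the limit check has succeeded.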

At this point, the execution permission determination process is executed. If execution of the speech language data generation process is permitted (i.e., if it is determined that the user has paid the execution cost or that the execution cost is within the limit value), the speech generation processing unit 106 executes the speech language data generation process, generating speech language data that serves as each character's spoken voice based on the phoneme data the player assigned to each piece of character data.

(Principle of the speech language data generation process)
In the speech language data generation process, the speech generation processing unit 106 acquires analysis information on the text of the character selected by the player, or obtains that analysis information by performing a predetermined analysis, such as natural language processing (e.g., morphological analysis or syntactic analysis), on that text.

Then, based on the selected phoneme data and in accordance with the analysis information of the corresponding text, the speech generation processing unit 106 assigns segmental phonemes such as consonants, vowels, and semivowels along each text (i.e., character strings or individual characters), and adjusts pitch (including the tone and intonation that express the relationships between segmental phonemes), stress and accent, dialect type, language type (e.g., Japanese or English), and the connections between consonants and vowels across characters (i.e., connecting elements), thereby generating speech language data.

Specifically, when the speech generation processing unit 106 builds speech sounds (i.e., sounds uttered as voice) by assigning phoneme data to text and thereby generates speech data, it assigns segmental phonemes and adjusts pitch (including tone and intonation), stress, accent, connections, and the like based on the text analysis information, and generates the speech language data using AI techniques such as machine learning that take the analysis information into account.

That is, when assigning phonemes (segmental phonemes) to character strings or individual characters and adjusting pitch, stress, accent, connections, and the like, the speech generation processing unit 106 uses, together with the analysis information from natural language processing, model information of the phoneme data based on artificial intelligence (AI) technology, consisting of a trainable data model of the phoneme data generated in advance based on the attributes of the character to which the phoneme data is assigned or the attributes of the text.

In particular, the speech generation processing unit 106 uses phoneme data model information generated by extracting feature quantities, such as the amount and manner of change of each phoneme, in association with each attribute of the content, character, text, and so on (especially character attributes and text attributes), and performing machine learning on the extracted features.

In other words, the speech generation processing unit 106 executes a speech language data generation process that generates speech language data for rendering text in the speaker's voice, based on the analysis information from natural language processing and in accordance with model information indicating a data model of the phoneme data generated based on at least one of the character attributes and the text attributes.

例えば、発話音声生成処理部１０６は、既に、評価された発話音声言語データ（すなわち、発話された音）を教師データとして用いるサポートベクターマシンやニューラルネットワーク（例えば、再帰型ニューラルネットワーク）などのディープラーニングを含む機械学習、又は、ＧＡＮ（敵対的生成ネットワーク）やアソシエーション分析などの教師データ無しのディープラーニングを含む機械学習が実行された音素データのモデル情報を用いる。 For example, the speech generation processing unit 106 uses model information of phoneme data that has been trained either by machine learning, including deep learning, that uses already-evaluated speech language data (i.e., uttered sounds) as training data, such as a support vector machine or a neural network (e.g., a recurrent neural network), or by machine learning, including deep learning, that uses no labeled training data, such as a GAN (generative adversarial network) or association analysis.

そして、発話音声生成処理部１０６は、このような音素データのモデル情報を用いて発話音声データ生成処理を実行する。 The speech generation processing unit 106 then uses the model information of this phoneme data to perform the speech data generation process.

一方、発話音声生成処理部１０６は、当該生成した音声言語データに基づいて音素データのモデル情報を学習させる学習処理を実行する。 Meanwhile, the speech generation processing unit 106 executes a learning process that trains the model information of the phoneme data based on the generated speech language data.

すなわち、発話音声生成処理部１０６は、当該発話音声言語データを生成する毎に、当該モデル情報を学習させて新たなモデル情報を生成して更新し、更新したモデル情報を用いて次回以降の発話音声言語データの生成に用いている。 In other words, each time the speech generation processing unit 106 generates speech language data, it trains the model information to produce new model information, updates the model, and uses the updated model information to generate subsequent speech language data.
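The generate-then-learn cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `PhonemeModel`, `generate_then_learn`, and the single pitch feature are hypothetical stand-ins for the learnable phoneme-data model, and a simple running mean takes the place of the machine-learning methods named earlier.

```python
from dataclasses import dataclass, field


@dataclass
class PhonemeModel:
    """Hypothetical stand-in for the learnable phoneme-data model.

    It tracks only a running mean of one prosodic feature (pitch, in Hz)
    per character/text attribute; a real system would use a trained model.
    """
    version: int = 0
    pitch_mean: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def generate(self, text: str, attribute: str) -> dict:
        # Use the learned pitch for this attribute, or a neutral default.
        pitch = self.pitch_mean.get(attribute, 200.0)
        return {"text": text, "attribute": attribute, "pitch": pitch}

    def learn(self, attribute: str, pitch: float) -> None:
        # Incremental mean update: m_n = m_(n-1) + (x - m_(n-1)) / n
        n = self.counts.get(attribute, 0) + 1
        old = self.pitch_mean.get(attribute, 0.0)
        self.pitch_mean[attribute] = old + (pitch - old) / n
        self.counts[attribute] = n
        self.version += 1  # each update yields a new model version


def generate_then_learn(model, text, attribute, evaluated_pitch):
    """Generate with the current model, then update it for the next call."""
    speech = model.generate(text, attribute)
    model.learn(attribute, evaluated_pitch)
    return speech
```

The point of the sketch is the ordering: each generation uses the model as it stood after the previous update, so the second call for the same attribute already reflects what was learned from the first.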

（音素データの割り当てのないキャラクタの取り扱い）
発話音声生成処理部１０６は、割り当て可能な全てのキャラクタに対して音素情報との組み合わせを示す組み合わせ情報を取得できなかった場合（すなわち、ユーザによって音素情報の割り当てを希望しないキャラクタが存在した場合）には、デフォルトとして予め定められた音素データを当該キャラクタに割り当ててもよいし、発話音素情報及びキャラクタ情報に基づいてキャラクタの特徴と音素データの特徴とによってマッチングを実行して特定の音素データを当該キャラクタに割り当ててもよい。 (Handling characters with no phoneme data assigned)
If the speech generation processing unit 106 is unable to obtain combination information indicating combinations with phoneme information for all assignable characters (i.e., if there is a character to which the user does not wish to assign phoneme information), it may assign predetermined phoneme data as a default to the character, or it may perform matching between the character's characteristics and the phoneme data characteristics based on the speech phoneme information and character information to assign specific phoneme data to the character.

すなわち、発話音声生成処理部１０６は、ユーザの指示に基づいて、音素データが割り当てられていないキャラクタを特定キャラクタとして検出した場合には、当該特定キャラクタに、予め定められた音素データを設定してもよい。 In other words, when the speech generation processing unit 106 detects a character to which no phoneme data is assigned as a specific character based on a user instruction, it may set predetermined phoneme data to the specific character.
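Both fallbacks for characters without assigned phoneme data (a predetermined default, or matching character features against phoneme-data features) can be sketched as below. The data formats, the tag-overlap matching rule, and the name `assign_unassigned` are assumptions for illustration, not the embodiment's actual matching method.

```python
def assign_unassigned(characters, assignments, voices, default_voice, match=False):
    """Fill in phoneme data for characters the user left unassigned.

    characters:  character name -> set of attribute tags (hypothetical format)
    assignments: character name -> phoneme-data id chosen by the user
    voices:      phoneme-data id -> set of attribute tags
    """
    result = dict(assignments)  # do not mutate the user's choices
    for name, tags in characters.items():
        if name in result:
            continue  # the user already chose phoneme data for this character
        if match:
            # Choose the phoneme data sharing the most attribute tags;
            # max() keeps the first best candidate, so ties go to the lowest id.
            best = max(sorted(voices), key=lambda v: len(voices[v] & tags))
            # Fall back to the default if nothing matches at all.
            result[name] = best if voices[best] & tags else default_voice
        else:
            result[name] = default_voice
    return result
```

With `match=False` every unassigned character receives the default; with `match=True` the default is used only when no phoneme data shares any feature with the character.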

［４．４］実行コストに基づく実行許可判定処理
［４．４．１］実行コスト算出処理を含む実行コストに基づく実行許可判定処理
次に、図８及び図９を用いて、本実施形態の実行コスト算出処理を含む実行コストに基づく実行許可判定処理について説明する。 [4.4] Execution permission determination processing based on execution cost [4.4.1] Execution permission determination processing based on execution cost, including execution cost calculation processing Next, using Figures 8 and 9, the execution permission determination processing based on execution cost, including the execution cost calculation processing, of this embodiment will be described.

なお、図８及び図９は、本実施形態の実行コスト算出処理を含む実行コストに基づく実行許可判定処理を説明するための図である。 Note that Figures 8 and 9 are diagrams for explaining the execution permission determination process based on the execution cost, including the execution cost calculation process of this embodiment.

（基本原理）
コスト管理部１０７は、音素データなどのコストパラメータの管理を前提にしつつ、当該音素データなどの実行コスト（アイテムなどの消費量）に対するユーザの支払いの有無に基づいて、発話音声データ生成処理の実行の可否を判定する実行許可判定処理を実行する。 (Basic Principles)
On the premise that it manages cost parameters for the phoneme data and other data, the cost management unit 107 performs an execution permission determination process that determines whether to execute the speech data generation process based on whether the user has paid the execution cost (e.g., consumption of items) of that phoneme data and other data.

すなわち、コスト管理部１０７は、ユーザによって実行コストの支払いが無い場合には音声言語データの生成などを実行せず、当該ユーザによって実行コストの支払いがある場合には音声言語データの生成などを実行させるため、このような実行許可判定処理を実行する構成を有している。 In other words, the cost management unit 107 is configured to perform such an execution permission determination process so as not to execute the generation of speech language data, etc., if the execution cost is not paid by the user, but to execute the generation of speech language data, etc., if the execution cost is paid by the user.

具体的には、コスト管理部１０７は、生成処理に用いる音素データ、キャラクタデータ及びテキストデータのうち、いずれか１のデータの使用に関するコストが規定されたコストパラメータを管理する。 Specifically, the cost management unit 107 manages a cost parameter that specifies the cost associated with using any one of the phoneme data, character data, and text data used in the generation process.

特に、コスト管理部１０７は、コンテンツ情報記憶部１４４に記憶されているコンテンツコスト情報に含まれる各キャラクタのコストパラメータ、及び、各テキスト（コンテンツ全体のテキストやその一部のテキストを含む。）のコストパラメータ（基準値）を管理する。 In particular, the cost management unit 107 manages the cost parameters (reference values) of each character included in the content cost information stored in the content information storage unit 144, and the cost parameters (reference values) of each text (including the text of the entire content or part of the text).

また、コスト管理部１０７は、音素情報記憶部１４６に記憶されている音素コスト情報に含まれる各音素データの使用に関するコストパラメータ（基準値）を管理する。 The cost management unit 107 also manages cost parameters (reference values) related to the use of each phoneme data included in the phoneme cost information stored in the phoneme information storage unit 146.

そして、コスト管理部１０７は、上述した音声言語データ生成処理の実行時に、ユーザによって選択された、又は、プログラムに従って自動的に選択された、音素データ、キャラクタデータ、テキストデータ、又は、これらの２以上の組み合わせのそれぞれのコストパラメータ（基準値）に基づいて、当該音声言語データ生成処理のトータルのコスト（すなわち、実行コスト）を算出する算出処理（すなわち、実行コスト算出処理）を実行する。 Then, when the speech language data generation process described above is executed, the cost management unit 107 executes a calculation process (i.e., an execution cost calculation process) that calculates the total cost (i.e., the execution cost) of that process based on the respective cost parameters (reference values) of the phoneme data, character data, text data, or a combination of two or more of these, selected by the user or automatically by the program.

そして、コンテンツ管理部１０５は、このように算出した実行コストに基づいて、該当するユーザに対して、所定の方法による支払いを要求（以下、「実行コスト支払い要求」ともいう。）し、当該ユーザの支払いの有無に基づいて、音声言語データ生成処理の実行可否を判定する実行許可判定処理を実行する。 Then, based on the execution cost calculated in this manner, the content management unit 105 requests the relevant user to make payment in a predetermined manner (hereinafter also referred to as an "execution cost payment request"), and executes an execution permission determination process that determines whether or not the speech language data generation process can be executed based on whether or not the user has made payment.

（実行コスト算出処理）
コスト管理部１０７は、実行コスト算出処理としては、キャラクタ又はテキストに割り当てる音素データがユーザによって又はプログラムによって１以上選択された場合に、当該選択された各音素データに、又は、当該各音素データが割り当てられると想定されるそれぞれのキャラクタデータ若しくはテキストデータに対応付けて管理されているコストパラメータを読み出す。 (Execution cost calculation process)
In the execution cost calculation process, when one or more phoneme data to be assigned to a character or text are selected by a user or by a program, the cost management unit 107 reads out cost parameters managed in association with each selected phoneme data or with each character data or text data to which each phoneme data is expected to be assigned.

そして、コスト管理部１０７は、読み出した各コストパラメータを合算など所与の演算を実行することによって実行コストを算出する。 Then, the cost management unit 107 calculates the execution cost by performing a given calculation, such as adding up each of the read cost parameters.

特に、コスト管理部１０７は、実行コスト算出処理としては、例えば、図８に示すように、ユーザが割り当てを希望する各音素データに規定されているコストパラメータに基づいて所定の演算（例えば、合算）を実行し、その演算結果を実行コストとして算出する処理（以下、「音素データ実行コスト算出処理」という。）を行う。 In particular, as an execution cost calculation process, the cost management unit 107 performs a predetermined calculation (e.g., summation) based on the cost parameters defined for each phoneme data that the user wishes to assign, and calculates the result of the calculation as the execution cost (hereinafter referred to as the "phoneme data execution cost calculation process"), as shown in FIG. 8.

例えば、図８に示すように、ユーザによって音素データＡ（コスト：５０ポイント）、音素データＢ（コスト：５０ポイント）、音素データＣ（コスト：４０ポイント）、音素データＤ（コスト：６０ポイント）及び音素データＥ（コスト：４５ポイント）が使用可能な状態であって、そのうち、音素データＡ、Ｃ及びＥが選択された場合を想定する。 For example, as shown in FIG. 8, assume that phoneme data A (cost: 50 points), phoneme data B (cost: 50 points), phoneme data C (cost: 40 points), phoneme data D (cost: 60 points), and phoneme data E (cost: 45 points) are available to the user, and that phoneme data A, C, and E are selected from among them.

この場合には、コスト管理部１０７は、図８に示すように、実行コスト算出処理を実行し、実行コストとして、１３５ポイントを算出する。 In this case, the cost management unit 107 executes the execution cost calculation process as shown in FIG. 8, and calculates 135 points as the execution cost.
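The FIG. 8 arithmetic can be checked with a short sketch of the phoneme data execution cost calculation, assuming the "predetermined calculation" is plain summation as the text suggests. The dictionary and function names are illustrative only.

```python
# Reference cost (points) per phoneme data, taken from the FIG. 8 example.
PHONEME_COST = {"A": 50, "B": 50, "C": 40, "D": 60, "E": 45}


def phoneme_execution_cost(selected, cost_table=PHONEME_COST):
    """Phoneme data execution cost calculation: sum the cost of each selection.

    Raises KeyError if a selected phoneme data item has no cost entry.
    """
    return sum(cost_table[p] for p in selected)


print(phoneme_execution_cost(["A", "C", "E"]))  # 50 + 40 + 45 = 135
```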

一方、コスト管理部１０７は、実行コスト算出処理としては、上記に代えて、ユーザが希望する音素データを割り当て先である各キャラクタに規定されているコストに基づいて所定の演算（例えば、合算）を実行し、その演算結果を実行コストとして算出する処理（以下、「キャラクタコスト演算処理」という。）を実行してもよい。 On the other hand, instead of the above, the cost management unit 107 may execute an execution cost calculation process in which it performs a predetermined calculation (e.g., summation) based on the cost specified for each character to which the phoneme data desired by the user is assigned, and calculates the result of the calculation as the execution cost (hereinafter referred to as "character cost calculation process").

例えば、図９に示すように、ユーザによって音素データＡ、音素データＢ、音素データＣ、音素データＤ及び音素データＥが使用可能な状態であって、そのうち、音素データＡ、Ｃ及びＥが選択され、かつ、音素データを割り当てるコンテンツには、それぞれコストが設定されたキャラクタ１（コスト：１００ポイント）、２（コスト：５０ポイント）及び３（コスト：４０ポイント）が登場する場合を想定する。 For example, as shown in FIG. 9, assume that phoneme data A, phoneme data B, phoneme data C, phoneme data D, and phoneme data E are available to the user, of which phoneme data A, C, and E are selected, and that the content to which the phoneme data is assigned includes characters 1 (cost: 100 points), 2 (cost: 50 points), and 3 (cost: 40 points), each of which has a set cost.

この場合には、コスト管理部１０７は、図９に示すように、キャラクタコスト演算処理を実行し、実行コストとして、１９０ポイントを算出する。 In this case, the cost management unit 107 executes the character cost calculation process as shown in FIG. 9, and calculates an execution cost of 190 points.
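The character cost calculation differs only in what is summed: the cost attached to each character that receives phoneme data, not the phoneme data itself. A minimal sketch, assuming summation; which of A, C, and E goes to which character is not stated, so the mapping below is hypothetical and does not affect the total.

```python
# Reference cost (points) per character, taken from the FIG. 9 example.
CHARACTER_COST = {"character 1": 100, "character 2": 50, "character 3": 40}


def character_execution_cost(assignment, cost_table=CHARACTER_COST):
    """Character cost calculation: sum the cost of every character that
    receives phoneme data, regardless of which phoneme data it receives."""
    return sum(cost_table[character] for character in assignment)


# Hypothetical mapping of the selected phoneme data to characters.
assignment = {"character 1": "A", "character 2": "C", "character 3": "E"}
print(character_execution_cost(assignment))  # 100 + 50 + 40 = 190
```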

なお、本実施形態においては、音素データ又はキャラクタに基づいて実行コストが算出されているが、音素データが割り当てられるテキスト（一部も含む。）に規定されているコストに基づいて、実行コストが算出されてもよいし、音声言語データ生成処理に用いる音素データ、キャラクタ、又は、テキストの２以上の組み合わせのそれぞれのコストパラメータに基づいて、実行コストが算出されてもよい。 In this embodiment, the execution cost is calculated based on the phoneme data or characters, but the execution cost may also be calculated based on the cost specified for the text (including a part of it) to which the phoneme data is assigned, or based on the respective cost parameters of a combination of two or more of the phoneme data, characters, or text used in the speech language data generation process.

（実行許可判定処理）
コンテンツ管理部１０５は、上述のように実行コストが算出されると、実行コスト支払い要求として、該当するユーザの端末装置２０に、システム内通貨若しくはシステム内で用いるアイテム（例えば、数、又は、種別とその数）に基づいて規定されるコスト、又は、コストに対応する課金額を、提示する。 (Execution permission determination process)
Once the execution cost has been calculated as described above, the content management unit 105 presents to the terminal device 20 of the relevant user, as an execution cost payment request, either the cost expressed in in-system currency or in-system items (e.g., a number of items, or item types and their numbers), or a billing amount corresponding to that cost.

具体的には、コンテンツ管理部１０５は、算出したアイテムの消費量又は課金額を示す情報とともに、当該アイテムの消費又は課金額による支払いを促すための実行コスト支払い要求を、情報提供部１１０を介して、該当する端末装置２０に送信する。 Specifically, the content management unit 105 transmits, via the information provision unit 110, to the relevant terminal device 20, information indicating the calculated consumption amount or charge amount of the item, as well as an execution cost payment request to prompt the user to pay the consumption or charge amount of the item.

そして、コンテンツ管理部１０５は、端末装置２０を介して、提示した対価の支払いに関する情報、又は、課金に関する情報を受信すると、当該受信した情報によって当該ユーザの支払いの有無を判定する実行許可判定処理を実行する。 Then, when the content management unit 105 receives information regarding payment of the presented price or information regarding billing via the terminal device 20, it executes an execution permission determination process that determines whether or not the user has made a payment based on the received information.

このとき、コンテンツ管理部１０５は、例えば、上記の図４に示すように、実行許可判定処理において、実行コストに対する支払いが適正に実行されたことを示す情報を受信した場合には、音声言語データ生成処理の実行を許可し、対価に対する支払いが適正に実行されていないことを示す情報を受信した場合には、音声言語データの生成処理を中止し、当該中止した旨を示す情報を生成して該当する端末装置２０に提供する中止処理を実行する。 At this time, as shown in FIG. 4 above, for example, if in the execution permission determination process the content management unit 105 receives information indicating that the execution cost has been properly paid, it permits execution of the speech language data generation process; if it receives information indicating that the payment has not been properly made, it executes a stop process that stops the speech language data generation process, generates information indicating the stop, and provides it to the relevant terminal device 20.
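The payment-based execution permission determination reduces to a permit/stop decision plus a message for the terminal in the stop case. The sketch below assumes a hypothetical `PaymentInfo` record; what counts as "properly paid" (settled, and at least the execution cost) is an assumption, since the text does not define it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaymentInfo:
    """Hypothetical payment confirmation received from terminal device 20."""
    settled: bool   # whether the payment/billing went through
    amount: int     # points (or item consumption) actually paid


def judge_by_payment(execution_cost: int, payment: Optional[PaymentInfo]):
    """Return ("permit", None) or ("stop", message for the terminal)."""
    if payment is not None and payment.settled and payment.amount >= execution_cost:
        return "permit", None
    # Stop process: generation is cancelled and the user is told why.
    return "stop", f"Speech generation stopped: payment of {execution_cost} points not confirmed."
```

Treating a missing confirmation (`None`) the same as a failed payment keeps the default behavior on the safe side: nothing is generated until payment is confirmed.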

［４．４．２］限界値に基づく実行許可判定処理
次に、図１０及び図１１を用いて、本実施形態の実行コストと限界値とに基づく実行許可判定処理について説明する。 [4.4.2] Execution Permission Determination Processing Based on Limit Values Next, the execution permission determination processing based on the execution cost and limit values of this embodiment will be described with reference to FIGS. 10 and 11.

なお、図１０及び図１１は、本実施形態の実行コストと限界値とに基づく実行許可判定処理について説明するための図である。 Note that Figures 10 and 11 are diagrams for explaining the execution permission determination process based on the execution cost and limit value in this embodiment.

コスト管理部１０７は、上記の実行許可判定処理に代えて、上記の実行コスト算出処理によって算出した実行コストと予め定められているコストの限界値とが所与の関係性条件を具備していると判定した場合には、発話音声データ生成処理の実行を許可する実行許可判定処理を実行してもよい。 Instead of the above-mentioned execution permission determination process, the cost management unit 107 may execute an execution permission determination process that permits the execution of the speech voice data generation process when it is determined that the execution cost calculated by the above-mentioned execution cost calculation process and a predetermined cost limit value satisfy a given relationship condition.

特に、コンテンツ管理部１０５は、予め設定されたコストの限界値としては、例えば、ユーザが予め支払ったコスト（事前にアイテムを消費させた消費量や課金額）、又は、サブスクリプションなど一定額（月額課金額や年会費など）を支払うと一定のサービスを享受できる場合の限度額（アイテムやポイントの消費量を含む。以下同じ。）、又は、コンテンツ提供者が予め設定した場合の限度額を示す値などの上限値やそれに対応する値であって、コンテンツ毎に対応付けられて設定されている値（例えば、コンテンツ毎に設定された値）であってもよいし、ユーザ毎に対応付けられた値であってもよい。 In particular, the preset cost limit value used by the content management unit 105 may be an upper limit or a corresponding value, for example: the cost the user has paid in advance (the amount of items consumed in advance, or the amount charged); the limit (including the amount of items or points consumed; the same applies below) up to which a certain service can be enjoyed by paying a fixed amount, such as a subscription (e.g., a monthly charge or annual fee); or a limit set in advance by the content provider. This limit value may be set in association with each piece of content (for example, a value set for each piece of content) or in association with each user.

そして、コンテンツ管理部１０５は、所与の関係性条件として、生成処理に用いる音素データのコストが、これらの限界値を超えていない場合などの条件を用いる。 The content management unit 105 uses conditions such as the cost of the phoneme data used in the generation process not exceeding these limit values as a given relationship condition.

すなわち、限界値に基づく実行許可判定処理としては、コンテンツ管理部１０５は、生成処理に用いる音素データのコストが、予め定められたコストの限界値を超えている場合など所与の関係性条件が具備されていない場合には音声言語データの生成などの実行をさせず、当該限界値を超えていない場合など、所与の関係性条件が具備されている場合には、音声言語データの生成などを実行させるため、このような実行許可判定処理を実行する構成を有している。 In other words, for the execution permission determination process based on a limit value, the content management unit 105 is configured to perform this process so that it does not allow the generation of speech language data, etc., when the given relationship condition is not met, such as when the cost of the phoneme data used in the generation process exceeds the predetermined cost limit value, and allows the generation of speech language data, etc., when the given relationship condition is met, such as when the cost does not exceed the limit value.

具体的には、コスト管理部１０７は、上述と同様に、音素情報記憶部１４６に記憶されている音素コスト情報に含まれる各音素データのコストのパラメータ（基準値）を管理する。 Specifically, the cost management unit 107 manages the cost parameters (reference values) of each phoneme data included in the phoneme cost information stored in the phoneme information storage unit 146, as described above.

また、コスト管理部１０７は、上述した音声言語データ生成処理の実行時に、ユーザによって選択された、又は、プログラムに従って自動的に選択された、音素データのコストパラメータ（基準値）に基づいて、当該音声言語データ生成処理のトータルのコスト（すなわち、実行コスト）を算出する算出処理（すなわち、実行コスト算出処理）を実行する。 In addition, when the above-mentioned speech language data generation process is executed, the cost management unit 107 executes a calculation process (i.e., an execution cost calculation process) that calculates the total cost (i.e., the execution cost) of that process based on the cost parameters (reference values) of the phoneme data selected by the user or automatically by the program.

そして、コンテンツ管理部１０５は、このように算出した実行コストが、例えば、予めユーザが既に支払ったコストに基づく限界値以内か否かを判定する実行許可判定処理を実行する。 Then, the content management unit 105 executes an execution permission determination process to determine whether the execution cost calculated in this manner is within a limit value based on, for example, the cost already paid by the user.

このとき、コンテンツ管理部１０５は、実行許可判定処理において、実行コストが限界値以内の場合には、発話音声データ生成処理の実行を許可し、実行コストが限界値を超えた場合には、音声言語データの生成処理を中止し、当該中止した旨を示す情報を生成して該当する端末装置２０に提供する中止処理を実行する。 At this time, in the execution permission determination process, if the execution cost is within the limit value, the content management unit 105 permits execution of the speech data generation process; if the execution cost exceeds the limit value, it executes a stop process that stops the speech language data generation process, generates information indicating the stop, and provides it to the relevant terminal device 20.

例えば、図１０に示すように、ユーザによって音素データＡ（コスト：５０ポイント）、音素データＢ（コスト：５０ポイント）、音素データＣ（コスト：４０ポイント）、音素データＤ（コスト：６０ポイント）及び音素データＥ（コスト：４５ポイント）が使用可能な状態であって、ユーザＡに設定されたコスト上限（既にユーザＡが支払った金額に対応するポイント）が、１５０ｐｔの場合を想定する。 For example, as shown in FIG. 10, assume that a user has available phoneme data A (cost: 50 points), phoneme data B (cost: 50 points), phoneme data C (cost: 40 points), phoneme data D (cost: 60 points), and phoneme data E (cost: 45 points) and that the cost limit set for user A (points corresponding to the amount already paid by user A) is 150 pt.

この場合には、図１０に示すように、音素データＡ、Ｃ及びＥが選択された場合には、コスト管理部１０７は、実行コストとして、１３５ｐｔを算出し、上限値の１５０ｐｔ以内となるため、音声言語データ生成処理の実行を許可する旨の判定を行う。 In this case, as shown in FIG. 10, when phoneme data A, C, and E are selected, the cost management unit 107 calculates the execution cost to be 135 pt, which is within the upper limit of 150 pt, and therefore determines to permit the execution of the speech language data generation process.

その一方、上記と同様な場合であっても、図１０に示すように、音素データＡ、Ｂ及びＤが選択された場合には、コスト管理部１０７は、実行コストとして、１６０ｐｔを算出し、上限値の１５０ｐｔを超えるため、音声言語データ生成処理の実行を中止する旨の判定をし、中止処理を実行する。 On the other hand, under the same conditions, when phoneme data A, B, and D are selected as shown in FIG. 10, the cost management unit 107 calculates the execution cost to be 160 pt (50 + 50 + 60), which exceeds the upper limit of 150 pt, and therefore determines to stop execution of the speech language data generation process and executes the stop process.
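The limit-based determination for FIG. 10 can be sketched as a sum followed by a comparison, taking the relationship condition to be "the execution cost does not exceed the limit" as stated above. With the listed costs, the only three-item selection totalling 160 pt is A, B, and D (A, B, and C would total 140 pt, within the limit), so the sketch uses A, B, and D for the rejected case. Names are illustrative.

```python
PHONEME_COST = {"A": 50, "B": 50, "C": 40, "D": 60, "E": 45}


def judge_by_limit(selected, limit, cost_table=PHONEME_COST):
    """Limit-based execution permission determination.

    The relationship condition used here is "execution cost <= limit".
    Returns the calculated execution cost and whether generation is permitted.
    """
    execution_cost = sum(cost_table[p] for p in selected)
    return execution_cost, execution_cost <= limit


USER_A_LIMIT = 150  # points corresponding to what user A has already paid
print(judge_by_limit(["A", "C", "E"], USER_A_LIMIT))  # (135, True)
print(judge_by_limit(["A", "B", "D"], USER_A_LIMIT))  # (160, False)
```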

なお、限界値は、ユーザ毎に設定されている点に代えて、コンテンツ毎に設定されていてもよい。 In addition, the limit value may be set for each content instead of for each user.

例えば、この場合には、図１１に示すように、ユーザによって音素データＡ（コスト：５０ポイント）、音素データＢ（コスト：５０ポイント）、音素データＣ（コスト：４０ポイント）、音素データＤ（コスト：６０ポイント）及び音素データＥ（コスト：４５ポイント）が使用可能な状態であって、コンテンツＩＤがＩＤＣ０００１のコンテンツに設定されたコスト上限が、１５０ｐｔの場合を想定する。 For example, in this case, as shown in FIG. 11, it is assumed that the user has available phoneme data A (cost: 50 points), phoneme data B (cost: 50 points), phoneme data C (cost: 40 points), phoneme data D (cost: 60 points), and phoneme data E (cost: 45 points), and the upper cost limit set for the content with content ID IDC0001 is 150 pt.

この場合には、図１１に示すように、音素データＡ、Ｃ及びＥが選択された場合には、コスト管理部１０７は、実行コストとして、１３５ｐｔを算出し、上限値の１５０ｐｔ以内となるため、音声言語データ生成処理の実行を許可する旨の判定を行う。 In this case, as shown in FIG. 11, when phoneme data A, C, and E are selected, the cost management unit 107 calculates the execution cost to be 135 pt, which is within the upper limit of 150 pt, and therefore determines to permit the execution of the speech language data generation process.

その一方、上記と同様な場合であっても、図１１に示すように、音素データＡ、Ｂ及びＤが選択された場合には、コスト管理部１０７は、実行コストとして、１６０ｐｔを算出し、上限値の１５０ｐｔを超えるため、音声言語データ生成処理の実行を中止する旨の判定をし、中止処理を実行する。 On the other hand, under the same conditions, when phoneme data A, B, and D are selected as shown in FIG. 11, the cost management unit 107 calculates the execution cost to be 160 pt (50 + 50 + 60), which exceeds the upper limit of 150 pt, and therefore determines to stop execution of the speech language data generation process and executes the stop process.
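Moving the limit from the user to the content changes only where the limit is looked up, not how the comparison is made. A minimal sketch follows; only the IDC0001 limit of 150 pt comes from the FIG. 11 example, and the lookup tables, the function name, and the precedence of a per-content limit over a per-user limit are assumptions.

```python
PHONEME_COST = {"A": 50, "B": 50, "C": 40, "D": 60, "E": 45}
# Hypothetical limit tables; only IDC0001 -> 150 pt comes from the FIG. 11 example.
CONTENT_LIMITS = {"IDC0001": 150}
USER_LIMITS = {"user A": 150}


def permit_generation(selected, *, content_id=None, user_id=None):
    """Permit generation if the execution cost stays within the applicable limit.

    A per-content limit is used when content_id is given; otherwise the
    per-user limit applies.
    """
    if content_id is not None:
        limit = CONTENT_LIMITS[content_id]
    elif user_id is not None:
        limit = USER_LIMITS[user_id]
    else:
        raise ValueError("a content_id or user_id is needed to look up the limit")
    execution_cost = sum(PHONEME_COST[p] for p in selected)
    return execution_cost <= limit
```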

［４．５］コンテンツ提供制御処理
次に、本実施形態のコンテンツ提供制御処理について説明する。 [4.5] Content Provision Control Processing Next, the content provision control processing of this embodiment will be described.

コンテンツ管理部１０５は、情報提供部１１０と連動し、通信制御部１１１を介して、コンテンツのテキストに沿ってキャラクタの音声を再生出力する端末装置２０に、コンテンツ情報とともに、発話音声データ生成処理によって生成した音声言語データを送信（提供）する提供制御処理を実行する。 The content management unit 105, in conjunction with the information provision unit 110, executes a provision control process that transmits (provides), via the communication control unit 111, the speech language data generated by the speech data generation process, together with the content information, to the terminal device 20, which plays back and outputs the characters' voices along the text of the content.

特に、コンテンツ管理部１０５は、提供制御処理としては、発話音声データとともに、テキストに沿って発話音声データに基づく発話させる発話制御、及び、テキストに従って端末装置２０に画像を表示させるための画像生成制御などの再生制御データを含む、コンテンツ情報（再生制御情報を含む。）を該当する端末装置２０に提供する。 In particular, in the provision control process, the content management unit 105 provides the relevant terminal device 20 with content information (including playback control information) that contains the speech data together with playback control data, such as speech control for uttering speech based on the speech data along the text, and image generation control for displaying images on the terminal device 20 in accordance with the text.

一方、本実施形態においては、実行許可判定処理として発話音声データ生成処理の実行の可否が判断されているが、当該発話音声データ生成処理に代えて、又は、加えて、当該実行許可判定処理の判定結果に基づいて、コンテンツ提供制御処理の実行の可否を判断させてもよい。 In this embodiment, the execution permission determination process determines whether to execute the speech data generation process; however, instead of or in addition to that, whether to execute the content provision control process may be determined based on the result of the execution permission determination process.

すなわち、コンテンツ管理部１０５は、最終的な判断としてのコンテンツ提供制御処理の判定が中止処理の場合には、発話音声データ生成処理を中止する代わりに、提供制御処理の実行も中止してもよい。 In other words, when the final determination for the content provision control process is the stop process, the content management unit 105 may stop the execution of the provision control process instead of, or in addition to, stopping the speech data generation process.

この場合には、音声言語データ生成処理によって発話音声言語データが生成されているものとし、コンテンツ管理部１０５は、実行許可判定処理において、
（Ａ１）実行コスト演算処理によって算出された実行コストに対する支払いが適正に実行されていると判定された場合、又は、
（Ａ２）生成処理に用いる音素データのコストと、所定の限界値と、が上記の関係性条件を具備していると判定された場合には、
音声言語データ生成処理によって生成された発話音声言語データを含めてコンテンツ情報を該当する端末装置２０に提供する提供制御処理を実行する。 In this case, it is assumed that the speech language data has already been generated by the speech language data generation process, and the content management unit 105 executes the provision control process, which provides the content information including that speech language data to the relevant terminal device 20, when, in the execution permission determination process:
(A1) it determines that the payment for the execution cost calculated by the execution cost calculation process has been properly made; or
(A2) it determines that the cost of the phoneme data used in the generation process and the predetermined limit value satisfy the above-mentioned relationship condition.

また、コンテンツ管理部１０５は、実行許可判定処理において、
（Ｂ１）実行コスト演算処理によって算出された実行コストに対する支払いが適正に実行されていないと判定された場合、又は、
（Ｂ２）生成処理に用いる音素データのコストと、所定の限界値と、が上記の関係性条件を具備していないと判定された場合には、
音声言語データ生成処理によって生成された発話音声言語データを含めコンテンツ情報を該当する端末装置２０に提供する提供制御処理を中止する。 Conversely, the content management unit 105 stops the provision control process, which provides the content information including the speech language data to the relevant terminal device 20, when, in the execution permission determination process:
(B1) it determines that the payment for the execution cost calculated by the execution cost calculation process has not been properly made; or
(B2) it determines that the cost of the phoneme data used in the generation process and the predetermined limit value do not satisfy the above-mentioned relationship condition.
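The provision gate described by conditions (A1)/(A2) and (B1)/(B2) can be sketched as one function that either returns the payload to send or returns nothing. The function name, the payload layout, and the choice of judgment mode by which arguments are supplied are assumptions; the relationship condition is again taken as "cost does not exceed the limit".

```python
from typing import Optional


def provision_control(content_info: dict, speech_data: bytes, *,
                      payment_ok: Optional[bool] = None,
                      cost: Optional[int] = None,
                      limit: Optional[int] = None) -> Optional[dict]:
    """Return the payload to send to terminal device 20, or None if stopped.

    (A1)/(B1): payment-based judgment, used when payment_ok is given.
    (A2)/(B2): limit-based judgment, used when cost and limit are given;
               the relationship condition is assumed to be cost <= limit.
    """
    if payment_ok is not None:
        permitted = payment_ok
    elif cost is not None and limit is not None:
        permitted = cost <= limit
    else:
        raise ValueError("no basis for the execution permission determination")
    if not permitted:
        return None  # provision control process is stopped
    # Content information bundled with the already-generated speech data.
    return {**content_info, "speech_data": speech_data}
```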

なお、提供制御処理が中止される場合には、発話音声言語データが生成されていなくてもよい。 Note that when the provision control process is stopped, the speech language data need not have been generated.

［４．６］変形例
次に、本実施形態のオプション状況検出処理の変形例１（仕様変更処理の変更状況）について説明する。 [4.6] Modifications Next, a first modification of the option status detection process of this embodiment (change status of specification change process) will be described.

（コストパラメータの変動に伴う実行許可判定処理１／ユーザ状況）
本変形例は、上記の実施形態において、実行コストを算出する際にコストパラメータ（基準値）を用いている点に代えて、該当するユーザのコンテンツに関する状況（すなわち、ユーザ状況）に基づいて基準値から変動させたコストパラメータを用いて実行許可判定処理が実行されてもよい。 (Execution permission determination process 1 in response to fluctuations in cost parameters/user status)
In this modified example, instead of using a cost parameter (reference value) when calculating the execution cost in the above embodiment, the execution permission determination process may be performed using a cost parameter that is varied from a reference value based on the content-related situation of the relevant user (i.e., the user situation).

すなわち、本変形例においては、コンテンツ視聴サービスに対するログイン状況などユーザ状況に基づいて、音素データ、キャラクタデータ又はテキストデータにおけるコストパラメータを基準値から変動させて実行コストを変化させ、当該変化させた実行コストによって実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理が実行されてもよい。 In other words, in this modified example, the execution cost may be changed by varying the cost parameters in the phoneme data, character data, or text data from a reference value based on the user status, such as the login status for the content viewing service, and the execution permission determination process based on the execution cost or the execution permission determination process based on the limit value may be executed based on the changed execution cost.

具体的には、ユーザ管理部１０４は、ユーザのコンテンツに関する所与の状況を検出する。 Specifically, the user management unit 104 detects a given content-related situation of the user.

例えば、ユーザ管理部１０４は、ユーザ状況として、
（Ａ１）ユーザの課金額、
（Ａ２）当該コンテンツを聴取や視聴するサービス（すなわち、コンテンツ視聴サービス）に対するログイン状況（ログインの頻度、総ログイン時間、又は、ログインによって獲得した特典）、
（Ａ３）コンテンツの利用時間（聴取時間や視聴時間）又は利用することによって獲得したポイント、及び、
（Ａ４）ユーザのランクやレベルなどの他のユーザからの優位性を示す優位度、
などを検出する。 For example, the user management unit 104 detects, as the user status, items such as:
(A1) the amount the user has been charged;
(A2) the login status for the service through which the content is listened to or viewed (i.e., the content viewing service), such as login frequency, total login time, or benefits acquired by logging in;
(A3) the time spent using the content (listening or viewing time), or the points earned by using the content; and
(A4) a degree of superiority over other users, such as the user's rank or level.

また、コスト管理部１０７は、検出されたユーザ状況と、各ユーザ状況に対応付けて記憶されているコストパラメータ（基準値）の変動値を有するテーブルデータと、に基づいて、当該検出されたユーザの状況におけるコストパラメータ（音素データ、キャラクタデータ又はテキストデータのコストパラメータ）の変動値を特定する変動制御処理を実行し、特定した変動値に基づいて、上述のように、音素データなどの実行コストを算出する。 The cost management unit 107 also executes a variation control process to determine the variation value of the cost parameter (cost parameter of phoneme data, character data, or text data) in the detected user situation based on the detected user situation and table data having variation values of the cost parameters (reference values) stored in association with each user situation, and calculates the execution cost of the phoneme data, etc., as described above, based on the determined variation value.

そして、この場合には、コンテンツ管理部１０５は、変動値に基づいて算出された実行コストを用いて、各種の実行許可判定処理（すなわち、実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理）を実行する。 In this case, the content management unit 105 uses the execution cost calculated based on the variation value to perform various execution permission determination processes (i.e., execution permission determination processes based on the execution cost or execution permission determination processes based on the limit value).
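The variation control process can be sketched as a table lookup keyed on one user status, here login frequency (A2), whose result scales the reference costs before either determination process runs. The table contents, the percentage form of the variation value, and the half-up rounding are all hypothetical; integer arithmetic avoids floating-point rounding surprises.

```python
PHONEME_COST = {"A": 50, "B": 50, "C": 40, "D": 60, "E": 45}

# Hypothetical table data: (minimum login days in the period, percentage of the
# reference cost to charge). Checked from the top; the first match wins.
LOGIN_RATE_TABLE = [(20, 80), (10, 90), (0, 100)]


def variation_percent(login_days: int) -> int:
    """Variation control: look up the cost percentage for a user's login status."""
    for min_days, percent in LOGIN_RATE_TABLE:
        if login_days >= min_days:
            return percent
    return 100


def varied_execution_cost(selected, login_days: int) -> int:
    """Execution cost using the varied parameters (integer points, rounded half up)."""
    base = sum(PHONEME_COST[p] for p in selected)
    return (base * variation_percent(login_days) + 50) // 100
```

For the A, C, and E selection (135 pt at reference cost), a user with 12 login days would pay 122 pt under this table, which could then be compared against a 150 pt limit exactly as before.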

なお、このように、実行コストを変動させることによって、コンテンツ視聴サービスのユーザに対する割引その他のサービスを充実させることができるので、ユーザのコンテンツ利用の満足度を向上させることができるようになっている。 Furthermore, by varying the execution cost in this way, it is possible to enhance discounts and other services for users of the content viewing service, thereby improving the user's satisfaction with content usage.

（コストパラメータの変動に伴う実行許可判定処理２／コンテンツ関連情報）
本変形例は、上記の実施形態において、実行コストを算出する際にコンテンツの種別やキャラクタの属性などのコンテンツに関する関連情報（以下、「コンテンツ関連情報」という。）又は音素に関する関連情報（以下、「音素データ関連情報」という。）に基づいて、基準値から変動させたコストパラメータを用いて実行許可判定処理が実行されてもよい。 (Execution permission determination process 2 in response to fluctuations in cost parameters/content-related information)
In this modified example, in the above embodiment, when calculating the execution cost, the execution permission determination process may be performed using a cost parameter that is varied from a reference value based on content-related information such as the type of content or character attributes (hereinafter referred to as "content-related information") or phoneme-related information (hereinafter referred to as "phoneme data-related information").

すなわち、本変形例においては、キャラクタの発話回数や人気度などの関連情報としての属性に応じて、音素データ、キャラクタデータ又はテキストデータにおけるコストパラメータを変動させて実行コストを変化させ、当該変化させた実行コストによって実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理が実行されてもよい。 In other words, in this modified example, the execution cost may be changed by varying a cost parameter in the phoneme data, character data, or text data depending on attributes such as the number of times a character speaks or its popularity as related information, and an execution permission determination process based on the execution cost or an execution permission determination process based on a limit value may be executed based on the changed execution cost.

具体的には、コスト管理部１０７は、コンテンツ情報記憶部１４４に記憶されているキャラクタやテキストの情報を含む、該当するコンテンツ情報中からコンテンツ関連情報を特定し、又は、音素情報記憶部１４６に記憶されている該当する発話音素情報の中から音素データ関連情報を特定する。 Specifically, the cost management unit 107 identifies content-related information from the relevant content information, including character and text information stored in the content information storage unit 144, or identifies phoneme data-related information from the relevant speech phoneme information stored in the phoneme information storage unit 146.

特に、コスト管理部１０７は、コンテンツ関連情報としては、コンテンツのジャンル（コメディ、ホラー、恋愛又はアクション）を示すジャンル情報、テキストデータにおける小説・漫画・ノンフィクション・新聞などの属性を示す属性情報、又は、キャラクタデータにおける、動物・ロボット・人間などの種別、性別や年齢、方言（標準語、関西弁、東北弁又は九州訛りなど）やテキストの言語（他言語）の種別及び人気度などの属性を示す属性情報を特定する。 In particular, the cost management unit 107 identifies, as content-related information, genre information indicating the genre of the content (comedy, horror, romance, or action), attribute information indicating attributes of the text data such as novel, manga, non-fiction, or newspaper, or attribute information indicating attributes of the character data such as type (animal, robot, human, etc.), gender, age, dialect (standard Japanese, Kansai dialect, Tohoku dialect, Kyushu dialect, etc.), the language of the text (e.g., another language), and popularity.

また、コスト管理部１０７は、音素データ関連情報としては、声優やアナウンサーなどの発話者のジャンル、性別、年齢や年代、又は、人気度などの属性情報を特定する。 In addition, the cost management unit 107 identifies attribute information such as the genre, gender, age or generation, or popularity of the speaker, such as a voice actor or announcer, as phoneme data-related information.

そして、コスト管理部１０７は、これらのコンテンツ関連情報、音素データ関連情報又はその双方と、コンテンツ関連情報、音素データ関連情報又はその双方に対応付けてコストパラメータの変動値を有するテーブルデータと、に基づいて、コンテンツ関連情報や音素データ関連情報に対応するコストパラメータ（音素データ、キャラクタデータ又はテキストデータのコストパラメータ）の変動値を特定する変動制御処理を実行し、特定した変動値に基づいて、上述のように、音素データなどの実行コストを算出する。 The cost management unit 107 then executes a variation control process. This process identifies the variation value of the cost parameter (of the phoneme data, character data, or text data) that corresponds to the identified content-related information, phoneme data-related information, or both, by looking it up in table data that associates such information with cost-parameter variation values. Based on the identified variation value, the cost management unit 107 calculates the execution cost of the phoneme data and the like, as described above.
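The table-driven variation control process described above can be pictured with a short Python sketch. The attribute names, the variation values, and the choice to add the variation value to the reference cost (rather than, for example, multiply it) are all assumptions made for illustration; the text does not specify how the table data is structured or how a variation value is applied.

```python
from dataclasses import dataclass, field

# Hypothetical table data: (attribute, value) -> variation value added to the
# reference cost parameter. Entries and amounts are illustrative only.
VARIATION_TABLE: dict[tuple[str, str], int] = {
    ("genre", "horror"): 10,
    ("dialect", "kansai"): 5,
    ("speaker_popularity", "high"): 50,
    ("speaker_popularity", "low"): -20,
}


@dataclass
class CostItem:
    """Phoneme, character, or text data together with its reference cost parameter."""
    base_cost: int
    attributes: dict[str, str] = field(default_factory=dict)


def varied_cost(item: CostItem, table: dict[tuple[str, str], int] = VARIATION_TABLE) -> int:
    # Variation control: add every variation value whose (attribute, value) key
    # matches; attributes absent from the table leave the cost unchanged.
    delta = sum(table.get((key, value), 0) for key, value in item.attributes.items())
    return max(0, item.base_cost + delta)  # never let a cost go negative


def execution_cost(items: list[CostItem]) -> int:
    # Execution cost of one generation run = sum of the varied cost parameters.
    return sum(varied_cost(item) for item in items)


phoneme = CostItem(100, {"speaker_popularity": "high"})
character = CostItem(50, {"dialect": "kansai", "genre": "comedy"})
print(execution_cost([phoneme, character]))  # 150 + 55 -> 205
```

Keeping the variation values in a table rather than in the calculation code mirrors the text's description of table data and would let an operator adjust prices without touching the cost formula.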

そして、この場合には、コンテンツ管理部１０５は、上述のように、変動値に基づいて算出された実行コストを用いて、各種の実行許可判定処理（すなわち、実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理）を実行する。 In this case, the content management unit 105 performs the execution permission determination process described above (i.e., the determination based on the execution cost or the determination based on the limit value) using the execution cost calculated from the variation value.

（コストパラメータの変動に伴う実行許可判定処理３／音素データの組み合わせ） (Execution permission determination process 3 based on cost parameter variation / combination of phoneme data)
本変形例は、上記の実施形態において、実行コストを算出する際にコストパラメータ（基準値）を用いている点に代えて、音声言語データ生成処理に用いた（すなわち、キャラクタに割り当てた）音素データの組み合わせに基づいて基準値から変動させたコストパラメータを用いて実行許可判定処理が実行されてもよい。 In this modified example, instead of using the cost parameter (reference value) as it is when calculating the execution cost as in the above embodiment, the execution permission determination process may be performed using a cost parameter varied from the reference value according to the combination of phoneme data used in the speech language data generation process (i.e., assigned to the characters).

すなわち、本変形例においては、同一の発話者によって採取されたデータであることなど、キャラクタに割り当てた音素データの組み合わせに応じて、音素データにおけるコストパラメータを変動させて実行コストを変化させ、当該変化させた実行コストによって実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理が実行されてもよい。 In other words, in this modified example, the cost parameters of the phoneme data may be varied according to the combination of phoneme data assigned to the characters (for example, whether the phoneme data were all collected from the same speaker). This changes the execution cost, and either the execution permission determination process based on the execution cost or the one based on the limit value may then be executed using the changed execution cost.

具体的には、コスト管理部１０７は、音声言語データ生成処理が実行されると、キャラクタに割り当てられた各音素データにおける、音素情報記憶部１４６に記憶されている該当する発話音素情報の中から音素データ関連情報を特定する。 Specifically, when the speech language data generation process is executed, the cost management unit 107 identifies phoneme data related information from the corresponding speech phoneme information stored in the phoneme information storage unit 146 for each phoneme data assigned to the character.

特に、コスト管理部１０７は、音素データ関連情報としては、割り当てられた各音素データにおける、声優やアナウンサーなどの発話者のジャンル、性別、年齢や年代、又は、人気度などの属性情報を特定する。 In particular, the cost management unit 107 identifies, as phoneme data-related information, attribute information of each assigned piece of phoneme data, such as the genre, gender, age or generation, or popularity of its speaker (e.g., a voice actor or announcer).

そして、コスト管理部１０７は、特定した音素データ関連情報の組み合わせと、音素データ関連情報の組み合わせに対応付けてコストパラメータの変動値を有するテーブルデータと、に基づいて、当該組み合わせにおけるコストパラメータの変動値を特定する変動制御処理を実行し、特定した変動値に基づいて、上述のように、音素データなどの実行コストを算出する。 The cost management unit 107 then executes a variation control process that identifies the variation value of the cost parameter for the identified combination of phoneme data-related information, by looking it up in table data that associates combinations of phoneme data-related information with cost-parameter variation values. Based on the identified variation value, it calculates the execution cost of the phoneme data and the like, as described above.

また、本変形例においては、同一の発話者によって採取されたデータであることなど、キャラクタに割り当てた音素データの組み合わせに応じて、キャラクタデータやテキストデータにおけるコストパラメータを変動させて実行コストを変化させてもよい。 In addition, in this modified example, the execution cost may be changed by varying the cost parameters in the character data and text data depending on the combination of phoneme data assigned to the character, such as whether the data was collected from the same speaker.
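As a hedged illustration of the combination-based variant, the Python sketch below derives a combination label from the assigned phoneme data and looks up a variation value for it. Treating "all characters voiced from one speaker's data" as the only combination of interest, the label names, and the per-item discount are hypothetical choices; the text only states that table data maps combinations of phoneme data-related information to variation values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeData:
    phoneme_id: str
    speaker_id: str  # phoneme data-related information used to form the combination
    base_cost: int   # reference value of the cost parameter


# Hypothetical table data: combination label -> variation value per assigned item.
COMBINATION_TABLE = {"single_speaker": -10, "mixed_speakers": 0}


def combination_label(assigned: dict[str, PhonemeData]) -> str:
    # Combination detection: was every character given data from the same speaker?
    speakers = {p.speaker_id for p in assigned.values()}
    return "single_speaker" if len(speakers) == 1 else "mixed_speakers"


def execution_cost(assigned: dict[str, PhonemeData]) -> int:
    delta = COMBINATION_TABLE[combination_label(assigned)]
    # Apply the variation value to each assigned item's cost parameter.
    return sum(max(0, p.base_cost + delta) for p in assigned.values())


same = {"hero": PhonemeData("p1", "spkA", 100), "rival": PhonemeData("p2", "spkA", 80)}
mixed = {"hero": PhonemeData("p1", "spkA", 100), "rival": PhonemeData("p3", "spkB", 80)}
print(execution_cost(same), execution_cost(mixed))  # 160 180
```

Keying the table on a derived label keeps the lookup simple; a real table could instead key on richer combinations such as speaker gender pairs, which the text leaves open.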

（コストパラメータの変動に伴う実行許可判定処理４／モデル情報の学習状況） (Execution permission determination process 4 based on cost parameter variation / learning status of model information)
本変形例は、上記の実施形態において、実行コストを算出する際にコストパラメータ（基準値）を用いている点に代えて、音声言語データ生成処理が繰り返し実行された際の音素データのデータモデルの学習状況に基づいて基準値から変動させたコストパラメータを用いて実行許可判定処理が実行されてもよい。 In this modified example, instead of using the cost parameter (reference value) as it is when calculating the execution cost as in the above embodiment, the execution permission determination process may be performed using a cost parameter varied from the reference value according to the learning status of the data model of the phoneme data as the speech language data generation process is repeatedly executed.

すなわち、本変形例においては、モデル情報の学習回数、学習進度（所与期間における学習回数）、又は、学習した音声言語データの評価値（例えば、人気度などの利用回数を含む。）などの学習状況に応じて、音素データにおけるコストパラメータを変動させて実行コストを変化させ、当該変化させた実行コストによって実行コストに基づく実行許可判定処理又は限界値に基づく実行許可判定処理が実行されてもよい。 In other words, in this modified example, the cost parameters of the phoneme data may be varied according to the learning status, such as the number of times the model information has been trained, the learning progress (the number of training runs within a given period), or an evaluation value of the learned speech language data (including, for example, its usage count as a measure of popularity). This changes the execution cost, and either the execution permission determination process based on the execution cost or the one based on the limit value may then be executed using the changed execution cost.

具体的には、コスト管理部１０７は、音声言語データ生成処理の実行時に、音素情報記憶部１４６に記憶された音素データのモデル情報とともに、記憶された学習回数や人気度などの学習状況を示す学習状況情報を取得する。 Specifically, when executing the speech language data generation process, the cost management unit 107 acquires learning status information indicating the learning status, such as the number of times learning has been performed and popularity, along with model information of the phoneme data stored in the phoneme information storage unit 146.

そして、コスト管理部１０７は、学習状況情報と、当該学習状況情報に対応付けてコストパラメータの変動値を有するテーブルデータと、に基づいて、生成される発話音声言語データにおけるコストパラメータの変動値を特定する変動制御処理を実行し、特定した変動値に基づいて、上述のように、音素データなどの実行コストを算出する。 The cost management unit 107 then executes a variation control process that identifies the variation value of the cost parameter for the speech language data to be generated, by looking up the learning status information in table data that associates learning status information with cost-parameter variation values. Based on the identified variation value, it calculates the execution cost of the phoneme data and the like, as described above.
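The learning-status variant can likewise be sketched in Python as a tiered table lookup. Here the learning count of the model information selects a variation value from a hypothetical threshold table; the thresholds, the amounts, and the decision to raise (rather than lower) the cost as the model is trained more are assumptions, since the text leaves the direction and size of the variation open.

```python
import bisect

# Hypothetical table data: (minimum learning count, variation value), sorted by threshold.
LEARNING_TIERS = [(0, 0), (100, 20), (1000, 50)]
_THRESHOLDS = [threshold for threshold, _ in LEARNING_TIERS]


def learning_variation(learning_count: int) -> int:
    """Return the variation value of the highest tier whose threshold <= learning_count."""
    if learning_count < 0:
        raise ValueError("learning count cannot be negative")
    index = bisect.bisect_right(_THRESHOLDS, learning_count) - 1
    return LEARNING_TIERS[index][1]


def varied_phoneme_cost(base_cost: int, learning_count: int) -> int:
    # Cost parameter varied from its reference value according to the learning status.
    return base_cost + learning_variation(learning_count)


for count in (0, 99, 100, 5000):
    print(count, varied_phoneme_cost(100, count))
```

The same lookup could be keyed on learning progress within a period or on a popularity score instead of the raw learning count; only the table contents would change.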

なお、このように、実行コストを変動させることによって、コンテンツ視聴サービスのユーザに対する割引その他のサービスを充実させることができるので、ユーザのコンテンツ利用の満足度を向上させることができるようになっている。 Varying the execution cost in this way makes it possible to offer discounts and other benefits to users of the content viewing service, which improves user satisfaction with the content.

（ユーザの音素データ） (User's phoneme data)
上記の実施形態においては、予め発話者によって採取された音素データを用いているが、ユーザ自身の音声から音素データを生成し、当該生成した音素データを用いて音声言語データ生成処理を実行してもよい。 In the above embodiment, phoneme data collected in advance from a speaker is used; alternatively, phoneme data may be generated from the user's own voice and used in the speech language data generation process.

［５］本実施形態における動作 [5] Operation in this embodiment
［５．１］実行コストに基づく実行許可判定処理を含むコンテンツ視聴開始処理 [5.1] Content viewing start processing including execution permission determination processing based on execution cost
次に、図１２を用いて本実施形態のサーバ装置１０によって実行される音声言語データ生成処理、及び、実行コストに基づく実行許可判定処理を含むコンテンツの視聴を開始する際のコンテンツ試聴開始処理の動作について説明する。 Next, with reference to FIG. 12, the operation of the content viewing start process executed by the server device 10 of this embodiment when viewing of content is started, which includes the speech language data generation process and the execution permission determination process based on the execution cost, will be described.

なお、図１２は、本実施形態のサーバ装置１０によって実行される音声言語データ生成処理、及び、実行コストに基づく実行許可判定処理を含むコンテンツの視聴を開始する際のコンテンツ試聴開始処理の動作を示すフローチャートである。 FIG. 12 is a flowchart showing the operation of the content viewing start process executed by the server device 10 of this embodiment when viewing of content is started, including the speech language data generation process and the execution permission determination process based on the execution cost.

本動作は、ユーザの選択によって音素データを、視聴を希望するコンテンツのキャラクタに、割り当てた場合に、当該ユーザのコストの支払いを前提に実行され、当該コンテンツがストリーミングによって端末装置２０に視聴可能に提供される際の動作である。 This operation is executed when phoneme data selected by the user has been assigned to a character in the content the user wishes to view. It is performed on the premise that the user pays the cost, and it provides the content to the terminal device 20 by streaming so that the content can be viewed.

そして、本動作においては、ユーザが希望するコンテンツが既にコンテンツ情報記憶部１４４に記憶されているとともに、割り当てられる各音素データについては既にそのコストを示すコストパラメータを含めて音素情報記憶部１４６に記憶されているものとする。 In this operation, it is assumed that the content desired by the user is already stored in the content information storage unit 144, and that each piece of phoneme data to be assigned is already stored in the phoneme information storage unit 146, including a cost parameter indicating its cost.

なお、本動作の実行開始前には、ユーザに対して視聴させるコンテンツの選択及び割り当てを希望する音素情報の選択のための情報が提示されているものとする。 Before starting this operation, the user is presented with information for selecting the content to be viewed and the phoneme information to be assigned.

まず、コンテンツ管理部１０５によって、通信制御部１０１を介して端末装置２０から送信された、ユーザにおけるコンテンツの視聴指示とともに、視聴するコンテンツ、及び、当該コンテンツに登場するキャラクタに割り当てる音素データに関する情報（すなわち、ユーザによって選択された音素データの情報）が受信されると（ステップＳ１０１）、コスト管理部１０７は、割り当てられた音素データの音素コスト情報を音素情報記憶部１４６から読み出し、割り当てられた音素データによって音声言語データを生成する際の実行コストを算出する（ステップＳ１０２）。 First, when the content management unit 105 receives information regarding the content to be viewed and the phoneme data to be assigned to the characters appearing in the content (i.e., information regarding the phoneme data selected by the user) together with the user's instruction to view the content transmitted from the terminal device 20 via the communication control unit 101 (step S101), the cost management unit 107 reads the phoneme cost information of the assigned phoneme data from the phoneme information storage unit 146 and calculates the execution cost of generating speech language data using the assigned phoneme data (step S102).

次いで、情報提供部１１０は、算出された実行コストを該当するユーザに提示するの情報を、通信制御部１０１を介して当該ユーザの端末装置２０に送信し、ユーザの次の指示の受信を待機する（ステップＳ１０３）。 Next, the information providing unit 110 transmits information presenting the calculated execution cost to the terminal device 20 of the relevant user via the communication control unit 101, and waits for the user's next instruction (step S103).

なお、コンテンツ管理部１０５は、このように算出した実行コストに基づいて、該当するユーザに対して、所定の方法による支払いを要求（すなわち、実行コスト支払い要求）する。 Based on the execution cost calculated in this manner, the content management unit 105 requests payment from the relevant user in a predetermined manner (i.e., requests payment of the execution cost).

次いで、コンテンツ管理部１０５は、該当するユーザの端末装置２０から提供された実行コストに対する指示とともに、実行コストに対する支払いの有無に関する情報を受信すると（ステップＳ１０４）、受信した情報に基づいて、実行コストの支払いが完了したか否か、すなわち、当該ユーザの支払いの有無を判定することによって音声言語データ生成処理の実行可否を判定する実行許可判定処理を実行する（ステップＳ１０５）。 Next, when the content management unit 105 receives, from the terminal device 20 of the relevant user, an instruction regarding the execution cost together with information on whether the execution cost has been paid (step S104), it executes an execution permission determination process that decides whether the speech language data generation process may be executed, by determining from the received information whether payment of the execution cost has been completed (step S105).

このとき、コンテンツ管理部１０５は、受信した情報に基づいて、ユーザの実行コストの支払いが実行されたと判定した場合（すなわち、音声言語データ生成処理の実行が許可されなかった場合）には、ステップＳ１０６の処理に移行し、受信した情報に基づいて、ユーザの実行コストの支払いが実行されていないと判定した場合には、本動作を終了させる。 If the content management unit 105 determines from the received information that the user has paid the execution cost (i.e., execution of the speech language data generation process is permitted), the process proceeds to step S106; if it determines that the user has not paid the execution cost, this operation ends.

次いで、コスト管理部１０７によって、ユーザによって実行コストの支払いが実行されたと判定した場合（すなわち、音声言語データ生成処理の実行が許可された場合）には、発話音声生成処理部１０６は、ユーザによって選択された音素データを当該ユーザが希望するコンテンツのキャラクタに割り当てて、当該キャラクタに属するテキストに対する発話音声となる音声言語データ生成処理を実行する（ステップＳ１０７）。 Next, if the cost management unit 107 determines that the execution cost has been paid by the user (i.e., execution of the speech language data generation process is permitted), the speech generation processing unit 106 assigns the phoneme data selected by the user to a character in the content desired by the user, and executes the speech language data generation process to generate speech for the text belonging to that character (step S107).

最後に、コンテンツ管理部１０５は、ユーザが希望するコンテンツのコンテンツ情報と、当該生成された発話音声データと、に基づいて、該当する端末装置２０に対してストリーミング再生を実行するための各種のデータの送信を開始し（ステップＳ１０８）、本動作を終了させる。 Finally, the content management unit 105 starts transmitting various data for performing streaming playback to the corresponding terminal device 20 based on the content information of the content desired by the user and the generated speech voice data (step S108), and ends this operation.
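The flow of FIG. 12 can be summarized in a minimal Python sketch. The request structure, the cost table standing in for the phoneme information storage unit 146, the `has_paid` callback standing in for the payment check, and the placeholder speech generation are all hypothetical; the sketch only mirrors the order of the steps described above, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for the cost parameters held in phoneme information storage unit 146.
PHONEME_COST = {"voice_a": 120, "voice_b": 80}


@dataclass
class ViewRequest:
    """Viewing instruction received from terminal device 20 in step S101."""
    user_id: str
    content_id: str
    assignments: dict[str, str]  # character id -> phoneme data id chosen by the user


def calculate_execution_cost(req: ViewRequest) -> int:
    # S102: sum the cost parameters of the assigned phoneme data.
    return sum(PHONEME_COST[pid] for pid in req.assignments.values())


def generate_speech_language_data(req: ViewRequest) -> dict[str, str]:
    # Placeholder for speech generation processing unit 106: it only records which
    # phoneme data would voice each character's lines.
    return {char: f"speech<{pid}>" for char, pid in req.assignments.items()}


def start_viewing(req: ViewRequest,
                  has_paid: Callable[[str, int], bool]) -> Optional[dict]:
    cost = calculate_execution_cost(req)                  # S102
    print(f"to {req.user_id}: execution cost is {cost}")  # S103: present the cost
    if not has_paid(req.user_id, cost):                   # S104-S105: permission check
        return None                                       # unpaid: the operation ends
    speech = generate_speech_language_data(req)           # generation (S107)
    # S108: hand the content and generated speech to the streaming step.
    return {"content_id": req.content_id, "speech": speech, "cost": cost}


req = ViewRequest("user1", "manga_01", {"hero": "voice_a", "rival": "voice_b"})
print(start_viewing(req, has_paid=lambda user, cost: True))
print(start_viewing(req, has_paid=lambda user, cost: False))
```

Passing the payment check in as a callback keeps the sketch independent of any particular payment method, which the text leaves as "a predetermined manner".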

［５．２］限界値に基づく実行許可判定処理を含むコンテンツ視聴開始処理 [5.2] Content viewing start processing including execution permission determination processing based on limit values
次に、図１３を用いて本実施形態のサーバ装置１０によって実行される音声言語データ生成処理、及び、限界値に基づく実行許可判定処理を含むコンテンツの視聴を開始する際のコンテンツ試聴開始処理の動作について説明する。 Next, with reference to FIG. 13, the operation of the content viewing start process executed by the server device 10 of this embodiment when viewing of content is started, which includes the speech language data generation process and the execution permission determination process based on a limit value, will be described.

なお、図１３は、本実施形態のサーバ装置１０によって実行される音声言語データ生成処理、及び、限界値に基づく実行許可判定処理を含むコンテンツの視聴を開始する際のコンテンツ試聴開始処理の動作を示すフローチャートである。 FIG. 13 is a flowchart showing the operation of the content viewing start process executed by the server device 10 of this embodiment when viewing of content is started, including the speech language data generation process and the execution permission determination process based on a limit value.

本動作は、ユーザの選択によって音素データを、視聴を希望するコンテンツのキャラクタに、割り当てるとともに、音素データの実行コストがコンテンツに予め設定された限界値内の場合に、当該コンテンツがストリーミングによって端末装置２０に視聴可能に提供される場合の動作である。 This operation involves assigning phoneme data to the character of the content the user wishes to view, based on the user's selection, and providing the content to the terminal device 20 via streaming so that it can be viewed if the execution cost of the phoneme data is within a preset limit value for the content.

また、本動作においては、視聴可能なコンテンツ毎に、割り当てることが可能な実行コストの上限値となる限界値が設定されているものとする。 In addition, in this operation, a limit value is set for each viewable content, which is the upper limit of the execution cost that can be allocated.

なお、本動作において上記した実行コストに基づく実行許可判定処理を含むコンテンツ視聴開始処理と同一の処理については同一の符号を付してその説明を省略する。 In addition, in this operation, the same processes as those in the content viewing start process, including the execution permission determination process based on the execution cost described above, are denoted by the same reference numerals and their explanations are omitted.

また、本動作の実行開始前には、ユーザに対して視聴させるコンテンツの選択及び割り当てを希望する音素情報の選択のための情報が提示されているものとする。 Before starting this operation, the user is presented with information for selecting the content to be viewed and the phoneme information to be assigned.

まず、コンテンツ管理部１０５によって、通信制御部１０１を介して端末装置２０から送信された、ユーザにおけるコンテンツの視聴指示とともに、視聴するコンテンツ、及び、当該コンテンツに登場するキャラクタに割り当てる音素データに関する情報（すなわち、ユーザによって選択された音素データの情報）が受信されると（ステップＳ１０１）、コスト管理部１０７は、割り当てられた音素データの音素コスト情報を音素情報記憶部１４６から読み出し、割り当てられた音素データによって音声言語データを生成する際の実行コストを算出する（ステップＳ１０２）。 First, when the content management unit 105 receives information on the content to be viewed and on the phoneme data to be assigned to the characters appearing in the content (i.e., information on the phoneme data selected by the user), together with the user's content viewing instruction transmitted from the terminal device 20 via the communication control unit 101 (step S101), the cost management unit 107 reads the phoneme cost information of the assigned phoneme data from the phoneme information storage unit 146 and calculates the execution cost of generating speech language data using the assigned phoneme data (step S102).

次いで、コスト管理部１０７は、算出した実行コストと、予め設定された割り当て可能なコストの上限値（すなわち、限界値）と、を比較し（ステップＳ２０３）、これらが所与の関係性を有する関係性条件を具備しているか否か（例えば、当該実行コストが上限値以内であるか否か）を判定する実行許可判定処理を実行する（ステップＳ２０４）。 Next, the cost management unit 107 compares the calculated execution cost with the preset upper limit of the allocable cost (i.e., the limit value) (step S203), and executes an execution permission determination process that determines whether the two satisfy a given relational condition (for example, whether the execution cost is within the upper limit) (step S204).

このとき、コスト管理部１０７は、算出した実行コストが予め設定された割り当て可能なコストの上限値内であると判定した場合には、ステップＳ１０６の処理に移行し、算出した実行コストが予め設定された割り当て可能なコストの上限値を超えたと判定した場合には本動作を終了させる。 At this time, if the cost management unit 107 determines that the calculated execution cost is within the preset upper limit of the allocable cost, it proceeds to the processing of step S106, and if it determines that the calculated execution cost exceeds the preset upper limit of the allocable cost, it terminates this operation.

なお、本動作が終了した場合には、再度コンテンツと割り当てる音素データの選択を促し、コンテンツと割り当てる音素データが選択されることを前提に本動作を最初から実行してもよい。 When this operation is terminated in this way, the user may be prompted again to select the content and the phoneme data to be assigned, and this operation may be executed again from the beginning once they have been selected.

次いで、発話音声生成処理部１０６は、実行コストが予め設定された割り当て可能なコストの上限値内であると判定された場合には、ユーザによって選択された音素データを当該ユーザが希望するコンテンツのキャラクタに割り当てて、当該キャラクタに属するテキストに対する発話音声となる音声言語データ生成処理を実行する（ステップＳ１０６）。 Next, when it has been determined that the execution cost is within the preset upper limit of the allocable cost, the speech generation processing unit 106 assigns the phoneme data selected by the user to a character in the content desired by the user, and executes the speech language data generation process to generate speech for the text belonging to that character (step S106).

最後に、コンテンツ管理部１０５は、ユーザが希望するコンテンツのコンテンツ情報と、当該生成された発話音声データと、に基づいて、該当する端末装置２０に対してストリーミング再生を実行するための各種のデータの送信を開始し（ステップＳ１０７）、本動作を終了させる。 Finally, the content management unit 105 starts transmitting various data for performing streaming playback to the corresponding terminal device 20 based on the content information of the content desired by the user and the generated speech voice data (step S107), and ends this operation.
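The limit-value variant of FIG. 13 differs from FIG. 12 only in the permission check: instead of confirming payment, the calculated execution cost is compared with a limit value set for each content item. In this Python sketch the relational condition is assumed to be "the cost does not exceed the limit", which matches the example given in the text; the content IDs, limit values, and cost table are hypothetical.

```python
from typing import Optional

# Hypothetical cost parameters and per-content limit values.
PHONEME_COST = {"voice_a": 120, "voice_b": 80, "voice_c": 300}
CONTENT_COST_LIMIT = {"manga_01": 250}


def satisfies_relational_condition(cost: int, limit: int) -> bool:
    # Assumed relational condition: the execution cost must not exceed the limit value.
    return cost <= limit


def start_viewing_with_limit(content_id: str,
                             assignments: dict[str, str]) -> Optional[dict]:
    cost = sum(PHONEME_COST[pid] for pid in assignments.values())     # S102
    limit = CONTENT_COST_LIMIT[content_id]
    if not satisfies_relational_condition(cost, limit):               # S203-S204
        return None  # the operation ends; the user may be asked to choose again
    speech = {char: f"speech<{pid}>" for char, pid in assignments.items()}  # S106
    return {"content_id": content_id, "speech": speech, "cost": cost}  # streaming (S107)


print(start_viewing_with_limit("manga_01", {"hero": "voice_a", "rival": "voice_b"}))
print(start_viewing_with_limit("manga_01", {"hero": "voice_c"}))
```

Isolating the relational condition in its own function makes it easy to swap in a different comparison (for example, a strict inequality or a ratio) if the limit is defined differently.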

［６］その他 [6] Others
本発明は、上記実施形態で説明したものに限らず、種々の変形実施が可能である。例えば、明細書又は図面中の記載において広義や同義な用語として引用された用語は、明細書又は図面中の他の記載においても広義や同義な用語に置き換えることができる。 The present invention is not limited to the above-described embodiment, and various modifications are possible. For example, any term cited together with a different term having a broader or the same meaning at least once in the specification or drawings can be replaced by that different term anywhere else in the specification or drawings.

本実施形態は、１のサーバ装置１０によって各コンテンツを端末装置２０に提供してもよいし、複数のサーバ装置１０を連動させてサーバシステムを構築し、各コンテンツを端末装置に提供してもよい。 In this embodiment, each content may be provided to the terminal device 20 by a single server device 10, or multiple server devices 10 may be linked together to form a server system and provide each content to a terminal device.

また、本実施形態においては、サーバ装置１０の機能を備えた単一のコンテンツ再生装置、すなわち、サーバ装置と端末装置とにわけることなく、ネットワークを介してコンテンツ情報及び音素情報を取得する装置だけで、上記の各処理及びコンテンツの再生などを実現してもよい。 In addition, in this embodiment, the above processes and content playback may be realized by a single content playback device having the functions of the server device 10, i.e., by a device that acquires content information and phoneme information via a network, without dividing the device into a server device and a terminal device.

特に、この場合には、コンテンツ再生装置は、内部に再生出力部を有し、当該再生出力部が、音声言語データに基づいてキャラクタによる音声を出力させつつ、当該コンテンツの画像を表示部に表示する構成を有している。 In this case, the content playback device has an internal playback output unit, which displays images of the content on the display unit while outputting the characters' voices based on the speech language data.

そして、このようなコンテンツ端末装置を有線又は無線によって複数連結させ、１のコンテンツ装置がサーバ装置１０として機能して、複数のコンテンツ装置によって実現することも可能である。 It is also possible to connect a plurality of such content terminal devices by wire or wirelessly, with one of them functioning as the server device 10, so that the system is realized by the plurality of devices.

また、本実施形態においては、ネットワークを通じて端末装置２０と連動して実行するサーバ装置１０に本発明のコンテンツ提供システムを適用しているが、タブレット型情報端末装置やパーソナルコンピュータなどの端末装置としても適用することができる。 In addition, in this embodiment, the content providing system of the present invention is applied to a server device 10 that operates in conjunction with a terminal device 20 via a network, but it can also be applied to a terminal device such as a tablet information terminal device or a personal computer.

すなわち、この場合には、端末装置は、上記のサーバ装置１０の各機能とコンテンツデータを再生する再生機能を有し、音素データを割り当てた音声言語データとともにコンテンツデータを再生する構成を有している。 In this case, the terminal device has each of the functions of the server device 10 described above and a playback function for playing content data, and is configured to play content data together with speech language data to which phoneme data is assigned.

本発明は、実施形態で説明した構成と実質的に同一の構成（例えば、機能、方法及び結果が同一の構成、あるいは目的及び効果が同一の構成）を含む。また、本発明は、実施形態で説明した構成の本質的でない部分を置き換えた構成を含む。また、本発明は、実施形態で説明した構成と同一の作用効果を奏する構成又は同一の目的を達成することができる構成を含む。また、本発明は、実施形態で説明した構成に公知技術を付加した構成を含む。 The present invention includes configurations that are substantially the same as the configurations described in the embodiments (for example, configurations with the same functions, methods, and results, or configurations with the same purpose and effect). The present invention also includes configurations that replace non-essential parts of the configurations described in the embodiments. The present invention also includes configurations that achieve the same effects as the configurations described in the embodiments, or that can achieve the same purpose. The present invention also includes configurations that add publicly known technology to the configurations described in the embodiments.

上記のように、本発明の実施形態について詳細に説明したが、本発明の新規事項及び効果から実体的に逸脱しない多くの変形が可能であることは当業者には容易に理解できるであろう。したがって、このような変形例はすべて本発明の範囲に含まれるものとする。 Although the embodiments of the present invention have been described in detail above, it will be readily apparent to those skilled in the art that many modifications are possible that do not substantially depart from the novel aspects and effects of the present invention. Therefore, all such modifications are intended to be included within the scope of the present invention.

１：コンテンツ提供システム
１０：サーバ装置
２０：端末装置
１００：処理部
１０１：通信制御部
１０２：Ｗｅｂ処理部
１０３：ログイン管理部
１０４：ユーザ管理部
１０５：コンテンツ管理部
１０６：発話音声生成処理部
１０７：コスト管理部
１０９：タイマ管理部
１１０：情報提供部
１１１：通信制御部
１２０：入力部
１３０：表示部
１４０：記憶部
１４２：主記憶部
１４４：コンテンツ情報記憶部
１４６：音素情報記憶部
１４８：ユーザ情報記憶部
１４９：アプリケーション情報記憶部
１８０：情報記憶媒体
１９６：通信部
２００：処理部
２１０：通信制御部
２１１：Ｗｅｂブラウザ
２１２：コンテンツ処理部
２１３：表示制御部
２２０：描画部
２３０：音処理部
２６０：入力部
２６２：検出部
２７０：記憶部
２７１：主記憶部
２７２：画像バッファ
２８０：情報記憶媒体
２９０：表示部
２９２：音出力部
２９６：通信部
1: Content providing system
10: Server device
20: Terminal device
100: Processing unit
101: Communication control unit
102: Web processing unit
103: Login management unit
104: User management unit
105: Content management unit
106: Speech generation processing unit
107: Cost management unit
109: Timer management unit
110: Information providing unit
111: Communication control unit
120: Input unit
130: Display unit
140: Storage unit
142: Main storage unit
144: Content information storage unit
146: Phoneme information storage unit
148: User information storage unit
149: Application information storage unit
180: Information storage medium
196: Communication unit
200: Processing unit
210: Communication control unit
211: Web browser
212: Content processing unit
213: Display control unit
220: Drawing unit
230: Sound processing unit
260: Input unit
262: Detection unit
270: Storage unit
271: Main storage unit
272: Image buffer
280: Information storage medium
290: Display unit
292: Sound output unit
296: Communication unit

Claims

1. A content playback control system comprising:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and composed of phoneme data collected from a speaker;
a content management means for managing information related to content as content information, the content information being composed of content data including at least text data into which text to be spoken in the voice of the speaker has been converted, and character data related to a character who speaks the text;
a generation processing means for executing a generation process that assigns the phoneme data to the character data based on a given instruction and generates speech language data for converting the text of the text data included in the content information into spoken language;
a provision control means for executing a provision control process that provides the generated speech language data to a reproduction output means that reproduces and outputs the voice of the character along with the text of the content data;
a cost management means for managing a cost parameter that defines a cost of any one of the phoneme data, the character data, and the text data used in the generation process;
a cost calculation unit that executes a calculation process to calculate an execution cost for executing the generation process based on the cost parameter; and
a presentation means for presenting the execution cost calculated by the calculation process to a user who wishes to view the content.

2. The content playback control system according to claim 1,
The content playback control system further comprises a permission determination processing means for executing an execution permission determination process that determines whether or not to permit execution of at least one of the generation process and the provision control process based on whether or not the user has paid the execution cost calculated by the calculation process.

3. A content playback control system comprising:
a user information management means for managing information stored in a storage means, namely user information relating to a user and speech phoneme information that is associated with the user information and composed of phoneme data collected from a speaker;
a content management means for managing information related to content as content information, the content information being composed of content data including at least text data into which text to be spoken in the voice of the speaker has been converted, and character data related to a character who speaks the text;
a generation processing means for executing a generation process that assigns the phoneme data to the character data based on a given instruction and generates speech language data for converting the text of the text data included in the content information into spoken language;
a provision control means for executing a provision control process that provides the generated speech language data to a reproduction output means that reproduces and outputs the voice of the character along with the text of the content data;
a cost management means for managing a cost parameter that defines a cost related to the use of any one of the phoneme data, the character data, and the text data used in the generation process; and
a permission determination processing means for executing an execution permission determination process that permits execution of at least one of the generation process and the provision control process when the cost parameter and a preset cost limit value satisfy a given relational condition.

4. The content playback control system according to any one of claims 1 to 3,
further comprising a detection means for detecting a given state of the user related to the content,
wherein the cost management means
executes a variation control process for controlling a variation of the cost parameter based on the detected state of the user related to the content.

5. The content playback control system according to any one of claims 1 to 4,
wherein the cost management means
executes a variation control process for controlling a variation of the cost parameter based on related information relating to any one of the content data, the character data, and the text data.

6. The content playback control system according to any one of claims 1 to 5,
further comprising a combination detection means for detecting a combination of phoneme data assigned by the generation process,
wherein the cost management means
executes a variation control process for controlling a variation of the cost parameter based on information on the detected combination of phoneme data.

7. The content playback control system according to any one of claims 1 to 5,
wherein the generation processing means,
when a character to which no phoneme data is assigned is detected as a specific character based on the user's instruction serving as the given instruction, sets predetermined phoneme data for the specific character.

8. The content playback control system according to any one of claims 1 to 5,
wherein the generation processing means
executes the generation process to generate the speech language data in accordance with model information indicating a data model of phoneme data generated based on at least one of an attribute of the character and an attribute of the text, and
executes a learning process to learn the model information based on the generated speech language data, and
wherein the cost management means
executes a variation control process for controlling a variation of the cost parameter based on a state of the learning process of the model information.

a user information management means for managing information stored in the storage means, the user information relating to a user and speech phoneme information associated with the user information and composed of phoneme data collected from a speaker;
a content management means for managing information related to the content as content information, the content information being composed of content data including at least text data obtained by converting a text to be spoken by the voice of the speaker and character data related to a character who speaks the text;
a generation processing means for executing a generation process of assigning the phoneme data to the character data based on a given instruction, and generating speech language data for converting text of the text data included in the content information into speech language ;
a provision control means for executing a provision control process for providing the generated speech language data to a reproduction output means for reproducing and outputting the voice of the character along with the text of the content data;
a cost management means for managing a cost parameter that defines the cost of any one of the phoneme data, character data, and text data used in the generation process;
a cost calculation means for executing a calculation process of calculating an execution cost of the generation process based on the cost parameter; and
a presentation means for presenting the execution cost calculated by the calculation process to a user who wishes to view the content;
A program that causes a computer to function as each of the above means.
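The cost calculation and presentation elements of the preceding program claim can be sketched in Python as follows. The per-item cost table keyed by data kind and ID, the rule that the execution cost is the sum of the cost parameters of the phoneme, character, and text data used, and the "points" unit are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CostTable:
    # Hypothetical cost parameters keyed by (data kind, data ID),
    # e.g. ("phoneme", "voiceA") -> 50.
    params: dict[tuple[str, str], int] = field(default_factory=dict)

    def cost_of(self, kind: str, data_id: str) -> int:
        # Assumption: data without an explicit cost parameter is free.
        return self.params.get((kind, data_id), 0)


def calculate_execution_cost(table: CostTable, phoneme_id: str,
                             character_ids: list[str], text_ids: list[str]) -> int:
    """Sum the cost parameters of all data used in one generation process."""
    total = table.cost_of("phoneme", phoneme_id)
    total += sum(table.cost_of("character", c) for c in character_ids)
    total += sum(table.cost_of("text", t) for t in text_ids)
    return total


def present_execution_cost(cost: int) -> str:
    """Format the execution cost for display to the prospective viewer."""
    return f"Execution cost for this playback: {cost} points"
```

Summing per-item costs keeps each cost parameter independently adjustable, which fits a cost management means that manages the cost of any one of the three data types.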

The program according to claim 9,
A program that further causes the computer to function as an execution permission determination processing means that executes an execution permission determination process that determines whether or not to permit execution of at least one of the generation process and the provision control process based on whether or not the user has paid the execution cost calculated by the calculation process.
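A minimal Python sketch of the payment-based execution permission determination follows, assuming (hypothetically) that payments are kept as per-user records and that execution is permitted once the user's total payments cover the calculated execution cost. The record structure and the "total paid covers cost" rule are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class PaymentRecord:
    # Hypothetical payment record for one user.
    user_id: str
    amount_paid: int


def is_execution_permitted(execution_cost: int, payments: list[PaymentRecord],
                           user_id: str) -> bool:
    """Permit generation/provision only if this user's payments cover the cost."""
    paid = sum(p.amount_paid for p in payments if p.user_id == user_id)
    return paid >= execution_cost
```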

a user information management means for managing, as information stored in a storage means, user information relating to a user and speech phoneme information that is associated with the user information and composed of phoneme data collected from a speaker;
a content management means for managing information related to content as content information, the content information being composed of content data that includes at least text data, obtained by converting into data a text to be spoken in the voice of the speaker, and character data related to a character who speaks the text;
a generation processing means for executing, based on a given instruction, a generation process of assigning the phoneme data to the character data and generating speech language data for converting the text of the text data included in the content information into speech language;
a provision control means for executing a provision control process for providing the generated speech language data to a reproduction output means for reproducing and outputting the voice of the character along with the text of the content data;
a cost management means for managing a cost parameter that defines a cost related to the use of any one of the phoneme data, the character data, and the text data used in the generation process; and
a permission determination processing means for executing an execution permission determination process for permitting execution of at least one of the generation process and the provision control process when the cost parameter and a preset cost limit value satisfy a given relational condition;
A program that causes a computer to function as each of the above means.
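The "given relational condition" between the cost parameter and the preset cost limit value is left open in the claim. The Python sketch below assumes, hypothetically, that the cost parameters of all data used are summed and compared to the limit with a pluggable relation that defaults to "total does not exceed the limit".

```python
import operator
from typing import Callable, Iterable

# A relation takes (total cost, cost limit) and returns whether it holds.
Relation = Callable[[int, int], bool]


def permit_execution(cost_parameters: Iterable[int], cost_limit: int,
                     relation: Relation = operator.le) -> bool:
    """Permit the generation and/or provision control process when the
    relation holds between the summed cost parameters and the cost limit.

    Assumptions: summation of cost parameters and the default `<=` relation.
    """
    total = sum(cost_parameters)
    return relation(total, cost_limit)
```

Passing the relation as a parameter keeps the "given relational condition" configurable, for example `operator.lt` for a strict limit.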