JP7669555B1 - Information processing device, information processing method, and program

Info

Publication number: JP7669555B1
Application number: JP2024053719A
Authority: JP
Inventors: 大起青山; 未芙木戸脇; 隆一郎林
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2024-03-28
Filing date: 2024-03-28
Publication date: 2025-04-28
Anticipated expiration: 2044-03-28
Also published as: JP7693145B1

Abstract

[Problem] To reduce the burden of modifying materials so that they fit the content. [Solution] An information processing device 1 includes: a reception unit 131 that receives a material for constituting video content and a generation instruction for generating the video content; an acquisition unit 132 that acquires metadata, which is data describing the material; an identification unit 133 that identifies, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and a generation unit 134 that generates, based on the identified modification content, a prompt for inputting an instruction to modify the material into an image generation AI. [Selected Figure] FIG. 2

Description

The present invention relates to an information processing device, an information processing method, and a program.

Devices that compose content by combining materials are known (see, for example, Patent Document 1).

Patent Document 1: JP 2024-027453 A

Such materials do not always fit the desired video content. This limits the content that can be generated, while modifying the materials by hand imposes a heavy burden.

The present invention has been made in view of these points, and an object thereof is to reduce the burden of modifying materials so that they fit the content.

An information processing device according to a first aspect of the present invention includes: a reception unit that receives a material for constituting video content and a generation instruction for generating the video content; an acquisition unit that acquires metadata, which is data describing the material; an identification unit that identifies, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and a generation unit that generates, based on the identified modification content, a prompt for inputting an instruction to modify the material into an image generation AI.

The metadata may include the aspect ratio of the material, and the generation instruction may include a reference aspect ratio, which is the aspect ratio serving as a reference for the video content. When the aspect ratio indicated by the metadata differs from the reference aspect ratio, the identification unit may identify a change in the aspect ratio of the material as the modification content, and the generation unit may generate a prompt for changing the aspect ratio of the material to the aspect ratio identified by the identification unit.

The generation instruction may include a description of the video content. When the identification unit identifies a change in aspect ratio as the modification content, the generation unit may generate a prompt instructing that the portion that is missing when the aspect ratio of the material is changed in accordance with the modification content be filled in based on the description of the video content indicated by the generation instruction.

When the identification unit identifies a change in aspect ratio as the modification content, the generation unit may generate, based on the generation instruction, a prompt instructing that the portion that becomes surplus when the aspect ratio of the material is changed in accordance with the modification content be cut off.

The material may include a photographic image, the metadata may include a description of the subject of the photographic image, and the generation instruction may include a description of the video content. When the description of the subject of the photographic image indicated by the metadata does not match the description of the video content indicated by the generation instruction, the generation unit may generate a prompt for modifying the photographic image based on the description of the video content indicated by the generation instruction.

The information processing device may further include a determination unit that determines, based on the metadata and the generation instruction, whether a modification is required in order to apply the material to the video content, and the identification unit may identify the modification content when the determination unit determines that modification of the material is required.

The generation instruction may further include restrictions indicating matters whose expression is restricted in the video content, and the generation unit may generate a prompt that further includes the restrictions.

An information processing method according to a second aspect of the present invention is executed by a computer and includes the steps of: receiving a material for constituting video content and a generation instruction for generating the video content; acquiring metadata, which is data describing the material; identifying, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and generating, based on the identified modification content, a prompt for inputting an instruction to modify the material into an image generation AI.

The program causes a computer to execute the steps of: receiving a material for constituting video content and a generation instruction for generating the video content; acquiring metadata, which is data describing the material; identifying, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and generating, based on the identified modification content, a prompt for inputting an instruction to modify the material into an image generation AI.

According to the present invention, the burden of modifying materials so that they fit the content can be reduced.

FIG. 1 is a diagram for explaining an overview of the information processing device 1 according to an embodiment.
FIG. 2 is a block diagram showing the configuration of the information processing device 1.
FIG. 3 is a diagram showing an example of a reception screen displayed by the reception unit 131.
FIG. 4 is a diagram showing an example of processing performed by the identification unit 133.
FIG. 5 is a flowchart showing the flow of processing in the information processing device 1.

[Overview of Information Processing System S]
FIG. 1 is a diagram for explaining an overview of the information processing device 1 according to an embodiment. The information processing system S is a system for providing video content. The information processing system S generates video content based on materials specified by a user. The information processing system S includes the information processing device 1, an information terminal 2, and a generation device 3. A material is content that serves as a constituent element of the video content. A material is, for example, video data, still image data, text data, audio data, or music data.

The information processing device 1 is a device for generating a prompt that gives instructions to a machine learning model. The information processing device 1 determines whether a material needs to be modified based on the details of the content specified by the user and the metadata of the material specified by the user. When modification is necessary, the information processing device 1 generates a prompt for instructing the generation device 3 to modify the material.

The information terminal 2 is a terminal used by a user of the information processing system S. The information terminal 2 is, for example, a smartphone, a tablet, or a personal computer. The information terminal 2 transmits, to the information processing device 1, an instruction regarding the details of the content to be generated.

The generation device 3 generates video content, still images, and the like based on the instructions it receives. The generation device 3 has a generative model (hereinafter sometimes referred to as the "image generation AI"), which is a trained model trained to generate moving images or still images based on instructions written in natural language, and outputs the moving image or still image that the generative model generates from the input instruction.

The processing of the information processing device 1 will now be described. The information processing device 1 acquires a material and a generation instruction from the information terminal 2 ((1) in FIG. 1). The generation instruction is an instruction to generate video content. As an example, the generation instruction includes a description of the video content. Specifically, the generation instruction includes information, such as "a man jogging", that indicates what the video content expresses, the subjects that appear in the video content and their actions, the scene, and so on. The generation instruction may include information specifying the playback time of the video content to be generated, and may include information indicating the aspect ratio serving as a reference for the video content (hereinafter sometimes referred to as the "reference aspect ratio").

The information processing device 1 acquires metadata ((2) in FIG. 1). The metadata is data describing the material. As an example, the metadata includes the aspect ratio of the material. The metadata may also include a description of the material, such as the subjects captured in the material or the theme the material expresses.

The information processing device 1 identifies the modification content for the material based on the generation instruction and the metadata ((3) in FIG. 1). As an example, the information processing device 1 compares the aspect ratio of the video content included in the generation instruction with the aspect ratio included in the metadata of the material and, when the two differ, identifies changing the aspect ratio of the material as the modification content. For example, when the aspect ratio of the material is "4:3" and the reference aspect ratio of the video content is "16:9", the information processing device 1 identifies "changing the aspect ratio to 16:9" as the modification content.
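As a non-limiting illustration, the aspect ratio comparison described above could be sketched in Python as follows. The function names `parse_aspect_ratio` and `identify_aspect_modification`, and the choice to compare the ratios as exact fractions (so that "32:18" is treated as equal to "16:9"), are assumptions made for this sketch and are not part of the embodiment.

```python
from fractions import Fraction
from typing import Optional


def parse_aspect_ratio(text: str) -> Fraction:
    """Parse a string such as "16:9" into an exact width/height fraction."""
    width, height = text.split(":")
    return Fraction(int(width), int(height))


def identify_aspect_modification(material_ratio: str,
                                 reference_ratio: str) -> Optional[str]:
    """Return the modification content, or None when no change is needed."""
    if parse_aspect_ratio(material_ratio) == parse_aspect_ratio(reference_ratio):
        return None
    return f"change the aspect ratio to {reference_ratio}"


if __name__ == "__main__":
    # Material is 4:3, video content is 16:9, as in the example in the text.
    print(identify_aspect_modification("4:3", "16:9"))
```

Comparing exact fractions rather than raw strings is one possible design choice; it avoids flagging a modification when the two ratios are merely written differently.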

The information processing device 1 generates a prompt for modifying the material based on the identified modification content ((4) in FIG. 1). A prompt is an instruction written in natural language that is given to a machine learning model. For example, when the identified modification content is a change in aspect ratio and the reference aspect ratio included in the generation instruction is "16:9", the information processing device 1 generates a prompt stating, "Please change the aspect ratio of the input content to 16:9".

The information processing device 1 transmits the generated prompt to the generation device 3 ((5) in FIG. 1). The information processing device 1 may display the generated prompt on the information terminal 2 and ask the user to confirm it. When the user performs a confirmation operation on the information terminal 2, the information processing device 1 may transmit the generated prompt and the received material to the generation device 3 and cause it to modify the material. As an example, the generation device 3 modifies the material based on the prompt and displays the modified material on the information terminal 2.

Configuring the information processing system S in this way reduces the burden of modifying materials so that they fit the content.

[Configuration of Information Processing Device 1]
FIG. 2 is a block diagram showing the configuration of the information processing device 1. The information processing device 1 includes a communication unit 11, a storage unit 12, and a control unit 13. The control unit 13 includes a reception unit 131, an acquisition unit 132, an identification unit 133, a generation unit 134, and a determination unit 135.

The communication unit 11 is a communication interface for transmitting and receiving data to and from other devices via a network. The storage unit 12 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), an SSD (Solid State Drive), a hard disk drive, and the like. The storage unit 12 stores in advance the program executed by the control unit 13.

The control unit 13 is, for example, a processor such as a CPU (Central Processing Unit). By executing the program stored in the storage unit 12, the control unit 13 functions as the reception unit 131, the acquisition unit 132, the identification unit 133, the generation unit 134, and the determination unit 135.

The reception unit 131 receives a material for constituting video content and a generation instruction for generating the video content. The reception unit 131 causes the information terminal 2 to display the reception screen shown in FIG. 3. The reception screen is a screen for receiving the generation instruction. As an example, the reception screen includes an interface O1 for selecting a material, an interface O2 for specifying the aspect ratio of the video to be generated, and an interface O3 for specifying the details of the video. When the user presses the send button O4, the reception unit 131 acquires a generation instruction containing what the user specified. Note that, instead of the interface for selecting the aspect ratio, the reception screen may include an interface for selecting a template with which video settings such as the aspect ratio and the playback time of the video are associated.

The acquisition unit 132 acquires metadata about the material. The acquisition unit 132 may acquire the metadata about the material from the information terminal 2. The acquisition unit 132 may instead acquire the metadata from a material management server (not shown) that manages materials. In this case, the acquisition unit 132 receives the material ID of the specified material from the information terminal 2, and acquires the metadata corresponding to the material ID received by the reception unit 131 from the material management server.

The determination unit 135 determines, based on the metadata and the generation instruction, whether a modification is required in order to apply the material to the video content. The determination unit 135 compares the metadata with the generation instruction to determine whether modification is necessary. The determination unit 135 may make this determination by comparing the aspect ratios included in the metadata and the generation instruction, or by comparing the details of the material and of the video content included in the metadata and the generation instruction, respectively. The identification unit 133 identifies the modification content when the determination unit 135 determines that modification of the material is necessary.

The identification unit 133 identifies, based on the generation instruction and the metadata of the material, the modification content for applying the material to the video content. As an example, when the aspect ratio indicated by the metadata differs from the reference aspect ratio, the identification unit 133 identifies a change in the aspect ratio of the material as the modification content.

The generation unit 134 generates, based on the identified modification content, a prompt for inputting an instruction to modify the material into the image generation AI. The generation unit 134 generates a prompt for changing the aspect ratio of the material to the aspect ratio identified by the identification unit 133. As an example, the storage unit 12 stores a template for generating prompts, and the generation unit 134 generates a prompt by fitting the identified modification content into the template stored in the storage unit 12. Alternatively, the storage unit 12 stores a trained model (hereinafter referred to as the "prompt generation model") that has been trained to output a prompt written in natural language when modification content is input, and the generation unit 134 inputs the modification content identified by the identification unit 133 into the prompt generation model and causes it to output a prompt corresponding to the modification content.
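As one possible illustration of the template-based approach, the following Python sketch fits modification content into a stored template using the standard-library `string.Template` class. The template text, the placeholder name `ratio`, and the function name `generate_prompt` are hypothetical; the embodiment does not specify the format of the templates held in the storage unit 12.

```python
from string import Template

# Hypothetical template of the kind the storage unit 12 might hold.
ASPECT_RATIO_TEMPLATE = Template(
    "Please change the aspect ratio of the input content to $ratio."
)


def generate_prompt(template: Template, **fields: str) -> str:
    """Fit the identified modification content into a stored template."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which surfaces incomplete modification content early.
    return template.substitute(**fields)


if __name__ == "__main__":
    print(generate_prompt(ASPECT_RATIO_TEMPLATE, ratio="16:9"))
```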

When the identification unit 133 identifies a change in aspect ratio as the modification content, the generation unit 134 generates a prompt instructing that the portion that is missing when the aspect ratio of the material is changed in accordance with the modification content be filled in based on the description of the video content indicated by the generation instruction. As an example, when portrait video content is generated from a landscape image material as shown in FIG. 4, a region R that the material cannot cover arises in the video content. As an example, the identification unit 133 may identify, as the region R, the area that cannot be filled by the material when the material is placed in the center. In this case, the identification unit 133 identifies, as the modification content, filling in the identified region R based on the details of the video content included in the generation instruction. The generation unit 134 then generates a prompt instructing that the region R identified by the identification unit 133 be generated based on the details of the video included in the generation instruction. As an example, the generation unit 134 generates a prompt stating, "Please change the aspect ratio of the input content to 9:16. Please generate the part that is not filled by the input content based on 'a man running'."
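As a non-limiting sketch, the region R for a centered material could be computed as follows. The sketch assumes pixel dimensions, a canvas sized so that the material spans it fully along one axis, and rounding to whole pixels; the names `Rect`, `uncovered_regions`, and `fill_prompt` are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def uncovered_regions(material_w: int, material_h: int,
                      target: Fraction) -> List[Rect]:
    """Return the canvas areas (region R) a centered material cannot fill."""
    material_ratio = Fraction(material_w, material_h)
    if material_ratio == target:
        return []
    if material_ratio > target:
        # Material is wider than the target: bands appear above and below.
        canvas_h = round(material_w / target)
        top = (canvas_h - material_h) // 2
        bottom = canvas_h - material_h - top
        return [Rect(0, 0, material_w, top),
                Rect(0, top + material_h, material_w, bottom)]
    # Material is narrower than the target: bands appear left and right.
    canvas_w = round(material_h * target)
    left = (canvas_w - material_w) // 2
    right = canvas_w - material_w - left
    return [Rect(0, 0, left, material_h),
            Rect(left + material_w, 0, right, material_h)]


def fill_prompt(target_label: str, description: str) -> str:
    """Build a prompt asking the image generation AI to fill region R."""
    return (f"Please change the aspect ratio of the input content to "
            f"{target_label}. Please generate the part that is not filled "
            f"by the input content based on '{description}'.")


if __name__ == "__main__":
    # Landscape 16:9 material placed into a portrait 9:16 canvas.
    for region in uncovered_regions(1440, 810, Fraction(9, 16)):
        print(region)
    print(fill_prompt("9:16", "a man running"))
```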

It is desirable that the content filled into the region R in this way look natural in relation to the material. The prompt generation model may therefore be trained on users' evaluations of the generated prompts.

More specifically, the information processing device 1 acquires an evaluation indicating whether a user who has viewed a material modified by the image generation AI, based on a prompt generated by the prompt generation model, perceives the modified material as natural. The user's evaluation may be expressed as a binary value indicating whether or not the material is perceived as natural, or as information indicating the degree to which it is perceived as natural. The user's evaluation is not limited to these examples. As an example, the prompt generation model is trained on training data in which prompts generated by the prompt generation model are associated with their evaluations. The prompt generation model may also be trained using, as training data, prompts whose user evaluation is at or above a predetermined threshold.
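The threshold-based selection of training data mentioned above could, for instance, be sketched as follows. The `RatedPrompt` record, the numeric rating scale from 0 to 1, and the threshold value used in the example are assumptions for illustration only; the embodiment leaves the form of the evaluation open.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RatedPrompt:
    prompt: str
    rating: float  # hypothetical naturalness score between 0 and 1


def select_training_prompts(samples: List[RatedPrompt],
                            threshold: float) -> List[str]:
    """Keep only prompts whose user evaluation is at or above the threshold."""
    return [sample.prompt for sample in samples if sample.rating >= threshold]


if __name__ == "__main__":
    samples = [
        RatedPrompt("Please change the aspect ratio to 9:16.", 0.9),
        RatedPrompt("Please fill the empty area with anything.", 0.3),
    ]
    print(select_training_prompts(samples, threshold=0.8))
```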

When the identification unit 133 identifies a change in aspect ratio as the modification content, the generation unit 134 generates, based on the generation instruction, a prompt instructing that the portion that becomes surplus when the aspect ratio of the material is changed in accordance with the modification content be cut off. In this case, the identification unit 133 identifies the area of the material that becomes surplus as a result of changing the aspect ratio, and the generation unit 134 generates a prompt instructing that the surplus area be trimmed.
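One way to identify the surplus area is to compute the largest window of the target aspect ratio that fits inside the material; everything outside that window is surplus. The Python sketch below assumes pixel dimensions and a centered window, neither of which is stated in the embodiment, and the names `centered_crop` and `trim_prompt` are hypothetical.

```python
import math
from fractions import Fraction
from typing import Tuple


def centered_crop(material_w: int, material_h: int,
                  target: Fraction) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the area to keep; the rest is surplus."""
    if Fraction(material_w, material_h) > target:
        # Material is too wide: keep the full height, trim left and right.
        crop_h = material_h
        crop_w = math.floor(material_h * target)
    else:
        # Material is too tall (or already matches): keep the full width.
        crop_w = material_w
        crop_h = math.floor(material_w / target)
    x = (material_w - crop_w) // 2
    y = (material_h - crop_h) // 2
    return (x, y, crop_w, crop_h)


def trim_prompt(region: Tuple[int, int, int, int]) -> str:
    """Build a prompt asking the image generation AI to trim the surplus."""
    x, y, width, height = region
    return (f"Please trim the input content to the {width}x{height} region "
            f"whose top-left corner is at ({x}, {y}).")


if __name__ == "__main__":
    # 4:3 material trimmed to fit 16:9 video content.
    print(trim_prompt(centered_crop(1600, 1200, Fraction(16, 9))))
```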

When the material is a video and its playback time differs from the playback time of the video content associated with the template, a prompt indicating how to handle the missing or surplus time may be generated. In this case, as an example, the template is associated with handling methods for the cases where the playback time of the material is shorter or longer than the playback time of the video content. As an example, for the case where the playback time of the material is shorter than that of the video content, the template is associated with playing the material in a loop, or with generating, for the missing time, a digest video edited from the content of the material. For the case where the playback time of the material is longer than that of the video content, the template may be associated with deleting a predetermined part of the material (for example, the end).

When the playback time of the material differs from the playback time of the video content associated with the template, the identification unit 133 refers to the handling method for missing or surplus time associated with the template and identifies the modification content.

The generation unit 134 generates a prompt based on the modification content identified by the identification unit 133. As an example, when the playback time of the material is 9 minutes, the playback time of the video content is 10 minutes, and the method of handling the missing time is generation of a digest video, the generation unit 134 generates a prompt stating, "Please generate a digest that summarizes the input content into one minute, and connect the generated digest after the input content to generate a video." Likewise, when the playback time of the material is 10 minutes, the playback time of the video content is 60 minutes, and the method of handling the missing time is loop playback, the generation unit 134 generates a prompt stating, "Please generate a video that repeats the input content six times."
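The two examples above can be reproduced with a short Python sketch. The function name `duration_prompt`, the method labels "loop" and "digest", the use of whole minutes, and the wording for the surplus case are assumptions for illustration. The loop count is rounded up, so the looped result may exceed the target when the times do not divide evenly; how to handle that remainder is not addressed in the embodiment.

```python
from typing import Optional


def duration_prompt(material_min: int, content_min: int,
                    shortage_method: str = "loop") -> Optional[str]:
    """Build a prompt for a playback-time mismatch, or None if times match."""
    if material_min == content_min:
        return None
    if material_min > content_min:
        # Surplus time: delete a predetermined part (here, the end).
        surplus = material_min - content_min
        return f"Please delete the last {surplus} minute(s) of the input content."
    if shortage_method == "loop":
        # Integer ceiling division: enough repeats to cover the target time.
        repeats = -(-content_min // material_min)
        return (f"Please generate a video that repeats the input content "
                f"{repeats} times.")
    if shortage_method == "digest":
        missing = content_min - material_min
        return (f"Please generate a digest that summarizes the input content "
                f"into {missing} minute(s), and connect the generated digest "
                f"after the input content to generate a video.")
    raise ValueError(f"unknown shortage method: {shortage_method}")


if __name__ == "__main__":
    print(duration_prompt(9, 10, "digest"))
    print(duration_prompt(10, 60, "loop"))
```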

The information processing device 1 may be configured to generate a prompt for modifying what is captured in a material that is a photographic image.

When the description of the subject of the photographic image indicated by the metadata does not match the description of the video content indicated by the generation instruction, the generation unit 134 generates a prompt for modifying the photographic image based on the description of the video content indicated by the generation instruction. As an example, when the metadata indicates that the image serving as the material contains "5 dumplings" and the generation instruction indicates that the video content is to contain "6 dumplings", the generation unit 134 generates a prompt for modifying the material so that it shows 6 dumplings.
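A minimal sketch of this check is given below. It treats the two descriptions as matching only when they are identical after trimming whitespace and ignoring case, which is a deliberate simplification; the embodiment does not specify how a match is judged, and a practical system would likely need a more tolerant comparison. The function name `subject_prompt` and the prompt wording are hypothetical.

```python
from typing import Optional


def subject_prompt(metadata_description: str,
                   content_description: str) -> Optional[str]:
    """Return a modification prompt when the two descriptions do not match."""
    # Simplified matching rule (an assumption): exact match after normalizing
    # surrounding whitespace and letter case.
    if metadata_description.strip().lower() == content_description.strip().lower():
        return None
    return (f"Please modify the input image, which shows "
            f"'{metadata_description}', so that it shows "
            f"'{content_description}'.")


if __name__ == "__main__":
    print(subject_prompt("5 dumplings", "6 dumplings"))
```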

The information processing device 1 may be configured to generate a prompt that further includes, as instructions, restrictions to be observed when generating the content.

The generation instruction further includes restrictions indicating matters whose expression is restricted in the video content. A restriction is, for example, that brand logo images, characters protected by copyright, and the like must not be included in the generated content. A restriction may also be, for example, that the content must not include anything suggestive of violence, crime, or the like. As an example, an interface for selecting the restrictions to be applied to the generated content may be arranged on the reception screen, and the reception unit 131 may receive the restrictions selected by the user on the reception screen as part of the generation instruction.

The generation unit 134 generates a prompt that further includes the restrictions. As an example, the generation unit 134 may generate a prompt structured so that the sentence specifying the details of the content is followed by sentences stating the restrictions.
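The following Python sketch illustrates one way to place the restriction sentences after the sentence specifying the content. The function name `append_restrictions` and the "Do not include ..." wording are assumptions for illustration; the embodiment does not prescribe how the restrictions are phrased.

```python
from typing import List


def append_restrictions(content_prompt: str, restrictions: List[str]) -> str:
    """Follow the content sentence with one sentence per restriction."""
    sentences = [content_prompt.strip()]
    for restriction in restrictions:
        sentences.append(f"Do not include {restriction}.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(append_restrictions(
        "Please generate a video of a man jogging.",
        ["brand logos", "copyrighted characters"],
    ))
```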

[Flow of Processing in Information Processing Device 1]
FIG. 5 is a flowchart showing the flow of processing in the information processing device 1. The reception unit 131 receives a material and a generation instruction (S01). The acquisition unit 132 acquires metadata about the material received by the reception unit 131 (S02).

特定部１３３は、生成指示と、素材のメタデータと、に基づいて、素材の修正内容を特定する（Ｓ０３）。生成部１３４は、特定した修正内容に基づき、素材を修正する指示を画像生成ＡＩに入力するためのプロンプトを生成する（Ｓ０４）。そして情報処理装置１は処理を終了する。 The identification unit 133 identifies the modification content for the material based on the generation instruction and the metadata of the material (S03). Based on the identified modification content, the generation unit 134 generates a prompt for inputting, to the image generation AI, an instruction to modify the material (S04). The information processing device 1 then ends the process.
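As a non-limiting illustration, steps S01 to S04 may be sketched as a single function. All names below, the dictionary used as a metadata store, the equality-based comparisons, and the prompt wording are assumptions made for this sketch; the embodiment does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Material:
    path: str  # hypothetical identifier of the photographic image


@dataclass
class Metadata:
    subject: str       # e.g. "5 dumplings"
    aspect_ratio: str  # e.g. "4:3"


@dataclass
class GenerationInstruction:
    description: str             # e.g. "6 dumplings"
    reference_aspect_ratio: str  # e.g. "16:9"


def acquire_metadata(material: Material, store: Dict[str, Metadata]) -> Metadata:
    # S02: acquire metadata for the received material (here, a plain lookup).
    return store[material.path]


def identify_modifications(
    instruction: GenerationInstruction, meta: Metadata
) -> List[str]:
    # S03: identify the modification content by comparing the generation
    # instruction with the metadata (simple equality is an assumption).
    modifications: List[str] = []
    if meta.subject != instruction.description:
        modifications.append(
            f"change the subject from '{meta.subject}' to '{instruction.description}'"
        )
    if meta.aspect_ratio != instruction.reference_aspect_ratio:
        modifications.append(
            f"change the aspect ratio from {meta.aspect_ratio} "
            f"to {instruction.reference_aspect_ratio}"
        )
    return modifications


def generate_prompt(modifications: List[str]) -> str:
    # S04: generate the prompt to be input to the image generation AI.
    if not modifications:
        return ""
    return "Modify the image: " + "; ".join(modifications) + "."


def process(
    material: Material,
    instruction: GenerationInstruction,
    store: Dict[str, Metadata],
) -> str:
    # S01: the material and the generation instruction arrive as arguments.
    meta = acquire_metadata(material, store)
    modifications = identify_modifications(instruction, meta)
    return generate_prompt(modifications)
```

In this sketch an empty string stands for the case in which no modification is identified; how that case is handled in practice is left open.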

［情報処理装置１による効果］ [Effects of information processing device 1]
以上説明したように情報処理装置１が構成されることで、コンテンツに適合するよう素材を修正する負担を軽減するという効果を奏する。 Configuring the information processing device 1 as described above achieves the effect of reducing the burden of modifying materials so that they fit the content.

なお、本発明により、国連が主導する持続可能な開発目標（SDGs）の目標９「産業と技術革新の基盤をつくろう」に貢献することが可能となる。 Furthermore, the present invention makes it possible to contribute to Goal 9 of the United Nations-led Sustainable Development Goals (SDGs), "Industry, Innovation and Infrastructure."

以上、本発明を実施の形態を用いて説明したが、本発明の技術的範囲は上記実施の形態に記載の範囲には限定されず、その要旨の範囲内で種々の変形及び変更が可能である。例えば、装置の全部又は一部は、任意の単位で機能的又は物理的に分散・統合して構成することができる。また、複数の実施の形態の任意の組み合わせによって生じる新たな実施の形態も、本発明の実施の形態に含まれる。組み合わせによって生じる新たな実施の形態の効果は、もとの実施の形態の効果を併せ持つ。 Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in any unit. New embodiments resulting from any combination of multiple embodiments are also included in the embodiments of the present invention. A new embodiment resulting from such a combination also has the effects of the original embodiments.

１情報処理装置
２情報端末
３生成装置
１１通信部
１２記憶部
１３制御部
１３１受付部
１３２取得部
１３３特定部
１３４生成部
１３５判定部
Reference Signs List 1 Information processing device 2 Information terminal 3 Generation device 11 Communication unit 12 Storage unit 13 Control unit 131 Reception unit 132 Acquisition unit 133 Identification unit 134 Generation unit 135 Determination unit

Claims

An information processing device comprising:
a receiving unit that receives a material, which is a photographic image for constituting video content, and a generation instruction for generating the video content, the generation instruction including a description of the content of the video content;
an acquisition unit that acquires metadata, which is data describing the material, the metadata including a description of the subject of the photographic image;
a specification unit that specifies modification content for applying the material to the video content based on the generation instruction and the metadata of the material; and
a generation unit that generates a prompt for inputting, to an image generation AI, an instruction to modify the material based on the specified modification content,
wherein, when the description of the subject of the photographic image indicated by the metadata does not match the description of the video content indicated by the generation instruction, the generation unit generates a prompt for modifying the material, which is the photographic image, based on the description of the video content indicated by the generation instruction.

The information processing device according to claim 1, wherein
the metadata includes an aspect ratio of the material,
the generation instruction includes a reference aspect ratio, which is the aspect ratio serving as a reference for the video content,
the specification unit, when the aspect ratio indicated by the metadata differs from the reference aspect ratio, specifies a change in the aspect ratio of the material as the modification content, and
the generation unit generates a prompt for changing the aspect ratio of the material to the aspect ratio specified by the specification unit.

The information processing device according to claim 2, wherein
the generation instruction includes a description explaining the video content, and
when the specification unit specifies a change in aspect ratio as the modification content, the generation unit generates a prompt instructing that the portion that becomes missing when the aspect ratio of the material is changed according to the modification content be supplemented based on the description explaining the video content indicated by the generation instruction.

The information processing device according to claim 2, wherein
when the specification unit specifies a change in aspect ratio as the modification content, the generation unit generates, based on the generation instruction, a prompt instructing that the portion that becomes surplus when the aspect ratio of the material is changed according to the modification content be cropped.

The information processing device according to claim 1, further comprising
a determination unit that determines, based on the metadata and the generation instruction, whether the material needs to be modified in order to be applied to the video content,
wherein the specification unit specifies the modification content when the determination unit determines that the material needs to be modified.

The information processing device according to claim 1, wherein
the generation instruction further includes a restriction indicating a matter whose expression is restricted in the video content, and
the generation unit generates a prompt that further includes the restriction.

An information processing method executed by a computer, the method comprising:
receiving a material, which is a photographic image for constituting video content, and a generation instruction for generating the video content, the generation instruction including a description of the content of the video content;
acquiring metadata, which is data describing the material, the metadata including a description of the subject of the photographic image;
specifying, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and
generating a prompt for inputting, to an image generation AI, an instruction to modify the material based on the specified modification content,
wherein, in the generating, when the description of the subject of the photographic image indicated by the metadata does not match the description of the content of the video content indicated by the generation instruction, a prompt for modifying the material, which is the photographic image, is generated based on the description of the content of the video content indicated by the generation instruction.

A program causing a computer to execute:
receiving a material, which is a photographic image for constituting video content, and a generation instruction for generating the video content, the generation instruction including a description of the content of the video content;
acquiring metadata, which is data describing the material, the metadata including a description of the subject of the photographic image;
specifying, based on the generation instruction and the metadata of the material, modification content for applying the material to the video content; and
generating a prompt for inputting, to an image generation AI, an instruction to modify the material based on the specified modification content,
wherein, in the generating, when the description of the subject of the photographic image indicated by the metadata does not match the description of the content of the video content indicated by the generation instruction, a prompt for modifying the material, which is the photographic image, is generated based on the description of the content of the video content indicated by the generation instruction.