JP6442102B1

JP6442102B1 - Information processing system and information processing apparatus

Info

Publication number: JP6442102B1
Application number: JP2018097838A
Authority: JP
Inventors: 誉彦浅野; 晴太郎栗田; 敏光宮尾
Original assignee: 株式会社フランティック
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2018-12-19
Anticipated expiration: 2038-05-22
Also published as: JP2019205025A

Abstract

[Problem] To provide an information processing system and an information processing apparatus capable of efficiently performing work related to image production.
[Solution] In the information processing system, when a sound effect is to be added to a moving image, a voice recognition unit recognizes a voice uttered by an operator while the moving image data is being reproduced, and a search unit searches for the sound data (sound effect) corresponding to the recognized voice, thereby acquiring the sound data to be added. Adding the acquired sound data to the moving image data being reproduced yields reproduction data in which the sound data and the moving image data are associated with each other. Because sound effects can thus be added to (associated with) a moving image by voice recognition, the operator can add them while watching the moving image being played back, which improves the efficiency of work related to image production.
[Selected drawing] Figure 2

Description

The present invention relates to an information processing system and an information processing apparatus, and more particularly to an information processing system and an information processing apparatus capable of associating reproduction data including image information with sound information.

In recent years, entertainment and services that use images, such as gaming machines (pachinko machines, slot machines, and the like), video games, and websites, have become widespread. Producing such images requires a wide range of work, including creating and designing the characters that appear in the images, editing video, and editing the sound that accompanies the images, and many people are generally involved in this work. Tools for facilitating such image production work have been proposed (see, for example, Patent Document 1).

Patent Document 1: JP 2004-266721 A

However, although an image editing tool such as that disclosed in Patent Document 1 improves visual workability by making the layout of the editing screen easy to understand, it does not make the editing work (editing processing) itself, such as combining image material with audio material, any more efficient. Further improvement in the efficiency of work related to image production is therefore desired.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a system and an apparatus capable of efficiently performing work related to image production.

To solve the above problems, the present invention adopts the following configurations.
That is, the information processing system of Means 1 comprises:
reproduction means for reproducing reproduction data including image information;
recognition means for recognizing an input made during reproduction of the reproduction data;
generation means for generating sound information based on the input recognized by the recognition means;
association means for associating the sound information generated by the generation means with the reproduction data; and
storage means for storing a plurality of pieces of sound information,
wherein the generation means acquires, from among the sound information stored in the storage means, sound information corresponding to the input recognized by the recognition means, and thereby generates the sound information to be associated with the reproduction data,
the sound information stored in the storage means is managed by classification,
the system further comprises designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means, and
the generation means acquires the sound information corresponding to the input recognized by the recognition means from among the sound information of the classification designated by the designation means, out of the sound information stored in the storage means.
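
The following Python sketch gives a rough picture of how the elements of Means 1 fit together. It models the storage means as a list of classified sound entries and the designation means as a set of classification names, and it represents the generation and association steps as methods. The class and method names are hypothetical. So is the rule that a recognized word matches a sound by exact title, because the claim specifies neither a matching rule nor a data format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SoundInfo:
    title: str           # keyword the operator is expected to say (assumed matching key)
    classification: str  # category under which the storage means manages this entry


@dataclass
class InformationProcessingSystem:
    storage: list[SoundInfo]                           # storage means
    designated: set[str] = field(default_factory=set)  # filled via the designation means
    associations: list[tuple[float, SoundInfo]] = field(default_factory=list)

    def generate(self, recognized: str) -> Optional[SoundInfo]:
        """Generation means: return stored sound information matching the
        recognized input, searching only the designated classifications."""
        for sound in self.storage:
            if sound.classification in self.designated and sound.title == recognized:
                return sound
        return None

    def on_input(self, recognized: str, playback_position: float) -> bool:
        """Called for each input recognized during reproduction; the association
        step ties the generated sound to the current reproduction position."""
        sound = self.generate(recognized)
        if sound is None:
            return False
        self.associations.append((playback_position, sound))
        return True
```

For example, suppose the storage holds an "explosion" entry in both a "battle" and a "comedy" classification and only "comedy" is designated. Saying "explosion" at 3.2 seconds then associates only the comedy entry.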

According to this configuration, sound information is generated based on an input made during reproduction of the reproduction data, and the generated sound information is associated with the reproduction data, so the efficiency of work related to image production can be improved. In addition, because the sound information is acquired, in accordance with the input, from among a plurality of pieces of sound information stored in advance, the sound information to be associated can be diversified. Furthermore, by designating in advance the classification (type) of the sound information to be associated, sound information matching the designated classification is generated based on the input made during reproduction and associated with the reproduction data, so selecting the sound information to be associated becomes more efficient.

Further, to solve the above problems, the information processing apparatus of Means 2 is
an information processing apparatus connectable to a reproduction processing apparatus capable of reproducing reproduction data including image information, the information processing apparatus comprising:
recognition means for recognizing an input made while the reproduction processing apparatus is reproducing the reproduction data;
generation means for generating sound information based on the input recognized by the recognition means;
association instruction means for instructing the reproduction processing apparatus to associate the sound information generated by the generation means with the reproduction data; and
storage means for storing a plurality of pieces of sound information,
wherein the generation means acquires, from among the sound information stored in the storage means, sound information corresponding to the input recognized by the recognition means, and thereby generates the sound information to be associated with the reproduction data,
the sound information stored in the storage means is managed by classification,
the apparatus further comprises designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means, and
the generation means acquires the sound information corresponding to the input recognized by the recognition means from among the sound information of the classification designated by the designation means, out of the sound information stored in the storage means.
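
The practical difference in Means 2 is that the apparatus does not perform the association itself. Instead, it instructs a separately connected reproduction processing apparatus to do so. The sketch below expresses that split with a `typing.Protocol`. The interface methods `current_position` and `associate`, the nested-dictionary library layout and the `FakePlayer` test double are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class ReproductionProcessor(Protocol):
    """Hypothetical interface of the connected reproduction processing apparatus."""

    def current_position(self) -> float: ...
    def associate(self, sound_title: str, position: float) -> None: ...


@dataclass
class InformationProcessingApparatus:
    library: dict[str, dict[str, str]]  # classification -> {recognized word: sound title}
    designated: set[str]                # classifications chosen via the designation means

    def generate(self, recognized: str) -> Optional[str]:
        # Search only the designated classifications, in a deterministic order.
        for classification in sorted(self.designated):
            title = self.library.get(classification, {}).get(recognized)
            if title is not None:
                return title
        return None

    def on_recognized(self, recognized: str, player: ReproductionProcessor) -> bool:
        title = self.generate(recognized)
        if title is None:
            return False
        # Association instruction means: ask the reproduction side to do the association.
        player.associate(title, player.current_position())
        return True


@dataclass
class FakePlayer:
    """Stand-in reproduction processing apparatus that just records instructions."""
    position: float = 0.0
    log: list[tuple[str, float]] = field(default_factory=list)

    def current_position(self) -> float:
        return self.position

    def associate(self, sound_title: str, position: float) -> None:
        self.log.append((sound_title, position))
```

Keeping the association behind an interface like this lets the information processing apparatus work with any player that accepts such instructions.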

According to this configuration, sound information is generated based on an input made while the reproduction processing apparatus is reproducing the reproduction data, and is associated with the reproduction data, so the efficiency of work related to image production can be improved. In addition, because the sound information is acquired, in accordance with the input, from among a plurality of pieces of sound information stored in advance, the sound information to be associated can be diversified. Furthermore, by designating in advance the classification (type) of the sound information to be associated, sound information matching the designated classification is generated based on the input made during reproduction and associated with the reproduction data, so selecting the sound information to be associated becomes more efficient.

According to the present invention described above, work related to image production can be performed efficiently.

FIG. 1 is a diagram showing an example of the hardware configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the system configuration of the information processing system according to the embodiment.
FIG. 3 is a diagram showing an example of the editing screen displayed when the information processing system is started.
FIG. 4 is a diagram showing an example of the genre designation screen.
FIG. 5 is a diagram outlining the table of the sound effect database.
FIG. 6 is a flowchart of the editing process.
FIG. 7 is a flowchart of the search process.
FIG. 8 is a flowchart of the sound data addition process.
FIG. 9 is a diagram showing an example of editing work.

Next, an embodiment of the present invention will be described by way of an example.

[Overall system configuration]
FIG. 1 shows an example of the hardware configuration of an information processing system 10 according to an embodiment of the present invention, and FIG. 2 shows an example of its system configuration. The information processing system 10 of this embodiment is realized by installing the software (program) of this system on a commercially available personal computer (PC), which causes the PC to function as the information processing system 10. Its hardware configuration is therefore the same as that of an ordinary personal computer. Specifically, it includes:
- a CPU 100 that controls the processing of the entire system (apparatus);
- a ROM 101 that stores the basic programs, data and so on needed to start up and operate the PC;
- a RAM 102 used as a work memory that temporarily stores data while the CPU 100 executes various processes;
- a frame buffer memory 103 (VRAM) that stores image data to be displayed on a display unit 106 described later;
- an image compression/decompression unit 104 that compresses image data to generate compressed image data, and decompresses compressed image data for reproduction;
- an auxiliary storage unit 105, such as an HDD, that stores at least the program constituting the information processing system 10 of this embodiment, along with image data, sound data and so on;
- the display unit 106, such as a liquid crystal display, which displays various kinds of information;
- an operation input unit 107, such as a keyboard and a mouse, for operation input;
- a voice input unit 108, made up of one or more microphones or the like (sound collecting units), for voice input; and
- a sound output unit 109, made up of one or more speakers or the like, that outputs various sounds.

The system also has the other components and functions of an ordinary personal computer, but these are not illustrated.

As shown in FIG. 2, the information processing system 10 of this embodiment runs on the above hardware configuration and consists broadly of two parts. The first is a reproduction processing apparatus 20, which can reproduce reproduction data including image data (image information) of a moving image. The second is an information processing apparatus 30, which can generate sound data (sound information) to be associated with the reproduction data reproduced by the reproduction processing apparatus 20. The reproduction processing apparatus 20 in turn has three units:
- a reproduction unit 21, which executes processing related to reproducing reproduction data including image data of a moving image (hereinafter also referred to as "moving image data");
- an editing unit 22, which executes processing related to editing moving image data and sound data; and
- a reading unit 23, which executes processing related to reading the sound data generated by the information processing apparatus 30.

The editing unit 22 can execute various editing processes. Examples are adding the sound data read by the reading unit 23 to the moving image data and adjusting the reproduction position of the sound data relative to the moving image data. The reproduction unit 21 can execute processing related to reproducing reproduction data that includes the sound data and moving image data edited by the editing unit 22.

The reproduction data reproduced by the reproduction processing apparatus 20 may include one or more pieces of moving image data (image information), one or more pieces of sound data (sound information), or both moving image data (image information) and sound data (sound information). A moving image (moving image data) consists of a plurality of frame-by-frame images (image data) arranged in time series. Hereinafter, reproduction data including moving image data may be referred to simply as moving image data.
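
To make the data model concrete, here is a minimal sketch that represents reproduction data as time-ordered frames plus optional sound entries. It also includes a helper that maps a reproduction time to a frame index. The fixed frame rate, the field names and the clamping behavior are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ReproductionData:
    """Hypothetical container: image frames in time order plus optional sound entries."""
    frames: list[bytes] = field(default_factory=list)  # moving image data, one item per frame
    frame_rate: float = 30.0                            # assumed frames per second
    sounds: list[tuple[float, str]] = field(default_factory=list)  # (start time in s, sound id)

    @property
    def duration(self) -> float:
        """Length of the moving image in seconds."""
        return len(self.frames) / self.frame_rate

    def frame_at(self, t: float) -> int:
        """Index of the frame shown at reproduction time t, clamped to valid frames."""
        if not self.frames:
            raise ValueError("this reproduction data contains no image information")
        index = int(t * self.frame_rate)
        return max(0, min(index, len(self.frames) - 1))
```

Under this model, sound-only reproduction data is simply an instance with an empty `frames` list.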

The information processing apparatus 30 consists of three units:
- a voice recognition unit 31, which executes processing related to recognizing voice input through the voice input unit 108;
- a search unit 32, which executes processing related to searching the plurality of pieces of sound data stored in the auxiliary storage unit 105 for sound data corresponding to the voice recognized by the voice recognition unit 31; and
- a sound data acquisition unit 33, which executes processing related to acquiring the sound data found by the search unit 32 and supplying it to the reproduction processing apparatus 20.

When an instruction to start the information processing system 10 is input through the operation input unit 107, the CPU 100 loads the program of this system stored in the auxiliary storage unit 105 into the RAM 102 and executes it. The information processing system 10 is thereby started, and the processing of each unit shown in FIG. 2 becomes executable.

[Editing screen]
Next, the editing screen displayed on the display unit 106 when the information processing system 10 is started will be described. FIG. 3 shows an example of the editing screen of the information processing system 10 of this embodiment. As shown in FIG. 3, the editing screen 200 consists of a plurality of display areas (windows). Specifically, these are a moving image title display area 201, a reproduction display area 202, a timeline display area 203, a sound editing display area 204 and a voice recognition display area 205. The configuration of the editing screen 200 described in this embodiment is merely an example. The type, number, size, arrangement, display content and other properties of the display areas (windows) constituting the editing screen may take various forms.

The moving image title display area 201 displays the titles (file names) of the moving image data (moving image files) stored in a predetermined data folder (storage area) of the auxiliary storage unit 105, and can show a plurality of titles as a list. A person who performs image editing (image production) work using the information processing system 10 is hereinafter also referred to as the "operator". The operator selects the moving image data to be edited by operating a mouse or the like serving as the operation input unit 107 and clicking its title among those displayed in the moving image title display area 201. When the moving image data to be edited is selected, the image at its reproduction start position, that is, its first frame, is displayed as a still image in the reproduction display area 202.

The reproduction display area 202 reproduces and displays a moving image based on the moving image data to be edited that was selected in the moving image title display area 201. Media operation icons such as "Play", "Stop" and "Pause" are provided at the bottom of the reproduction display area 202. By clicking these icons with the mouse or the like serving as the operation input unit 107, the operator can input instructions such as playing or pausing the moving image displayed in the reproduction display area 202. When the operator clicks the "Play" icon with the moving image data to be edited selected in the moving image title display area 201 as described above, reproduction of the selected moving image data (reproduction data) starts. The corresponding moving image is then reproduced and displayed in the reproduction display area 202. To the left of the media operation icons in the reproduction display area 202 is a reproduction time display section that shows the elapsed reproduction time of the moving image being played in real time. To the right of the media operation icons is a "Preview" button. Pressing (clicking) it reproduces the reproduction data being edited from the beginning; this is reproduction data in which the moving image data to be edited and the sound data are combined.
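
The controls described above behave like a small state machine. The sketch below is one possible model of it. It assumes that Stop rewinds to the beginning and that Preview always restarts the edited (combined) data from the start. The class name and the use of clock values supplied by the caller are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    STOPPED = auto()
    PLAYING = auto()
    PAUSED = auto()


class PlaybackController:
    """Hypothetical model of the Play / Stop / Pause / Preview controls.

    Clock values (seconds) are passed in by the caller so the logic is deterministic."""

    def __init__(self) -> None:
        self.state = State.STOPPED
        self._offset = 0.0       # position accumulated before the current run
        self._started_at = 0.0   # clock value at which the current run began
        self.previewing = False  # True while playing the combined (edited) data

    def position(self, now: float) -> float:
        """Elapsed reproduction time, as shown by the reproduction time display."""
        if self.state is State.PLAYING:
            return self._offset + (now - self._started_at)
        return self._offset

    def play(self, now: float) -> None:
        if self.state is not State.PLAYING:
            self._started_at = now
            self.state = State.PLAYING

    def pause(self, now: float) -> None:
        if self.state is State.PLAYING:
            self._offset = self.position(now)
            self.state = State.PAUSED

    def stop(self) -> None:
        # Assumption: Stop rewinds to the first frame.
        self.state = State.STOPPED
        self._offset = 0.0
        self.previewing = False

    def preview(self, now: float) -> None:
        """Preview always restarts the edited data from the beginning."""
        self.stop()
        self.previewing = True
        self.play(now)
```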

The timeline display area 203 displays the time-axis reproduction position information (moving image timeline) of the moving image reproduced and displayed in the reproduction display area 202, that is, of the moving image data to be edited. The timeline display area 203 shows a single timeline cursor TC that extends vertically across into the sound editing display area 204. The timeline cursor TC indicates the current reproduction position and moves along the time axis (from left to right in FIG. 3) as reproduction time elapses. The reproduction data of the moving image shown in the reproduction display area 202 may include sound data such as BGM (background music). In that case, the timeline display area 203 also displays the waveform HK of that sound data, which means it can also show the time-axis reproduction position information of the sound data (sound timeline). By looking at the timeline display area 203, the operator can therefore easily grasp information useful for editing. This includes the reproduction position of the moving image shown in the reproduction display area 202, how that position corresponds to the sound reproduced with it, and the quality and loudness of the sound output at each reproduction position.
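
As an illustration, the position of the timeline cursor TC and a waveform display such as HK can be derived with two small functions. The first maps reproduction time linearly to a horizontal pixel. The second reduces audio samples to one peak value per pixel column. Peak-per-column reduction is a common way to draw waveforms, but it is not necessarily the method used in this embodiment. The function names and parameters are hypothetical.

```python
def cursor_x(t: float, duration: float, left: int, width: int) -> int:
    """Horizontal pixel of the timeline cursor for reproduction time t (left-to-right axis)."""
    if duration <= 0:
        return left
    fraction = min(max(t / duration, 0.0), 1.0)  # clamp to the visible timeline
    return left + round(fraction * width)


def waveform_peaks(samples: list[float], columns: int) -> list[float]:
    """Reduce audio samples to one peak amplitude per pixel column for drawing a waveform."""
    if columns <= 0 or not samples:
        return []
    peaks = []
    for c in range(columns):
        start = c * len(samples) // columns
        end = (c + 1) * len(samples) // columns
        chunk = samples[start:end] or [0.0]  # columns can outnumber samples
        peaks.append(max(abs(s) for s in chunk))
    return peaks
```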

The sound editing display area 204 displays information related to sound editing when sound data (sound information) is added to the moving image data to be edited (reproduction data, first reproduction data). Suppose the information processing system 10 recognizes a voice uttered by the operator or another person, through a microphone or the like serving as the voice input unit 108, while a moving image is being reproduced and displayed in the reproduction display area 202. A sound effect corresponding to the recognized voice can then be added at the reproduction position on the timeline (time axis) that matches (is substantially synchronized with) the voice recognition timing. The voice recognition timing here is substantially the same as the timing of the utterance that triggered the recognition. Accordingly, the sound editing display area 204 shows information on the sound data of each sound effect added through voice recognition, aligned with its reproduction position. This information includes the type (title) of the sound effect and the time indicating its reproduction timing.

In this embodiment, when a sound effect (sound data) is added, a mark MK indicating where it is added is displayed on the time axis (timeline) of the timeline display area 203. A sound icon IC indicating information on the added sound data is then displayed (placed) in the sound editing display area 204 in association with that mark MK. By looking at the timeline display area 203 and the sound editing display area 204, the operator can therefore easily grasp three things: that a sound effect has been added to the moving image (moving image data to be edited) shown in the reproduction display area 202, what the added sound effect is, and how its reproduction position corresponds to the moving image being displayed. In this embodiment, as shown in FIG. 3, each mark MK and its sound icon IC are associated by connecting them with a broken line. FIG. 3 shows an example in which three marks MK and three sound icons IC are displayed (marks MK1 to MK3 and sound icons IC1 to IC3).

In this embodiment, the operator can adjust the reproduction position of the sound effect (sound data) corresponding to a sound icon IC. To do so, the operator uses the mouse or the like serving as the operation input unit 107 to drag that sound icon IC, displayed (placed) in the sound editing display area 204, to the left or right. The mark MK (and the broken line) on the time axis of the timeline display area 203 moves left or right together with the icon, so the reproduction position of the sound effect can easily be fine-tuned. The "Preview" button in the reproduction display area 202 can also be pressed (clicked) during editing, while one or more sound icons IC are displayed in the sound editing display area 204, that is, while the sound data of one or more sound effects has been added to the moving image data. In that case, the system reproduces from the beginning the moving image data to which the sound data corresponding to the displayed sound icons has been added. In other words, it reproduces the reproduction data that combines that sound data with the moving image data to be edited (second reproduction data). As a result, the moving image is reproduced and displayed in the reproduction display area 202 from the reproduction start position of the first frame. Each sound effect is output from the sound output unit 109 (speaker) when the reproduction position (reproduction timing) of its added sound data is reached.
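
Below is a minimal sketch of the drag adjustment and of preview playback, built on three assumptions:
- Mark MK and sound icon IC read the same stored time value, so moving one moves the other.
- Drag distance converts to time at a fixed pixels-per-second scale.
- During preview, an effect fires when its time falls in the interval the playhead has just covered.

All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One added sound effect; mark MK and icon IC are both drawn from `time`."""
    title: str
    time: float  # reproduction position in seconds


def drag(placement: Placement, dx_pixels: int, pixels_per_second: float, duration: float) -> None:
    """Shift a placement by a horizontal drag, clamped to the length of the moving image."""
    new_time = placement.time + dx_pixels / pixels_per_second
    placement.time = min(max(new_time, 0.0), duration)


def due_effects(placements: list[Placement], previous: float, current: float) -> list[str]:
    """Titles whose reproduction timing falls in [previous, current) during preview."""
    hits = [p for p in placements if previous <= p.time < current]
    return [p.title for p in sorted(hits, key=lambda p: p.time)]
```

Using a half-open interval means an effect placed exactly at 0.0 fires on the first preview tick, and no effect fires twice across consecutive ticks.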

In this embodiment, when the sound data of a sound effect is added on the timeline (time axis) of the moving image data, that is, when a sound icon IC is displayed (placed) in the sound editing display area 204, the sound effect based on that sound data is output from the sound output unit 109 at the same time. This lets the operator sense, in real time, how the moving image comes across with the sound effect added. The operator can also click a sound icon IC displayed in the sound editing display area 204 and input a deletion instruction. This deletes the sound data corresponding to that sound icon IC (that is, the added sound effect) and releases its association with the reproduction data.
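
The add-and-audition and delete behavior can be modeled as follows. This sketch uses assumed names. `play_sound` stands in for output through the sound output unit 109, and each sound icon IC is represented by an integer id.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SoundTimeline:
    """Hypothetical edit model behind the sound editing display area."""
    play_sound: Callable[[str], None]  # stand-in for output via the sound output unit
    entries: dict[int, tuple[float, str]] = field(default_factory=dict)  # icon id -> (time, title)
    _next_id: int = field(default=1, init=False)

    def add(self, time: float, title: str) -> int:
        """Place a sound icon and immediately audition its sound effect."""
        icon_id = self._next_id
        self._next_id += 1
        self.entries[icon_id] = (time, title)
        self.play_sound(title)
        return icon_id

    def delete(self, icon_id: int) -> None:
        """Remove the icon, releasing its association with the reproduction data."""
        self.entries.pop(icon_id, None)
```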

The voice recognition display area 205 displays, as text, the voice recognized by the voice recognition unit 31 through the microphone or the like serving as the voice input unit 108. That is, it shows what the operator or another person has said (the recognized voice). In this embodiment, when reproduction of the moving image data to be edited (selected in the moving image title display area 201) starts, the system enters a state in which sound effects (sound data) can be added by voice recognition. This state is hereinafter also referred to as the "voice recognition mode". During the voice recognition mode, the content (text) of the recognized voice is displayed in the voice recognition display area 205 in chronological order from top to bottom. Displaying the recognized content (text) in this way lets the operator confirm whether the utterance has been recognized correctly. The voice recognition mode ends when reproduction of the moving image data stops.
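
The coupling between playback and the voice recognition mode can be sketched as follows. The class and method names are hypothetical. Ignoring recognized speech outside the mode is also an assumption, since the text only states when the mode starts and ends.

```python
class RecognitionPanel:
    """Hypothetical model of the voice recognition display area and the voice recognition mode."""

    def __init__(self) -> None:
        self.mode_on = False
        self.lines: list[str] = []  # oldest first, i.e. shown from top to bottom

    def on_playback_started(self) -> None:
        self.mode_on = True   # mode starts together with reproduction

    def on_playback_stopped(self) -> None:
        self.mode_on = False  # mode ends when reproduction stops

    def on_recognized(self, text: str) -> bool:
        """Show recognized text; return True if it may be used to add a sound effect."""
        if not self.mode_on:
            return False      # assumption: speech outside the mode is ignored
        self.lines.append(text)
        return True
```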

A "Genre designation" button is provided at the lower right of the sound editing display area 204. When it is pressed (clicked), the genre designation screen 210 shown in FIG. 4 is displayed superimposed in front of the editing screen 200. In this embodiment, as described later, the sound effects (sound data) that can be added to a moving image are managed by genre (classification). Correspondingly, the genre designation screen 210 can be displayed so that the operator can designate the genre (classification) of the sound effects (sound data) to be added. The genre designation screen 210 of this embodiment has a check box for each genre. The operator checks the check boxes of the desired genres with the mouse or the like serving as the operation input unit 107, which selects and designates one or more genres of sound effects to be added. The genres that can be designated (selected) on the genre designation screen 210 correspond to the genres registered in the sound effect database described later (see FIG. 5). When a genre of sound effects is designated, the search for the sound effect to be added covers only the sound effects of the designated genre (S108, described later). The genre designation screen 210 closes, ending its display on the display unit 106, when a predetermined instruction to end the display is input through the operation input unit 107.
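
One way to back the genre-restricted search is a relational table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `effects` table schema and the sample rows were made up for illustration and do not reproduce the table of FIG. 5. The sketch also assumes that when no genre is checked, all genres are searched.

```python
import sqlite3
from typing import Optional


def open_effect_db() -> sqlite3.Connection:
    """In-memory stand-in for the sound effect database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE effects (keyword TEXT, genre TEXT, file TEXT)")
    conn.executemany(
        "INSERT INTO effects VALUES (?, ?, ?)",
        [
            ("bang", "battle", "battle/bang.wav"),
            ("bang", "comedy", "comedy/bang.wav"),
            ("whoosh", "nature", "nature/wind.wav"),
        ],
    )
    return conn


def search_effect(conn: sqlite3.Connection, keyword: str, genres: set[str]) -> Optional[str]:
    """Return a file for the keyword, limited to the checked genres (all genres if none)."""
    sql = "SELECT file FROM effects WHERE keyword = ?"
    params: list[str] = [keyword]
    if genres:
        placeholders = ", ".join("?" for _ in genres)  # one bound parameter per genre
        sql += f" AND genre IN ({placeholders})"
        params.extend(sorted(genres))
    row = conn.execute(sql + " ORDER BY rowid LIMIT 1", params).fetchone()
    return row[0] if row else None
```

Binding each genre as a parameter, rather than formatting the names into the SQL string, keeps arbitrary genre names from breaking the query.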

[Editing process]
Next, the operation of the information processing system 10 of this embodiment is described with reference to FIG. 6. The reproduction processing device 20 and the information processing device 30 carry out this operation together. When the system 10 starts, the CPU 100 displays the editing screen 200 described above (see FIG. 3) on the display unit 106. It then executes the editing process shown in FIG. 6.

In S100, the system determines whether playback of the moving image data is stopped (including paused).

- If playback is not stopped (NO in S100), the moving image data is being played back, and the process proceeds to S104, described later.
- If playback is stopped (YES in S100), the system determines whether an instruction to start playback has been input (S101). This instruction is given by clicking the "Play" icon among the media operation icons in the reproduction display area 202 of the editing screen 200.

If there is no playback start instruction (NO in S101), the process proceeds to S110, described later. If there is one (YES in S101), the reproduction unit 21 starts playing the moving image data selected for editing in the moving image title display area 201 (S102). The voice recognition unit 31 then sets the voice recognition mode (S103). As a result, a moving image based on the moving image data being edited is played back in the reproduction display area 202.

Next, in S104, the system determines whether speech has been input through the voice input unit 108. If there is no speech input (NO in S104), the process proceeds to S110, described later. If there is (YES in S104), the voice recognition unit 31 acquires the input speech and performs voice recognition on it (S105).

The voice recognition unit 31 recognizes the speech of an operator or other user input through the voice input unit 108, using a general voice recognition algorithm. Any known voice recognition function can serve as this unit.

If the input speech could not be recognized properly (NO in S106), the process proceeds to S110, described later. If it was recognized (YES in S106), the recognized speech is converted into text data. The speech content based on that text data is displayed in the voice recognition display area 205 of the editing screen 200 (S107). The search unit 32 then executes a search process based on the text data (S108).

The auxiliary storage unit 105 stores two things. The first is sound data (sound information) that can be added to (associated with) the moving image data (reproduction data, first reproduction data). The second is a sound effect database that is consulted when sound data is actually added (associated).

The stored sound data are digitized sounds likely to be used as sound effects for moving images. In this embodiment they are created from sounds of various genres, such as natural phenomena, the movement of objects, sounds made by objects, and animal calls. The content of each sound effect is expressed in words as an onomatopoeia. These onomatopoeias are the speech to be recognized when sound data is added by voice recognition (hereinafter also called "recognition words"). The sound effect database defines the relationship between recognition words and sound data (sound effects).

FIG. 5 outlines the table structure of the sound effect database of this embodiment. For each predefined sound-effect genre (e.g., "natural phenomena"), the table maps each onomatopoeia one to one to the corresponding sound data. The onomatopoeia verbally expresses the sound effect (e.g., "gorogoro" (ごろごろ)). The sound data is identified by information such as its storage location and file name.

The information processing system 10 of this embodiment allows the stored sound data (sound effects) and the corresponding sound effect database to be updated after the fact, for example by adding or deleting entries (so-called version upgrades).

For convenience, FIG. 5 shows three sound-effect genres, each with three recognition words and sound data items. In practice, many more genres, recognition words, and sound data items are provided, so that a wide variety of sound effects can be added. FIG. 5 also shows each genre's sound data as not appearing in any other genre (no overlap). However, some sound data may belong to several genres.
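Assuming the FIG. 5 table is held in memory, its rows can be modeled as records grouped by recognition word. In this Python sketch, the romanized words "gorogoro" (ごろごろ) and "dokaan" (どかーん) and the IDs A1, B1 and C2 come from the text. The file paths, the genre shown for C2 and the helper names are hypothetical placeholders. Adding or removing rows stands in for the after-the-fact update (version upgrade).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SoundEffectEntry:
    genre: str             # sound-effect genre, e.g. "natural phenomena"
    recognition_word: str  # onomatopoeia the operator speaks
    sound_id: str          # sound data label as in FIG. 5, e.g. "A1"
    path: str              # storage location and file name (placeholder)


# Rows follow the examples in the text; paths and the genre of C2 are
# placeholders, because the text does not give them.
SOUND_EFFECT_TABLE: List[SoundEffectEntry] = [
    SoundEffectEntry("natural phenomena", "gorogoro", "A1", "sfx/A1.wav"),
    SoundEffectEntry("movement of objects", "gorogoro", "B1", "sfx/B1.wav"),
    SoundEffectEntry("genre C (placeholder)", "dokaan", "C2", "sfx/C2.wav"),
]


def index_by_word(table: List[SoundEffectEntry]) -> Dict[str, List[SoundEffectEntry]]:
    """Group rows by recognition word; one word may appear in several genres."""
    index: Dict[str, List[SoundEffectEntry]] = {}
    for row in table:
        index.setdefault(row.recognition_word, []).append(row)
    return index


def remove_sound(table: List[SoundEffectEntry], sound_id: str) -> List[SoundEffectEntry]:
    """After-the-fact update: return a copy of the table without one sound."""
    return [row for row in table if row.sound_id != sound_id]


index = index_by_word(SOUND_EFFECT_TABLE)
print([row.sound_id for row in index["gorogoro"]])  # prints ['A1', 'B1']
```

Grouping by word rather than keying a plain dict on it keeps the case where one recognition word appears in more than one genre.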

In this embodiment, sound data (sound effects) are managed with a sound effect database that has the table structure shown in FIG. 5. The search unit 32 executes the sound data search process (S108) by referring to this database, following the flowchart in FIG. 7:

1. A command statement is generated for searching for and acquiring the sound data that corresponds to the speech (recognition word) recognized in S106 (S201).
2. The sound effect database is accessed according to that statement (S202).
3. The database is searched for the sound data corresponding to the recognized recognition word (S203).
4. When the search identifies the sound data, that data is acquired from the sound data stored in the auxiliary storage unit 105 (S204).

In this way, the sound data of the sound effect to be added to the moving image data currently being played back (reproduction data, first reproduction data) is generated (extracted).
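The text does not say what query language the S201 statement uses. Assuming, purely for illustration, that the sound effect database is a SQL table, S201 to S203 could look like this Python sketch. It uses the standard-library `sqlite3` module with an in-memory database. The table and column names, file paths and genre label for C2 are hypothetical. S204 (loading the file at `path` from the auxiliary storage unit 105) is omitted. The romanized words stand for ごろごろ ("gorogoro") and どかーん ("dokaan").

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the sound effect database (placeholder rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sound_effects "
        "(genre TEXT, recognition_word TEXT, sound_id TEXT, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO sound_effects VALUES (?, ?, ?, ?)",
        [
            ("natural phenomena", "gorogoro", "A1", "sfx/A1.wav"),
            ("movement of objects", "gorogoro", "B1", "sfx/B1.wav"),
            ("genre C (placeholder)", "dokaan", "C2", "sfx/C2.wav"),
        ],
    )
    return conn


def build_query(word: str, genres: Optional[Sequence[str]] = None) -> Tuple[str, List[str]]:
    """S201: build a parameterized statement for one recognition word."""
    sql = "SELECT sound_id, path FROM sound_effects WHERE recognition_word = ?"
    params = [word]
    if genres:  # optional restriction from the genre designation screen
        sql += " AND genre IN (" + ", ".join("?" for _ in genres) + ")"
        params.extend(genres)
    return sql + " ORDER BY sound_id", params


def search_sound(conn: sqlite3.Connection, word: str,
                 genres: Optional[Sequence[str]] = None) -> List[Tuple[str, str]]:
    """S202-S203: run the statement; an empty list means "no match"."""
    sql, params = build_query(word, genres)
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
print(search_sound(conn, "dokaan"))  # prints [('C2', 'sfx/C2.wav')]
```

Binding the recognized text as a parameter, rather than pasting it into the SQL string, keeps arbitrary recognizer output from altering the statement.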

For example, suppose an operator using the system 10 utters "dokaan" (どかーん, a "kaboom" sound) and the voice recognition unit 31 recognizes it (YES in S106). The search unit 32 then refers to the sound effect database and searches for the sound data that corresponds to the recognition word "dokaan" (S203). As shown in FIG. 5, that sound data is "sound data C2 (small explosion sound)". "Sound data C2" is therefore acquired from the sound data stored in the auxiliary storage unit 105 (S204).

The recognition word uttered by the operator may appear in several genres. For example, for the recognition word "gorogoro" (ごろごろ) in FIG. 5, the search unit 32 extracts two results: sound data A1 (natural phenomenon: thunder) and sound data B1 (movement of objects: rolling sound). In this embodiment, when several sound data items correspond to one recognition word, one of them is selected (extracted) at random and acquired.

Random extraction is not the only option. A selection condition can instead be defined in advance in the program, and one item chosen according to that condition. For example:

- a priority can be set among the sound data items that share a recognition word, and the item chosen by that priority;
- the item can be chosen according to characteristics of the recognized speech;
- the type (title, genre, etc.) of the moving image data being edited (played back) can be identified, and an item suited to it chosen.

Designating the sound-effect genre in advance on the genre designation screen 210 described above also avoids overlapping recognition words (sound data). This makes searching for (selecting) sound data more efficient.
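The selection rules described above (genre filtering, a predefined priority, and random choice as the default) can be sketched in Python as follows. The function names and the priority convention are hypothetical: a lower number wins, and unlisted IDs rank last. The text does not specify how priorities are encoded. The IDs A1 and B1 are the two results for "gorogoro" (ごろごろ) given in the text.

```python
import random
from typing import Collection, Dict, List, Optional, Sequence, Tuple


def narrow_by_genre(rows: Sequence[Tuple[str, str]],
                    genres: Optional[Collection[str]]) -> List[Tuple[str, str]]:
    """Keep (sound_id, genre) rows whose genre was ticked; no ticks keeps all."""
    if not genres:
        return list(rows)
    return [row for row in rows if row[1] in genres]


def pick_sound(candidates: Sequence[str],
               priority: Optional[Dict[str, int]] = None,
               rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose one sound ID when a recognition word matches several.

    No candidates -> None (the "no match" case); one candidate -> it;
    a priority table -> the lowest number wins (hypothetical convention,
    unlisted IDs rank last); otherwise a uniform random choice.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    if priority:
        return min(candidates, key=lambda c: priority.get(c, float("inf")))
    return (rng or random).choice(list(candidates))


rows = [("A1", "natural phenomena"), ("B1", "movement of objects")]
ids = [sid for sid, _ in narrow_by_genre(rows, {"natural phenomena"})]
print(pick_sound(ids))  # prints A1: the genre filter leaves one candidate
```

Passing an explicit `random.Random` instance makes the random branch reproducible in tests, while the default uses the module-level generator.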

In this embodiment, the search in S203 may return zero results, so that no sound data can be identified (the recognition word is not supported). In that case, a message image saying so is displayed on the editing screen 200, and the process proceeds to S110 without performing S109. No sound-effect data is generated, so no sound data (sound effect) is added to the moving image data.

Returning to FIG. 6, when the search process (S108) is finished, the CPU 100 executes a sound data addition process (S109). This process adds the sound data acquired (generated) in S204 to the moving image data currently being played back (reproduction data, first reproduction data). It follows the flowchart in FIG. 8:

1. The instruction unit 33 outputs a signal (hereinafter also called an "instruction signal") to the reproduction processing device 20 (S301). The signal instructs the device to read the sound data acquired in S204 and to combine it with the moving image data being played back.
2. When the reproduction processing device 20 receives the instruction signal from the instruction unit 33 (information processing device 30), the reading unit 23 reads the sound data acquired in S204 (S302).
3. The editing unit 22 combines the read sound data with the moving image data currently being played back (reproduction data, first reproduction data) (S303).

In S303, the sound data read in S302 is associated with the current playback position information (playback time information) of the moving image data being played back. It is then pasted into that reproduction data (first reproduction data). The sound effect corresponding to the operator's utterance is thus added to (associated with) the moving image data substantially in synchronization with the utterance timing (voice recognition timing).

At the same time, markers appear on the editing screen 200 at the playback position where this sound data is added (see FIG. 3). A mark MK appears in the timeline display area 203, and a sound icon IC appears in the sound editing display area 204. Also in this embodiment, when the sound data and the moving image data are combined in S303, the reproduction unit 21 plays the sound data. A sound effect based on it is output from the sound output unit 109.
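The association made in S303 amounts to recording the current playback position alongside the sound data. The Python sketch below is a minimal illustration of that bookkeeping, not the patented implementation. `EditRecord` and its fields are hypothetical names, and positions are assumed to be seconds. C2 is the "small explosion" sound data from the example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EditRecord:
    """Hypothetical in-memory record of what has been pasted into one video.

    It holds the video being played and, for each pasted sound, the
    playback position (seconds) it is tied to.
    """

    video_id: str
    cues: List[Tuple[float, str]] = field(default_factory=list)

    def attach(self, sound_id: str, current_position_s: float) -> None:
        # S303: tie the sound to the playback position at the moment of
        # pasting, which is close to (not exactly at) the utterance.
        self.cues.append((current_position_s, sound_id))
        self.cues.sort()  # timeline order, as for the marks MK and icons IC


record = EditRecord("video-01")
record.attach("C2", 12.5)
record.attach("A1", 3.0)
print(record.cues)  # prints [(3.0, 'A1'), (12.5, 'C2')]
```

Because recognition and search take some time, the stored position lags the utterance slightly. This matches the text's "substantially in synchronization" rather than exact alignment.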

Returning to FIG. 6, when the sound data addition process (S109) is finished, the CPU 100 determines whether an instruction to end the editing process has been input (S110). The end instruction is given, for example, by using the mouse or similar device serving as the operation input unit 107 to click the close icon (× mark) at the upper right of the editing screen 200. If an end instruction has been input (YES in S110), the editing process ends, and with it the operation of the information processing system 10. If not (NO in S110), the process returns to S100 and S100 to S110 are repeated.

Sound data can keep being added until playback of the moving image data being edited finishes. One or more sound data items (sound effects) can therefore be added to a single item of moving image data.

While the editing process is running, information on each combination made in S303 is stored in a predetermined storage area such as the RAM 102. This information covers the type of moving image data being played back, the type of sound data combined with it, and the combining position (playback position). When editing of the target moving image data is finished, the operator enters an instruction to confirm the result. The edited moving image data (moving image data with sound data added, second reproduction data) is then stored in a predetermined storage area (a folder, etc.) of the auxiliary storage unit 105.
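The control flow of S100 to S110 can be condensed into a single loop. This Python sketch replaces the real UI, microphone and recognizer with a scripted list of events. The event names, the `lookup` callback and the return format are hypothetical, and display steps (S107, the no-match message) are left out.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Event = Tuple[str, object]


def run_edit_loop(events: Iterable[Event],
                  lookup: Callable[[str], Optional[str]]) -> List[Tuple[float, str]]:
    """Simplified S100-S110 loop driven by scripted events (hypothetical).

    Event kinds: ("play", None), ("stop", None), ("tick", seconds),
    ("speech", text, or None for unrecognized audio), ("quit", None).
    Returns the (position, sound_id) cues added in the session.
    """
    playing = False
    position = 0.0
    cues: List[Tuple[float, str]] = []
    for kind, payload in events:
        if kind == "quit":                  # S110: end instruction
            break
        if not playing:                     # S100: stopped or paused
            if kind == "play":              # S101 -> S102, S103
                playing = True
            continue
        if kind == "stop":                  # recognition mode ends too
            playing = False
        elif kind == "tick":                # playback advances
            position = float(payload)
        elif kind == "speech" and payload is not None:  # S104-S106
            sound_id = lookup(str(payload))             # S108 search
            if sound_id is not None:                    # no match skips S109
                cues.append((position, sound_id))       # S109
    return cues


table = {"dokaan": "C2"}
script = [("speech", "dokaan"), ("play", None), ("tick", 1.0),
          ("speech", "dokaan"), ("stop", None), ("speech", "dokaan")]
print(run_edit_loop(script, table.get))  # prints [(1.0, 'C2')]
```

Speech arriving while playback is stopped is ignored, which mirrors the S101 "NO" branch jumping straight to S110.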

The editing process above is the main operation of the information processing system 10 of this embodiment. The CPU 100 can also execute other processing related to editing with the system 10.

For example, suppose that during editing one or more sound data items have been added to the moving image data being edited. Sound icons IC are then displayed in the sound editing display area 204, as shown in FIG. 3 (three icons, IC1 to IC3). Suppose the "Preview" button in the reproduction display area 202 is now pressed. The editing unit 22 combines two sets of data to create edited reproduction data (second reproduction data):

- the image data of the moving image currently shown in the reproduction display area 202 (the moving image data being edited);
- the sound data of the sound icons IC shown in the sound editing display area 204 (the added sound effects).

The reproduction unit 21 plays this reproduction data back. The moving image is displayed in the reproduction display area 202, and the sound effects are output from the sound output unit 109 (speaker). The operator can thus play the moving image with its added sound effects from the beginning and check the edits.
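As a rough illustration of what combining the added sound data for a preview involves on the audio side, the Python sketch below mixes short clips into a silent mono track at given sample offsets. It is an assumption-laden simplification: the function name, mono float samples, additive mixing and clamping to [-1.0, 1.0] are all choices made here for illustration. The text does not describe the editing unit 22's mixing method.

```python
from typing import Dict, List, Sequence, Tuple


def render_preview(duration_samples: int,
                   cues: Sequence[Tuple[int, str]],
                   clips: Dict[str, Sequence[float]]) -> List[float]:
    """Mix each cue's clip into a silent mono track at its sample offset.

    Overlapping clips are summed and then clamped to [-1.0, 1.0]; any part
    of a clip that falls outside the track is dropped.
    """
    track = [0.0] * duration_samples
    for start, sound_id in cues:
        for i, sample in enumerate(clips[sound_id]):
            j = start + i
            if 0 <= j < duration_samples:
                track[j] += sample
    return [max(-1.0, min(1.0, s)) for s in track]


clips = {"A1": [0.5, 0.5], "C2": [0.75]}
print(render_preview(5, [(1, "A1"), (2, "C2"), (4, "A1")], clips))
# prints [0.0, 0.5, 1.0, 0.0, 0.5]
```

Hard clamping is the simplest way to keep overlapping effects in range. A production mixer would more likely apply gain staging or a limiter.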

As another example, during editing a sound icon IC in the sound editing display area 204 can be dragged with the mouse to the left or right (along the time axis of the timeline). The editing unit 22 then changes the playback position information (playback time information) of the corresponding sound data to match its movement on the timeline. It also updates the association with the moving image data being edited. The playback position of the added sound data (sound effect) can therefore be adjusted after the fact.
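The drag adjustment can be reduced to converting a horizontal pixel offset into a time offset. This Python sketch assumes a fixed timeline zoom (`pixels_per_second`) and clamps the result to the length of the video. Both the zoom model and the function names are hypothetical, not taken from the text. A1 and C2 are sound data IDs used in the text's FIG. 5 examples.

```python
from typing import List, Tuple


def drag_to_position(old_position_s: float, dx_pixels: float,
                     pixels_per_second: float, duration_s: float) -> float:
    """Map a horizontal drag of a sound icon to a new playback position."""
    new_position = old_position_s + dx_pixels / pixels_per_second
    return min(max(new_position, 0.0), duration_s)  # stay on the timeline


def move_cue(cues: List[Tuple[float, str]], index: int, dx_pixels: float,
             pixels_per_second: float, duration_s: float) -> List[Tuple[float, str]]:
    """Return a new cue list with one cue moved and the list re-sorted."""
    position, sound_id = cues[index]
    updated = list(cues)
    updated[index] = (drag_to_position(position, dx_pixels,
                                       pixels_per_second, duration_s), sound_id)
    return sorted(updated)


print(move_cue([(1.0, "A1"), (3.0, "C2")], 0, 300, 100, 10.0))
# prints [(3.0, 'C2'), (4.0, 'A1')]
```

Returning a new list rather than mutating in place makes it easy to keep the pre-drag state, for example for an undo feature. That is a design choice of this sketch only.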

The voice recognition mode is cancelled in two situations, so that the voice recognition unit 31 (voice recognition function) does not operate. The first is when reproduction data is created and played back in response to the "Preview" button. The second is when the playback position (playback timing) of sound data is changed by moving a sound icon IC.

[Example of editing work]
Next, an example of image-production work (editing work) using the information processing system 10 of this embodiment is described. As shown in FIG. 9, the system 10 runs on a notebook personal computer (notebook PC), and the editing screen 200 (see FIG. 3) is displayed on a large liquid crystal monitor connected to it. Five people (operators A to E) take part in the editing work.

A single omnidirectional (non-directional) microphone serves as the voice input microphone (voice input unit 108). A unidirectional microphone may be used instead. In that case, one microphone can be provided per operator, or a single microphone can be shared by all operators. The microphone may be wired or wireless. Sounds are output from a speaker (sound output unit 109) connected to the notebook PC by wire or wirelessly.

To start the work, the system 10 is launched and the editing screen 200 is displayed on the monitor (display unit). The moving image data (reproduction data) to be edited is selected in the moving image title display area 201 of the editing screen 200. A moving image based on that data is then played back in the reproduction display area 202. As playback proceeds, each operator watches the moving image. Whenever they want a sound effect, they utter its onomatopoeia (recognition word).

The spoken recognition word is input to the information processing device 30 through the microphone (voice input unit 108) and recognized by the voice recognition unit 31. The sound data of the corresponding sound effect is then generated and added to the moving image data being edited (reproduction data, first reproduction data). At the same time, a sound icon IC showing information on the added sound effect (sound data) appears in the sound editing display area 204 of the editing screen 200. The sound effect is also output from the speaker (sound output unit 109).

With the system 10, each of the five operators A to E can freely add sound effects (sound data) by speaking in this way. Sound data is added through voice recognition of each person's speech. To check the moving image with the added sound effects (the edited reproduction data), the operator presses (clicks) the "Preview" button in the reproduction display area 202 (see FIG. 3). This plays back the moving image with sound effects (the second reproduction data, combining the sound data and moving image data) so it can be checked immediately.

Carrying out the work this way, with several people (five here) adding sound effects (sound data) by voice recognition, makes editing efficient.

For example, a sound designer may take sample data of a moving image with sound effects to another staff member, a work leader, a client, or the like for review. As long as an operating environment for the system 10 is available, the sound effects can be corrected, adjusted, or changed on the spot. Moreover, the recognition words are onomatopoeias, and the corresponding sound effects are generated and added from them. The sound effects can therefore match, as closely as possible, the image each person present has in mind.

Even if the review shows that the sound effects need correcting, data reflecting the intentions of the others involved can be created immediately on the spot. The sound staff no longer need to take the sample data back and revise it later. This greatly improves the efficiency of editing work.

In the information processing system 10 of this embodiment, sound-effect data is added to (associated with) reproduction data containing moving image data (first reproduction data) as follows. While that reproduction data is being played back (while the moving image is displayed), the speech (recognition word) uttered by an operator is recognized. The sound data (sound effect) corresponding to the recognized speech is then added (associated).

Sound effects are thus added to a moving image through voice recognition during its playback. An operator can add a sound effect at any desired moment simply by speaking while watching the moving image. This makes image-production work more efficient.

In particular, in this embodiment the speech to be recognized (recognition words) consists of onomatopoeias. The sound-effect data corresponding to each recognition word is managed in the sound effect database. Sound data (sound effects) is added to the moving image data (reproduction data, first reproduction data) based on the onomatopoeia the operator utters, so editing can proceed intuitively, following the image the operator has in mind.

In addition, as in the editing example above, several people can take part in adding sound data (sound effects) to the same moving image data being edited (reproduction data, first reproduction data). This also improves the efficiency of image-production work.

The present invention may also adopt configurations that differ from the embodiment described above (hereinafter also called "modifications"). These modifications are described below.

［変形例１］
[Modification 1]
In the information processing system 10 of the embodiment described above, no particular restriction is placed on which voices the voice recognition unit 31 can recognize via the voice input unit 108, but such a restriction may be provided. For example, a voice authentication function may be added to the embodiment so that the voice recognition unit 31 recognizes only the voices of persons registered in advance as system users. A known voice authentication function can be used for this purpose.

One example configuration for Modification 1 gives the information processing apparatus 30 two components:
- a registration unit that registers information on the voice (voice information) of each person who performs editing work with the system 10;
- an identification unit that determines whether a voice input via the voice input unit 108 belongs to a person registered in the registration unit.

The voice recognition unit 31 then recognizes only the voices of persons corresponding to the registered information.

In such a configuration, each person who will use the system 10 first registers his or her own voice via the voice input unit 108. During registration, the registration unit analyzes the input voice, extracts characteristic data of the voice (such as frequency), and records it as the voice model (voice information) of that specific individual. When editing work is actually performed with the system 10, the operator inputs a predetermined utterance through the voice input unit 108 at the start of use. The identification unit compares and collates this input voice with the voice models (voice information) registered (recorded) in the registration unit. If the two match, the operator who input the voice can thereafter add sound data by the voice recognition described above.

According to Modification 1, only persons registered in advance can add sound data by voice recognition. Sound data is therefore never added as a result of recognizing the voice of someone who is not performing the editing work. In addition, because the number of operators using the system at one time can be limited, it is easier to maintain an environment in which the voice recognition function works properly.
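The registration and verification flow of Modification 1 can be sketched as follows. This is a minimal illustration under stated assumptions. It is not the "known voice authentication function" the text refers to. It assumes each utterance has already been reduced to a fixed-length feature vector, and it leaves feature extraction out of scope. It uses cosine similarity against a hypothetical threshold of 0.9 as the match criterion. The class and method names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two feature vectors in [-1, 1]; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SpeakerGate:
    """Hypothetical registration unit plus identification unit (assumed design)."""
    threshold: float = 0.9                     # assumed match threshold
    models: dict[str, list[float]] = field(default_factory=dict)
    verified: set[str] = field(default_factory=set)

    def register(self, user: str, features: list[float]) -> None:
        # Registration: store the extracted features as this user's voice model.
        self.models[user] = list(features)

    def verify(self, user: str, features: list[float]) -> bool:
        # Start of use: collate a predetermined utterance with the stored model.
        model = self.models.get(user)
        ok = model is not None and cosine_similarity(model, features) >= self.threshold
        if ok:
            self.verified.add(user)
        return ok

    def accepts(self, features: list[float]) -> bool:
        # Pass an utterance on to voice recognition only if it matches a verified user.
        return any(
            cosine_similarity(self.models[u], features) >= self.threshold
            for u in self.verified
        )
```

Until `verify` succeeds, `accepts` rejects every utterance, even the registered user's own voice. This mirrors the requirement that an operator authenticate at the start of use before voice recognition becomes available.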

[Modification 2]
In the information processing system 10 of the embodiment described above, the voice recognition unit recognizes a voice input via the voice input unit 108, and sound data corresponding to the recognized voice (recognition word) is added to the moving image data. Alternatively, sound data may be added to the moving image data based on input other than voice. For example, sound data may be selected (searched for) and added based on operation input by the operator through any of the following:
- the operation input unit 107 of a personal computer (PC), such as a keyboard or mouse;
- a dedicated controller for entering the recognition words registered in the sound effect database;
- an input screen or the like.

In this case, the information processing apparatus 30 may be provided with an input recognition unit capable of recognizing input from the operation input unit 107, the controller, the input screen, or the like. Sound data (a sound effect) corresponding to the input recognized by the input recognition unit is then acquired and added to the moving image data. Alternatively, the information processing apparatus 30 may be provided with a sound generation unit capable of generating sound data, such as electronic sounds, based on input from the operation input unit 107, a controller, or the like. The sound data generated by the sound generation unit is then added to the moving image data.

These configurations work as in the embodiment described above. The operator watches the moving image played back in the reproduction display area 202 of the editing screen 200 and the timeline cursor TC shown in the timeline display area 203. By operating the operation input unit 107, the controller, or the like at any desired timing, the operator can add any sound effect (sound data).

According to Modification 2, sound data is never added as a result of recognizing the voices of people not performing the editing work or other ambient noise. The noise that is a concern with voice recognition therefore no longer needs to be taken into account.
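The operation-input variant of Modification 2 can be sketched minimally as follows. It assumes a hypothetical mapping from controller keys to recognition words, and from recognition words to sound IDs. Only C1 for "kaan" echoes FIG. 5; the other table entries and all function names are illustrative. Each key press is resolved to a sound ID and recorded at the current playback position, which is where a recognized utterance would have been placed.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tables: recognition word -> sound ID, and key -> recognition word.
SOUND_EFFECT_DB = {"kaan": "C1", "wanwan": "DOG_BARK"}
KEY_BINDINGS = {"F1": "kaan", "F2": "wanwan"}


@dataclass(frozen=True)
class Cue:
    position_ms: int   # playback position of the moving image when the key was pressed
    sound_id: str      # sound data to associate at that position


def on_key(key: str, position_ms: int, cues: list[Cue]) -> bool:
    """Input recognition unit: turn a key press into a cue; False if nothing is added."""
    word = KEY_BINDINGS.get(key)
    sound_id = SOUND_EFFECT_DB.get(word) if word is not None else None
    if sound_id is None:
        return False  # unbound key or no matching sound data
    cues.append(Cue(position_ms, sound_id))
    return True
```

No audio is captured in this path, so stray speech or room noise cannot produce a cue.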

[Modification 3]
In the information processing system 10 of the embodiment described above, sound data (sound effects) is managed by a sound effect database having the table structure shown in FIG. 5. Sound data corresponding to the voice (recognition word) recognized by the voice recognition unit is acquired and added to the moving image data. If the recognized voice does not exist as a recognition word in the sound effect database (that is, the recognition word is not supported), no sound data is generated and nothing is added to the moving image data.

Alternatively, when no recognition word corresponds to the recognized voice, one or more pieces of sound data stored in the auxiliary storage unit 105 may be combined to generate a single piece of sound data approximating that voice. This sound data is then added to (associated with) the moving image data. For example, as shown in FIG. 5, sound data C1 exists for the sound effect corresponding to the recognition word "kaan" (かーん). Suppose the recognized voice is "kaan kaan" (かーんかーん) and no corresponding recognition word or sound data exists. Two copies of the sound data C1 for "kaan" can then be combined to generate a single piece of sound data ("kaan kaan").

Furthermore, the auxiliary storage unit 105 may store single-syllable data in addition to the sound effect data corresponding to the sound effect database of the embodiment. Single-syllable data is sound data for individual Japanese syllables, such as:
- plain syllables (chokuon);
- contracted syllables (yōon);
- unvoiced syllables (seion);
- voiced syllables (dakuon);
- semi-voiced syllables (handakuon);
- nasalized voiced syllables (bidakuon).

Single-syllable data can then be combined with sound effect data to generate a single piece of sound data (for example, "pa" + "kaan" = "pakaan"). Single-syllable pieces can also be combined with one another (for example, "kyu" + "i" + "n" = "kyuin").

According to Modification 3, cases in which no sound data is generated and no sound effect is added to the moving image, depending on the recognized voice, can be minimized. It also becomes possible to diversify the sound effects that can be added by voice recognition.
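One way to realize the fallback in Modification 3 is to split the recognized word into stored units and concatenate their samples. The sketch below does this with a small dynamic program. Several details are assumptions for illustration: the unit table, the sample values, and the representation of sound data as plain lists of floats. The text does not say how the system chooses among possible decompositions, so "fewest units" is a hypothetical policy.

```python
from __future__ import annotations


def segment(word: str, units: dict[str, list[float]]) -> list[str] | None:
    """Split `word` into the fewest stored units, or return None if impossible."""
    # best[i] is the shortest unit sequence spelling word[:i] (None = unreachable).
    best: list[list[str] | None] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in units:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(word)]


def synthesize(word: str, units: dict[str, list[float]]) -> list[float] | None:
    """Concatenate the samples of each unit; an exact match is just one unit."""
    pieces = segment(word, units)
    if pieces is None:
        return None  # still no sound data: nothing is added, as in the embodiment
    samples: list[float] = []
    for piece in pieces:
        samples.extend(units[piece])
    return samples
```

With units for かーん, ぱ, きゅ, い and ん, the word かーんかーん resolves to two copies of かーん, ぱかーん to ぱ + かーん, and きゅいん to きゅ + い + ん. Real audio would also need crossfades at the joins, which this sketch omits.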

[Modification 4]
In the information processing system 10 of the embodiment described above, sound data can be added by voice recognition to the moving image data being reproduced. In that case, the sound data is associated with the current reproduction position information (reproduction time information) of that moving image data. The sound data is thus added in step with (substantially synchronized with) the speaker's utterance timing (voice recognition timing).

Alternatively, the position at which sound data is added to the moving image data (the sound reproduction timing) may be set slightly earlier (moved forward) than the actual utterance timing (voice recognition timing). For example, when the operator is viewing the moving image in the reproduction display area 202 for the first time, the operator's utterances are likely to lag behind the action. In view of this, a sound setting screen (not shown) may be provided, similar to the genre specification screen 210 of the embodiment. Through this screen, the position at which sound effects are added (the sound reproduction timing) can be set in advance to be slightly earlier (insertion position setting function). According to Modification 4, the system can be made easier to use.
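The insertion position setting function of Modification 4 reduces to a single offset applied when a cue is recorded. A minimal sketch follows. It assumes positions are integer milliseconds and that `lead_ms` is a hypothetical value taken from the sound setting screen. The clamp keeps a cue from landing before the start of the video.

```python
def insertion_position_ms(recognized_at_ms: int, lead_ms: int = 0) -> int:
    """Place the sound `lead_ms` earlier than the recognition time, but not before 0."""
    if lead_ms < 0:
        raise ValueError("lead_ms must be non-negative")
    return max(0, recognized_at_ms - lead_ms)
```

With the default `lead_ms=0`, the cue lands exactly at the recognition timing, which corresponds to the unmodified embodiment.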

[Modification 5]
In the information processing system 10 of the embodiment described above, the sound data addition process (S109) adds the sound data acquired by the search process (S108) to the moving image data. In that process, the sound data is combined with the moving image data being reproduced (S303). The reproduction unit 21 immediately reproduces the combined data, so the sound effect based on the sound data is output from the sound output unit 109 as it is added to the moving image.

Alternatively, the sound data addition process (S109) may only store the reproduction position information (reproduction time information) of the sound data to be associated with the moving image data being reproduced. It then neither combines the sound data with the moving image data nor reproduces the sound data. In other words, the sound data addition process (S109) does not add the sound data itself. Instead, it stores information about the sound data (such as its type and title) together with the current reproduction position information (reproduction time information) of the moving image data being reproduced.

In this case, the sound data search (S203) of the search process (S108) is still performed. However, the acquisition step based on the search result (S204) acquires only information about the sound data (such as its type and title). This information is read in the sound data addition process (S109) and stored in association with the current reproduction position information (reproduction time information) of the moving image data (S302). Later, an input may instruct reproduction of the sound data combined with the moving image data being edited, such as pressing the "Preview" button in the reproduction display area 202. In response, the editing unit 22 reads the sound data corresponding to the information stored in S302 from the auxiliary storage unit 105, combines it with the moving image data being edited, and reproduces the combined reproduction data.

According to Modification 5, the processing load on the CPU while the moving image data being edited is reproduced is reduced, so the editing work can proceed more smoothly. In Modification 5 as well, the editing screen 200 shown in FIG. 3 can be displayed during editing work, as in the embodiment described above. Internally, the sound data addition process (S109) then stores only the reproduction position information (reproduction time information) of the sound data to be associated with the moving image data (reproduction data) and does not combine the two. Even so, the operator can see at a glance how sound data has been added to the moving image data (reproduction data).
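The split in Modification 5 between lightweight bookkeeping during playback and mixing at preview time might look like the sketch below. It rests on several assumptions:
- audio is modeled as lists of float samples, and cue positions are sample indices;
- the sound library is an in-memory dict standing in for the auxiliary storage unit 105;
- sounds that run past the end of the track are truncated, since the text does not specify this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    position: int   # sample index in the video's audio track
    sound_id: str   # information about the sound data (type/title), not the audio


def record_cue(cues: list[Cue], position: int, sound_id: str) -> None:
    """During playback (S302 in this variant): store only the cue, mix nothing."""
    cues.append(Cue(position, sound_id))


def render_preview(track: list[float], cues: list[Cue],
                   library: dict[str, list[float]]) -> list[float]:
    """On "Preview": load each cued sound and mix it into a copy of the track."""
    out = list(track)  # leave the original track untouched
    for cue in cues:
        for offset, sample in enumerate(library[cue.sound_id]):
            index = cue.position + offset
            if index >= len(out):
                break  # assumed policy: truncate sounds that run past the end
            out[index] += sample
    return out
```

During playback, `record_cue` only appends a small record. All sample-level work is deferred to `render_preview`, which is the load reduction the modification describes.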

[Modification 6]
In the information processing system 10 of the embodiment described above, the voices (recognition words) recognized by the voice recognition unit are onomatopoeic words. Sound data (sound effects) is added to the moving image data based on those words, and Japanese is assumed as the language of voice recognition (see FIG. 5). Alternatively, languages other than Japanese may also be supported for voice recognition. Many onomatopoeic words with the same meaning are expressed differently (have different recognition words) depending on the language. For example, the onomatopoeia for a dog barking is "wanwan" in Japanese and "bowwow" in English.

Accordingly, the recognition words in the sound effect database are set in advance not only in Japanese but also in other languages such as English, so that searches can be performed according to the language. In addition, a language designation screen (not shown) similar to the genre specification screen 210 of the embodiment is provided. Through this screen, the language to be used for voice recognition can be designated in advance (language designation function). According to Modification 6, languages of other countries can be supported in addition to Japanese, making the system more convenient.
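A minimal sketch of a language-aware lookup for Modification 6 follows. It assumes the database keys each recognition word by a language code chosen on the language designation screen. Several details are illustrative assumptions: the language codes, the `DOG_BARK` ID, and case-folding romanized input. Only the pairing of "かーん" with C1 comes from FIG. 5.

```python
from __future__ import annotations

# Hypothetical table: (language, recognition word) -> sound ID.
# Different words in different languages can point at the same sound data.
RECOGNITION_WORDS: dict[tuple[str, str], str] = {
    ("ja", "わんわん"): "DOG_BARK",
    ("en", "bowwow"): "DOG_BARK",
    ("ja", "かーん"): "C1",
}


def find_sound(word: str, language: str) -> str | None:
    """Search only the recognition words registered for the designated language."""
    return RECOGNITION_WORDS.get((language, word.casefold()))
```

Because the designated language is part of the key, "bowwow" uttered while Japanese is selected finds nothing. This keeps each language's vocabulary separate.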

[Modification 7]
In the information processing system 10 of the embodiment described above, the sound effect database and the sound data are stored in the auxiliary storage unit 105 of the personal computer (PC) constituting the system. The search process (S108) is performed by accessing that sound effect database.

Alternatively, the sound effect database and the sound data may be stored on a server that the PC constituting the information processing system 10 can reach via a network such as the Internet, and the search process may then be performed on the server side. In this case, the PC acquires the sound data extracted by the server-side search process via the network and combines it with the moving image data.

According to Modification 7, editing work with the system can be performed in many different locations, as long as a network connection is available. In addition, the provider of the system (seller, manufacturer, administrator, or the like) can update and maintain the sound effect database and the sound data without leaving those tasks to system users. The system can therefore be made more convenient to use.
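The division of labor in Modification 7 can be sketched with the transport abstracted away. Here the server-side search and the PC-side client exchange JSON strings through a `send` callable. In a real deployment, `send` would be a request over the network such as an HTTP POST. The message format, the field names, and the `SERVER_DB` contents below are hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical server-side sound effect database (recognition word -> metadata).
SERVER_DB = {"kaan": {"sound_id": "C1", "path": "sounds/c1.wav"}}


def handle_search(request_body: str) -> str:
    """Server side: run the search process and reply with JSON."""
    word = json.loads(request_body)["word"]
    hit = SERVER_DB.get(word)
    return json.dumps({"found": hit is not None, "sound": hit})


def remote_search(word: str, send: Callable[[str], str]) -> dict | None:
    """PC side: send the recognized word and decode the server's reply."""
    reply = json.loads(send(json.dumps({"word": word})))
    return reply["sound"] if reply["found"] else None
```

Passing `handle_search` directly as `send` exercises both halves in one process. Only the PC side needs to change when a real network client replaces it.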

Examples and modifications have been described above as embodiments of the present invention, but the present invention is not limited to them. Without departing from the scope stated in each claim, the invention is not limited to the literal wording of the claims. It also covers what a person skilled in the art could readily substitute for that wording, and improvements based on the ordinary knowledge of a person skilled in the art may be added as appropriate.

For example, in the embodiments described above, sound data (sound effects) is associated with moving image data by voice recognition. Image data of various effect images, for example, may instead be associated with the moving image data. In this way, either or both of sound effects and effect images can be associated with moving image data (reproduction data) by voice recognition.

In the embodiments described above, the software (program) of the information processing system 10, consisting of the reproduction processing device 20 and the information processing device 30, is installed on a personal computer (PC) so that the PC functions as the information processing system 10. Alternatively, the software (program) of the information processing apparatus may be installed on a PC on which the software (program) of a commercially available reproduction processing apparatus is already installed, and a system similar to that of the embodiments can still be built. In this case, the information processing apparatus is configured so that it can connect to and communicate with the reproduction processing apparatus already present on the PC. The two apparatuses are connected when each of them is started.

When both apparatuses are started, the display unit of the PC shows two screens independently of each other (in separate windows):
- a screen corresponding to the reproduction processing apparatus (hereinafter also referred to as the "first screen");
- a screen corresponding to the information processing apparatus (hereinafter also referred to as the "second screen").

For example, the first screen can consist of areas 201 to 204 of the editing screen 200 of the embodiments (see FIG. 3), that is, everything except the voice recognition display area 205. The second screen can consist of the voice recognition display area 205 of the same editing screen 200.

The software of the reproduction processing apparatus and that of the information processing apparatus can thus be installed separately. Even in this configuration, sound data generated by the information processing apparatus can be added to (associated with) the moving image data (reproduction data) reproduced by the reproduction processing apparatus, as in the embodiments described above. This configuration also allows the software for the information processing apparatus to be provided on its own, in addition to the software for the entire information processing system (reproduction processing apparatus plus information processing apparatus). Users who already own a PC with reproduction processing software installed can therefore easily adopt the image production environment of the system described in the embodiments, which makes the system more convenient for them.

The second screen described above may be provided with two buttons:
- a "Start" button (not shown) for instructing the start of voice recognition (setting the voice recognition mode);
- an "End" button (not shown) for instructing the end of voice recognition (canceling the voice recognition mode).

This is possible because the reproduction processing apparatus and the information processing apparatus are started separately, as separate pieces of software. Instructions to start and end reproduction of the moving image data (reproduction data) in the reproduction processing apparatus can therefore be handled separately from instructions to start and end voice recognition in the information processing apparatus. In this case, for example, the operator clicks the "Play" icon (see FIG. 3) on the first screen to start reproduction of the moving image data. The operator then clicks the "Start" button on the second screen to start voice recognition, and editing proceeds in the same way as in the embodiments described above.
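The separately started second screen could be modeled as a small component that talks to the reproduction processing apparatus through an interface. In the sketch below, `PlaybackHost` is a hypothetical interface, and no real product API is implied. The host offers the current playback position and an "associate" instruction. Recognition results are acted on only between presses of the "Start" and "End" buttons, independently of the host's own Play control. The word-to-sound lookup table is likewise an assumption.

```python
from __future__ import annotations

from typing import Protocol


class PlaybackHost(Protocol):
    """Hypothetical interface exposed by the reproduction processing apparatus."""

    def current_position_ms(self) -> int: ...

    def associate(self, position_ms: int, sound_id: str) -> None: ...


class RecognitionPanel:
    """Second-screen logic: forwards recognized words only while listening."""

    def __init__(self, host: PlaybackHost, lookup: dict[str, str]) -> None:
        self.host = host
        self.lookup = lookup      # recognition word -> sound ID (assumed)
        self.listening = False

    def start(self) -> None:      # "Start" button: enter voice recognition mode
        self.listening = True

    def end(self) -> None:        # "End" button: leave voice recognition mode
        self.listening = False

    def on_recognized(self, word: str) -> bool:
        """Instruct the host to associate the matching sound at its current position."""
        if not self.listening:
            return False
        sound_id = self.lookup.get(word)
        if sound_id is None:
            return False
        self.host.associate(self.host.current_position_ms(), sound_id)
        return True
```

The panel never touches the video itself. It only asks the host where playback is and tells it what to associate, which matches the role of the relation instruction means in the reference inventions.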

In addition, the information processing system and information processing apparatus according to the present invention can be used to produce images in all kinds of fields, for example:
- game effect images used in gaming machines such as pachinko machines and slot machines;
- game images used in home, portable, online, and arcade game machines;
- website images used in advertisements on websites;
- animation images used in television, film, and the like.

The present invention is particularly useful in fields such as gaming machines and games, where the number of product items is large and product life cycles are short.
[Others]
Inventions related to the embodiments (examples) disclosed in this specification are disclosed below as reference inventions.
(1) The gist of the information processing system of Reference Invention 1 is that it comprises:
First reproduction means for reproducing first reproduction data including image information;
Recognition means for recognizing an input during reproduction of the first reproduction data;
Generation means for generating sound information based on the input recognized by the recognition means; and
Association means for associating the sound information generated by the generation means with the first reproduction data.
According to this, sound information is generated based on input made during reproduction of the first reproduction data, and the generated sound information is associated with the first reproduction data. Work related to image production can therefore be made more efficient.
(2) The information processing system of Reference Invention 2 is the information processing system of Reference Invention 1 described above, further comprising second reproduction means for reproducing second reproduction data in which the sound information and the first reproduction data are associated with each other.
According to this, reproducing the second reproduction data makes it easy to check the first reproduction data together with the sound information (that is, reproduction data including both the generated sound information and the image information).
(3) The information processing system of Reference Invention 3 is the information processing system of Reference Invention 1 or Reference Invention 2 described above, in which the association means associates the sound information generated from an input with the first reproduction data at the timing of that input, namely the input that triggered the generation means to generate the sound information.
According to this, it is possible to synchronize the input timing during the reproduction of the first reproduction data and the reproduction timing of the sound information, and it becomes easy to set (determine) the reproduction timing (reproduction position) of the sound information.
(4) The information processing system of Reference Invention 4 is the information processing system of any one of Reference Inventions 1 to 3 described above, further comprising adjustment means capable of adjusting the reproduction timing of the sound information associated with the first reproduction data.
According to this, the reproduction timing (reproduction position) of the associated sound information can be adjusted, so the reproduction timing of the sound information can be optimized.
(5) The information processing system of Reference Invention 5 is the information processing system of any one of Reference Inventions 1 to 4 described above, further comprising storage means for storing a plurality of pieces of sound information. The generation means acquires, from the sound information stored in the storage means, the sound information corresponding to the input recognized by the recognition means, and generates from it the sound information to be associated with the first reproduction data.
According to this, since the sound information corresponding to the input is acquired and generated from the plurality of pieces of sound information stored in advance, it is possible to diversify the associated sound information.
(6) The information processing system of Reference Invention 6 is the information processing system of Reference Invention 5 described above, in which the sound information stored in the storage means may include a plurality of pieces corresponding to the input recognized by the recognition means. In that case, the generation means identifies and acquires one of those pieces based on a predetermined condition.
According to this, even when a plurality of pieces of sound information correspond to the input, one of them is associated with the first reproduction data. The sound information to be associated can therefore be selected more efficiently.
(7) The information processing system of Reference Invention 7 is the information processing system of Reference Invention 5 or Reference Invention 6 described above,
Sound information stored in the storage means is managed by classification;
Designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means is provided;
The gist is that the generation means acquires the sound information corresponding to the input recognized by the recognition means from the sound information, among that stored in the storage means, belonging to the classification designated by the designation means.
According to this, once the classification (type) of the sound information to be associated is designated in advance, sound information of the designated classification is generated based on the input made during reproduction of the first reproduction data and is associated with the first reproduction data, which makes selecting the sound information to be associated more efficient.
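Designating a classification in advance narrows the search before matching. A minimal sketch, assuming each stored sound has a single genre label (the reference signs list a genre screen, 210) plus keyword tags; `StoredSound` and `search_in_genre` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredSound:
    sound_id: str
    genre: str                # classification under which the sound is managed
    keywords: frozenset[str]  # hypothetical tags used to match recognized input


def search_in_genre(store: list[StoredSound], genre: str, recognized: str) -> list[str]:
    """Search only the designated genre for sounds matching the recognized input."""
    words = set(recognized.lower().split())
    return [s.sound_id for s in store if s.genre == genre and words & s.keywords]


store = [
    StoredSound("explosion_big", "action", frozenset({"boom", "explosion"})),
    StoredSound("boom_cartoon", "comedy", frozenset({"boom", "pop"})),
    StoredSound("rain_light", "ambient", frozenset({"rain"})),
]
```

The same spoken word "boom" resolves to different effects depending on the designated genre, which is the selection gain the passage describes.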
(8) An information processing system according to Reference Invention 8 is the information processing system according to any one of Reference Invention 5 to Reference Invention 7,
The gist is that the generation means can synthesize the sound information stored in the storage means to generate one piece of sound information corresponding to the input recognized by the recognition means.
According to this, when an input is made during reproduction of the first reproduction data, one piece of sound information corresponding to the input is generated by synthesizing the stored sound information, even if no sound information matching the input is stored in the storage means, and the result is associated with the first reproduction data. This minimizes cases in which, depending on the content of the input, no sound information is generated and no association with the first reproduction data is made.
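The passage does not define the synthesis method. One simple interpretation, assumed here only for illustration, is additive mixing: if no stored clip matches the whole input, clips matching its individual words are summed sample by sample with hard clipping. The sample buffers, `mix`, and `sound_for_input` are hypothetical; a real editor would work with audio files and a proper mixer.

```python
from typing import Optional


def mix(clips: list[list[float]]) -> list[float]:
    """Sum clips sample by sample, treating a shorter clip as silent past its end, and clip to [-1, 1]."""
    length = max((len(c) for c in clips), default=0)
    mixed = []
    for i in range(length):
        total = sum(c[i] for c in clips if i < len(c))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


def sound_for_input(library: dict[str, list[float]], recognized: str) -> Optional[list[float]]:
    """Return a stored clip that matches the whole input, or synthesize one from per-word clips."""
    key = recognized.lower()
    if key in library:
        return library[key]
    parts = [library[w] for w in key.split() if w in library]
    return mix(parts) if parts else None


# Hypothetical library of mono sample buffers keyed by word.
library = {"door": [0.5, 0.5], "slam": [0.75, -0.25, 0.125]}
```

Returning `None` only when no word matches at all reflects the stated aim of making "no sound generated" as rare as possible.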
(9) An information processing system according to Reference Invention 9 is the information processing system according to any one of Reference Invention 1 to Reference Invention 8,
The gist is that the recognition means is voice recognition means for recognizing a voice uttered during reproduction of the first reproduction data.
According to this, sound information is generated based on the voice uttered during reproduction of the first reproduction data and is associated with the first reproduction data, so the sound information can easily be associated with the reproduction data by voice input (voice recognition).
(10) The information processing apparatus of Reference Invention 10 is
An information processing apparatus connectable to a reproduction processing apparatus capable of reproducing reproduction data including image information,
Recognizing means for recognizing an input made during reproduction of reproduction data by the reproduction processing apparatus;
Generating means for generating sound information based on the input recognized by the recognition means;
Association instruction means for instructing the reproduction processing apparatus to associate the sound information generated by the generation means with the reproduction data;
The gist is that the apparatus comprises the above means.
According to this, sound information is generated based on an input made while the reproduction processing apparatus is reproducing the reproduction data, and is associated with that reproduction data, so work related to image production can be performed more efficiently.
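In Reference Invention 10 the information processing apparatus only recognizes, generates, and instructs, while the reproduction processing apparatus holds the data. A minimal sketch of that split, assuming a hypothetical two-method interface (`current_position`, `associate`) on the reproduction side; `FakePlayer` is an in-memory stand-in, not part of the source. For simplicity it uses the playback position at the moment the recognition result is handled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


class ReproductionProcessor(Protocol):
    """Hypothetical interface exposed by the reproduction processing apparatus."""

    def current_position(self) -> float: ...

    def associate(self, sound_id: str, position_sec: float) -> None: ...


@dataclass
class FakePlayer:
    """In-memory stand-in for the reproduction processing apparatus, for testing."""
    position_sec: float = 0.0
    timeline: list[tuple[float, str]] = field(default_factory=list)

    def current_position(self) -> float:
        return self.position_sec

    def associate(self, sound_id: str, position_sec: float) -> None:
        self.timeline.append((position_sec, sound_id))


def on_recognized(player: ReproductionProcessor, recognized: str,
                  generate: Callable[[str], Optional[str]]) -> bool:
    """Generate sound information for a recognized input and instruct the player to associate it."""
    sound_id = generate(recognized)
    if sound_id is None:
        return False  # nothing suitable was generated, so no association is instructed
    player.associate(sound_id, player.current_position())
    return True
```

Because `on_recognized` depends only on the protocol, the same code could drive a real editor over an IPC or network link as long as it exposes those two operations.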
(11) The information processing apparatus of Reference Invention 11 is the information processing apparatus of Reference Invention 10 described above,
The gist is that the apparatus further comprises reproduction instruction means for instructing the reproduction processing apparatus to reproduce the reproduction data with which the sound information has been associated.
According to this, reproduction data including sound information (that is, reproduction data including the generated sound information and image information) can be reproduced by the reproduction processing apparatus, which makes it easy to check the reproduction data including the sound information.
(12) The information processing apparatus of Reference Invention 12 is the information processing apparatus of Reference Invention 10 or Reference Invention 11 described above,
The gist is that the association instruction means instructs that the sound information generated based on the input be associated with the reproduction data in accordance with the timing of the input that triggered the generation of that sound information by the generation means.
According to this, the timing of an input made during reproduction of the reproduction data can be synchronized with the reproduction timing of the sound information, which makes it easy to set (determine) the reproduction timing (reproduction position) of the sound information.
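Speech recognition returns its result some time after the utterance, so taking the playback position when the result arrives would place the sound late. A minimal sketch of timing-based association, assuming playback has run without pauses since a known wall-clock instant and that the recognizer reports when the utterance began; `PlaybackClock` and `association_position` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackClock:
    """Hypothetical mapping from wall-clock time to playback position during uninterrupted playback."""
    started_at_wall_sec: float      # wall-clock time when playback (re)started
    started_at_position_sec: float  # playback position at that moment
    rate: float = 1.0               # playback speed multiplier

    def position_at(self, wall_sec: float) -> float:
        return self.started_at_position_sec + (wall_sec - self.started_at_wall_sec) * self.rate


def association_position(clock: PlaybackClock, utterance_start_wall_sec: float) -> float:
    """Place the sound where the triggering input began, not where recognition finished."""
    return max(0.0, clock.position_at(utterance_start_wall_sec))
```

If playback is paused or seeked, a new `PlaybackClock` would be created from the new starting point.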
(13) An information processing apparatus according to Reference Invention 13 is the information processing apparatus according to any one of Reference Invention 10 to Reference Invention 12,
Comprising storage means for storing a plurality of pieces of sound information;
The gist is that the generation means acquires, from the sound information stored in the storage means, sound information corresponding to the input recognized by the recognition means, and thereby generates the sound information to be associated with the reproduction data.
According to this, sound information corresponding to the input is acquired from a plurality of pieces of sound information stored in advance, so the sound information that can be associated is diversified.
(14) The information processing apparatus of Reference Invention 14 is the information processing apparatus of Reference Invention 13 described above,
The gist is that, when the sound information stored in the storage means includes a plurality of pieces of sound information corresponding to the input recognized by the recognition means, the generation means specifies and acquires one of those pieces based on a predetermined condition.
According to this, even when a plurality of pieces of sound information correspond to the input, one of them is associated with the reproduction data, which makes selecting the sound information to be associated more efficient.
(15) The information processing apparatus of Reference Invention 15 is the information processing apparatus of Reference Invention 13 or Reference Invention 14 described above,
Sound information stored in the storage means is managed by classification;
Designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means is provided;
The gist is that the generation means acquires the sound information corresponding to the input recognized by the recognition means from the sound information, among that stored in the storage means, belonging to the classification designated by the designation means.
According to this, once the classification (type) of the sound information to be associated is designated in advance, sound information of the designated classification is generated based on the input made during reproduction of the reproduction data and is associated with the reproduction data, which makes selecting the sound information to be associated more efficient.
(16) An information processing apparatus according to Reference Invention 16 is the information processing apparatus according to any one of Reference Invention 13 to Reference Invention 15,
The gist is that the generation means can synthesize the sound information stored in the storage means to generate one piece of sound information corresponding to the input recognized by the recognition means.
According to this, when an input is made during reproduction of the reproduction data, one piece of sound information corresponding to the input is generated by synthesizing the stored sound information, even if no sound information matching the input is stored in the storage means, and the result is associated with the reproduction data. This minimizes cases in which, depending on the content of the input, no sound information is generated and no association with the reproduction data is made.
(17) An information processing apparatus according to Reference Invention 17 is the information processing apparatus according to any one of Reference Invention 10 to Reference Invention 16,
The gist is that the recognition means is voice recognition means for recognizing a voice uttered during reproduction of the reproduction data by the reproduction processing apparatus.
According to this, sound information is generated based on the voice uttered during reproduction of the reproduction data and is associated with the reproduction data, so the sound information can easily be associated with the reproduction data by voice input (speech recognition).

DESCRIPTION OF SYMBOLS
10 Information processing system
20 Reproduction processing apparatus
21 Reproduction unit
22 Editing unit
23 Reading unit
30 Information processing apparatus
31 Voice recognition unit
32 Search unit
33 Instruction unit
100 CPU
101 ROM
102 RAM
103 Frame buffer memory
104 Image compression/decompression unit
105 Auxiliary storage unit
106 Display unit
107 Operation input unit
108 Audio input unit
109 Sound output unit
200 Editing screen
201 Moving image title display area
202 Playback display area
203 Timeline display area
204 Sound editing display area
205 Voice recognition display area
210 Genre designation screen
TC Timeline cursor
HK Waveform
MK Mark
IC Sound icon

Claims

An information processing system comprising:
reproduction means for reproducing reproduction data including image information;
recognition means for recognizing an input made during reproduction of the reproduction data;
generation means for generating sound information based on the input recognized by the recognition means;
association means for associating the sound information generated by the generation means with the reproduction data; and
storage means for storing a plurality of pieces of sound information,
wherein the generation means acquires, from the sound information stored in the storage means, sound information corresponding to the input recognized by the recognition means, and generates the sound information to be associated with the reproduction data,
the sound information stored in the storage means is managed by classification,
the system further comprises designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means, and
the generation means acquires the sound information corresponding to the input recognized by the recognition means from the sound information, among that stored in the storage means, belonging to the classification designated by the designation means.

An information processing apparatus connectable to a reproduction processing apparatus capable of reproducing reproduction data including image information, the information processing apparatus comprising:
recognition means for recognizing an input made during reproduction of the reproduction data by the reproduction processing apparatus;
generation means for generating sound information based on the input recognized by the recognition means;
association instruction means for instructing the reproduction processing apparatus to associate the sound information generated by the generation means with the reproduction data; and
storage means for storing a plurality of pieces of sound information,
wherein the generation means acquires, from the sound information stored in the storage means, sound information corresponding to the input recognized by the recognition means, and generates the sound information to be associated with the reproduction data,
the sound information stored in the storage means is managed by classification,
the apparatus further comprises designation means capable of designating, from among the classifications, the classification of sound information to be generated by the generation means, and
the generation means acquires the sound information corresponding to the input recognized by the recognition means from the sound information, among that stored in the storage means, belonging to the classification designated by the designation means.