JP2001222712A

JP2001222712A - Image processing apparatus, convolution integration circuit and method thereof

Info

Publication number: JP2001222712A
Application number: JP2000030188A
Authority: JP
Inventors: Seisuke Morioka; 誠介森岡
Original assignee: Sega Corp
Current assignee: Sega Corp
Priority date: 2000-02-08
Filing date: 2000-02-08
Publication date: 2001-08-17

Abstract

(57) [Abstract]
[Problem] To perform convolution integration at high speed in computer image processing.
[Solution] A pipeline arithmetic unit in which adders (43 to 51) are cascade-connected is used for the convolution integration. This allows the convolution operations to be executed in parallel and enables high-speed convolution. Furthermore, each addition unit (43 to 51) consists of a multiplication circuit (60), which multiplies the common display data by the respective element data of the filter, and adders (61 to 64), so that display data referenced once can be used by every element of the filter and the number of data references is reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an image processing apparatus that filters image data, a convolution integration circuit, and a method therefor, and more particularly to an image processing apparatus that performs convolution integration on image data, a convolution integration circuit, and a method therefor.

[0002] In recent years, computer graphics has advanced remarkably. Game devices and simulation devices in particular are required to generate complex, richly colored images in real time.

[0003]

[Prior Art] In a game device or simulation device, the positions of the polygons that make up characters and other objects are determined in response to operation inputs, and rendering (drawing) is performed on those polygons that are to be displayed, generating the image data for the display screen. This image data is written to a frame buffer, and the image data in the frame buffer is then displayed on the display device.

[0004] Images output by computer image processing are normally sharp: every object is in focus, with no blur. Depending on the type, motion, and position of each object, however, processing the image by blurring it or enhancing its edges can be more effective for display and closer to reality.

[0005] For example, to convey depth it is not enough for perspective transformation to make near objects large and distant objects small; it is preferable that, according to the depth of field, objects near the focal point be sharp and objects away from it be blurred. In addition, for the progress of a game, it can be effective to always apply image processing such as blurring or edge enhancement to a specific object or polygon.

[0006] Convolution integration is used for such image processing. It is the operation used to produce effects such as the blurring described above, and it computes equation (1) below.

[0007]

[Equation 1]

[0008] Here, s(x, y) is the generated image, f(x, y) is the original image, and g(x, y) is the filter function. FIG. 16 shows an example of one-dimensional image processing. In this example, a one-dimensional original image f(x) along the x direction is convolved with a box filter g(x) to produce a blurred image s(x). Image processing of this kind can produce a variety of image effects, blurring among them.
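Equation (1) itself is not reproduced in this text. As a hedged reconstruction, a standard two-dimensional convolution integral written with the symbols s, f, and g defined in [0008] would take the form below. The integration variables u and v and the sign convention are assumptions, not taken from the original figure.

```latex
s(x, y) = \iint f(x - u,\, y - v)\, g(u, v)\, du\, dv
```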

[0009] Convolution, however, involves a very large amount of computation and a large number of memory accesses to reference the data, so it has been time-consuming even on a very fast processor.

[0010] For example, convolving a single pixel with a 15 × 15 pixel filter requires computing equation (2) below.

[0011]

[Equation 2]
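Equation (2) is not reproduced in this text. Assuming it is the discrete convolution for one output pixel over a 15 × 15 neighborhood, it would be a double sum of 225 products, which agrees with the operation counts stated in [0012]. The index ranges and the symbol f_{i,j} for the neighboring pixel values are assumptions.

```latex
s = \sum_{i=0}^{14} \sum_{j=0}^{14} f_{i,j}\, g_{i,j}
```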

[0012] When this is executed on a general-purpose processor such as a CPU or DSP, obtaining the image data for a single pixel requires 225 multiplications and 224 additions. To simplify the processing, the multiplications can be omitted by making the filter function g(i, j) a digital filter whose elements are "0" or "1". Even so, when this operation is applied to an image of VGA size (640 × 480), a common display size, the additions alone number 68,812,800, which is an extremely large amount of computation. With this sequential computation, the number of memory accesses needed to reference the data is similarly large and cannot be ignored either.
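The counts quoted in [0012] can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `convolution_op_counts` is hypothetical, and it assumes every pixel of the frame receives the full filter window, ignoring any special handling at the image borders.

```python
def convolution_op_counts(filter_size, width, height):
    """Return per-pixel and per-frame operation counts for direct convolution."""
    products = filter_size * filter_size   # one multiplication per filter element
    additions = products - 1               # summing n products takes n - 1 additions
    pixels = width * height                # assumption: every pixel gets a full window
    return {
        "mults_per_pixel": products,
        "adds_per_pixel": additions,
        "adds_per_frame": additions * pixels,
    }


counts = convolution_op_counts(filter_size=15, width=640, height=480)
print(counts)
```

For a 15 × 15 filter on a 640 × 480 frame this gives 225 multiplications and 224 additions per pixel, and 68,812,800 additions per frame.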

[0013] To eliminate the computational load and memory accesses of sequential processing on a general-purpose processor, it has been proposed to provide hardware dedicated to convolution integration (for example, the specification of Japanese Patent Application No. Hei 9-296812). That proposal describes a circuit that performs the convolution in parallel: as many multipliers as the filter size (number of pixels) are arranged in parallel, together with an adder that sums the outputs of the multipliers.

[0014]

[Problems to Be Solved by the Invention] The conventional proposal, however, has the following problems.

[0015] (1) It requires parallel wiring paths to feed the image data to the multipliers arranged in parallel, and further parallel wiring paths to connect the many parallel multipliers to the adder. When the circuit is built as an LSI chip, the wiring space therefore grows and the chip becomes large, and it has been difficult to form the circuit as a single LSI together with other image processing circuits. For color image data in particular, parallel wiring paths must also be provided for the number of bits of each color, which makes the circuit difficult to realize.

[0016] (2) Image data is normally stored in memory and read out sequentially by address, so the parallel scheme above must repeat a read of one filter's worth of image data once for every pixel of the target area. For example, convolving a 6 × 6 pixel region with a 3 × 3 filter to obtain 4 × 4 image data takes 9 cycles of reads and 1 cycle of computation per output pixel, and this must be repeated 16 times. The computation time is reduced, but the number of memory accesses is large, so the overall processing time cannot be shortened.
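The cost of the example in [0016] can be tallied as follows. This is a minimal sketch with a hypothetical function name, `parallel_scheme_cost`, and it assumes exactly one memory read per cycle and one compute cycle per output pixel, as the paragraph states.

```python
def parallel_scheme_cost(region, filt):
    """Cycle and read counts when every output pixel re-reads its whole window."""
    outputs = (region - filt + 1) ** 2          # valid output pixels, e.g. 4 x 4
    reads_per_output = filt * filt              # sequential reads to fill the multipliers
    cycles = outputs * (reads_per_output + 1)   # reads plus one compute cycle per output
    return {
        "outputs": outputs,
        "memory_reads": outputs * reads_per_output,
        "total_cycles": cycles,
        "distinct_pixels": region * region,
    }


cost = parallel_scheme_cost(region=6, filt=3)
print(cost)
```

Under these assumptions the region holds only 36 distinct pixels, yet the parallel scheme issues 144 reads over 160 cycles, which is the redundancy the invention aims to remove.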

[0017] Accordingly, an object of the present invention is to provide an image processing apparatus, a convolution integration circuit, and a method therefor that keep the wiring paths short even when the convolution operations are parallelized.

[0018] Another object of the present invention is to provide an image processing apparatus, a convolution integration circuit, and a method therefor that shorten the processing time even when the convolution operations are parallelized.

[0019] Still another object of the present invention is to provide an image processing apparatus, a convolution integration circuit, and a method therefor that reduce the number of memory references even when the convolution processing is parallelized.

[0020] Still another object of the present invention is to provide an image processing apparatus, a convolution integration circuit, and a method therefor that prevent the circuit scale from growing even when the convolution processing is parallelized.

[0021]

[Means for Solving the Problems] An image processing apparatus according to one aspect of the present invention has a generation unit that generates display data from each item of image data, and a convolution integration circuit that convolves the generated display data with a filter whose characteristics are specified by a plurality of elements to produce processed display data for the screen. The convolution integration circuit has a pipeline arithmetic unit in which a plurality of addition units, provided one for each element of the filter and supplied in common with the display data, are cascade-connected. Each addition unit includes a multiplication circuit that multiplies the display data by the element data of the filter, and an adder that adds its input to the multiplication result.

[0022] A convolution integration circuit according to one aspect of the present invention has a pipeline arithmetic unit in which a plurality of addition units, provided one for each element of a filter and supplied in common with the display data, are cascade-connected. Each addition unit includes a multiplication circuit that multiplies the display data by the element data of the filter, and an adder that adds its input to the multiplication result.

[0023] An image processing method according to one aspect of the present invention has a generation step of generating display data from each item of image data, and a convolution integration step of convolving the generated display data with a filter whose characteristics are specified by a plurality of elements to produce processed display data for the screen. The convolution integration step includes a step of supplying the display data to a pipeline arithmetic unit in which a plurality of addition units, provided one for each element of the filter, are cascade-connected, and an addition step in which each addition unit multiplies the display data by the element data of the filter and then adds its input to the multiplication result.

[0024] In this aspect of the invention, a pipeline arithmetic unit in which adders are cascade-connected is used for the convolution integration. A pipeline arithmetic unit is well suited to parallel processing and can execute the convolution operations in parallel, which enables high-speed convolution. Merely using a pipeline arithmetic unit with cascaded adders, however, speeds up the computation without reducing the number of memory references. In the present invention, the addition units and the pipeline arithmetic unit are configured so that the display data read out is used efficiently, which greatly reduces the number of memory references and improves the total computation speed. Specifically, each addition unit of the pipeline arithmetic unit consists of a multiplication circuit, which multiplies the common display data by the respective element data of the filter, and an adder, so that display data referenced once can be used by every element of the filter and the number of data references is reduced.
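The idea in [0024], that one display value read from memory is shared by every addition unit while partial sums move down the cascade, can be illustrated in one dimension. The sketch below is a simplification and not the circuit of the patent figures: it assumes a 1-D filter, one output register per unit, and the hypothetical function names `cascade_convolve` and `direct_convolve`. Each unit multiplies the common input by its own filter element and adds the value arriving from the previous unit.

```python
def cascade_convolve(samples, taps):
    """Streaming 1-D convolution with a cascade of multiply-add units."""
    n_units = len(taps)
    coeffs = list(reversed(taps))   # the last unit holds the tap for the newest sample
    regs = [0] * n_units            # output register of each addition unit
    out = []
    for x in samples:               # each sample is read exactly once
        upstream = [0] + regs[:-1]  # value each unit receives from the previous unit
        regs = [upstream[k] + coeffs[k] * x for k in range(n_units)]
        out.append(regs[-1])        # the final unit delivers one result per sample
    return out


def direct_convolve(samples, taps):
    """Reference: recompute every output from scratch (one read per tap)."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for j, g in enumerate(taps):
            if n - j >= 0:
                acc += g * samples[n - j]
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 0, 0]
box = [1, 1, 1]                     # box filter, as in the blur example of FIG. 16
print(cascade_convolve(signal, box))
print(direct_convolve(signal, box))
```

Both functions produce the same sequence, but the cascade touches each input sample once, whereas the direct form references up to `len(taps)` samples for every output.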

[0025] In an image processing apparatus according to another aspect of the present invention, the pipeline arithmetic unit further has a selector that controls the input of data in the pipeline to the next-stage addition unit.

[0026] In a convolution integration circuit according to another aspect of the present invention, the pipeline arithmetic unit further has a selector that controls the input of data in the pipeline to the next-stage addition unit.

[0027] In an image processing method according to another aspect of the present invention, the convolution integration step further has a step of controlling, by means of a selector, the input of data in the pipeline to the next-stage addition unit.

[0028] In this aspect, a selector for controlling the flow of data is provided in the pipeline, so that even with the addition units cascade-connected, addition results can be selected and added to one another. The required convolution result is therefore obtained even though the display data is supplied in common.

[0029] In an image processing apparatus according to a further aspect of the present invention, the pipeline arithmetic unit further has a selector that controls feedback input of data in the pipeline to the addition units.

[0030] In a convolution integration circuit according to a further aspect of the present invention, the pipeline arithmetic unit further has a selector that controls feedback input of data in the pipeline to the addition units.

[0031] In an image processing method according to a further aspect of the present invention, the convolution integration step further has a step of controlling, by means of a selector, feedback input of data in the pipeline to the addition units.

[0032] In this aspect, a feedback route to the addition units is provided and the addition units are used as data holding circuits. Circuits covering the whole reference area are then unnecessary, so the circuit scale can be greatly reduced.

[0033] In an image processing apparatus according to still another aspect of the present invention, the pipeline arithmetic unit further has a holding circuit that temporarily holds data in the pipeline.

[0034] In a convolution integration circuit according to still another aspect of the present invention, the pipeline arithmetic unit further has a holding circuit that temporarily holds data in the pipeline.

[0035] In still another image processing method of the present invention, the convolution integration step further has a step of temporarily holding data in the pipeline by means of a holding circuit. Because a circuit that holds data in the pipeline is provided, the data can be kept in the pipeline even when the reference area is larger than the filter size. The number of memory references can thereby be reduced.

[0036] In an image processing apparatus according to a further aspect of the present invention, the holding circuits are provided in a number determined by the size of the filter and the size of the processing area of the display data.

[0037] In a convolution integration circuit according to a further aspect of the present invention, the holding circuits are provided in a number determined by the size of the filter and the size of the processing area of the display data.

[0038] In this aspect, the number of holding circuits is determined by the filter size and the reference area, so the convolution can be executed with the minimum number of holding circuits. The circuit scale is thereby minimized and the pipelined convolution operation is made fast.
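One way to picture how holding stages let a cascade cover a reference area wider than the filter is to stream a 2-D image in raster order through a single chain. The sketch below is an illustration under stated assumptions, not the arrangement of selectors, feedback paths, and mask circuits described in this document: it inserts pass-through stages between filter rows, and the names `stream_convolve_2d` and `direct_convolve_2d` are hypothetical. In this particular layout the number of pass-through stages is (k - 1) × (width - k), which depends only on the filter size and the width of the processed area.

```python
def stream_convolve_2d(image, kernel):
    """Valid 2-D convolution from a single raster-order pass over the image.

    Returns (output rows, number of pixel reads, number of holding stages).
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    # Lay the k x k filter onto the raster stream: element (i, j) refers to
    # the pixel i rows and j columns back, i.e. i * width + j samples earlier.
    chain_len = (k - 1) * width + k
    taps = [0] * chain_len
    for i in range(k):
        for j in range(k):
            taps[i * width + j] = kernel[i][j]
    coeffs = taps[::-1]      # the last stage holds the tap for the newest pixel
    regs = [0] * chain_len   # one register per stage of the cascade
    reads = 0
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = image[y][x]          # the only read of this pixel
            reads += 1
            # Every stage sees the same pixel; stages with a zero coefficient
            # only pass the partial sum along, acting as holding registers.
            regs = [(regs[s - 1] if s else 0) + coeffs[s] * pixel
                    for s in range(chain_len)]
            if y >= k - 1 and x >= k - 1:    # window lies fully inside the image
                row.append(regs[-1])
        if row:
            rows.append(row)
    return rows, reads, chain_len - k * k


def direct_convolve_2d(image, kernel):
    """Reference result: recompute each output from its full k x k window."""
    k = len(kernel)
    return [[sum(kernel[i][j] * image[y - i][x - j]
                 for i in range(k) for j in range(k))
             for x in range(k - 1, len(image[0]))]
            for y in range(k - 1, len(image))]


image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows, reads, holding = stream_convolve_2d(image, kernel)
print(rows == direct_convolve_2d(image, kernel), reads, holding)
```

For a 6 × 6 region and a 3 × 3 filter, this sketch reads each of the 36 pixels once and places 6 pass-through stages between the nine multiply-add stages.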

[0039] In an image processing apparatus according to still another aspect of the present invention, the convolution integration circuit further has a mask circuit for disabling, in accordance with the readout of the display data, the operation of the multiplication circuits of some of the addition units of the pipeline arithmetic unit.

[0040] A convolution integration circuit according to still another aspect of the present invention further has a mask circuit for disabling, in accordance with the readout of the display data, the operation of the multiplication circuits of some of the addition units of the pipeline arithmetic unit.

[0041] In an image processing method according to still another aspect of the present invention, the convolution integration step further has a step of disabling, by means of a mask circuit and in accordance with the readout of the display data, the operation of the multiplication circuits of some of the addition units of the pipeline arithmetic unit.

[0042] In this aspect, because a mask circuit is provided, operations with only the required filter elements can be performed selectively even though the display data is supplied in common to all addition units. The display data therefore need not be input in parallel, the number of parallel paths can be reduced, and the convolution can be executed with a single memory reference.

[0043]

[Embodiments of the Invention] The present invention is described below in the following order: the image processing apparatus, the image processing operation, the convolution integration circuit, and other embodiments.

[0044] Image Processing Apparatus

FIG. 1 is a block diagram of an image processing apparatus according to one embodiment of the present invention.

[0045] FIG. 1 shows a game device or simulation device. The rendering processing unit (processor; hereinafter "renderer") 34 is an image processing apparatus provided in the game device or simulation device. In the game device, the renderer 34, a geometry processing unit (geometry arithmetic unit) 32, a main CPU 30, a program ROM 36 that stores the game program, and a work RAM 38 used during execution of the game program and at other times are connected via an internal bus 40.

[0046] The main CPU 30 executes the game program in the ROM 36 in response to operation inputs from an operator (not shown) and performs the necessary image processing. In this example, it generates the polygon data that make up the objects needed for image processing, the viewpoint information for the display screen, and so on. The CPU 30 writes the polygon data and other data to the work RAM 38 and causes the geometry processing unit 32 and the renderer 34 to execute image processing.

[0047] The geometry processing unit 32 performs geometry processing on the polygon data, such as the matrix operations that transform the placement of polygons in the three-dimensional coordinate space and the perspective transformation to the two-dimensional coordinates of the display screen. This geometry processing may of course be performed by the main CPU 30 instead.

[0048] The renderer 34 generates pixel-by-pixel image data from the polygon vertex coordinates supplied by the geometry processing unit 32 and writes it to the frame buffer 17. The Z value buffer 15 stores the Z value of the frontmost pixel on the display screen. The individual parts of the geometry processing unit 32 and the renderer 34 are described later with reference to FIG. 2 and subsequent figures.

[0049] The filter buffer 16 stores filter data for image processing such as blurring, either per pixel or per predetermined group of pixels. As described later, this filter data is created by the renderer 34. The convolution integration circuit 18 applies the filtering specified in the filter buffer 16 to the image data written in the frame buffer 17 and writes the result to the frame buffer 17. The convolution integration circuit 18 is described in detail later with reference to FIG. 5 and subsequent figures.

[0050] In this embodiment, after image data including RGB color data has been written to the frame buffer 17, the convolution integration circuit 18 reads the filter data from the filter buffer 16, per pixel or per predetermined number of pixels, and filters the image data in the frame buffer 17. Blurring is one example of such filtering.

[0051] The display control unit 19 reads the image data from the frame buffer 17 and supplies it to the display device 20 for display. When the frame frequency is 60 Hz, an image is written to the frame buffer 17 and filtered by the convolution integration circuit 18 every 1/60 second.

[0052] Image Processing Operation

Next, the image processing operation of FIG. 1 is described. FIG. 2 is a flowchart of the image processing of FIG. 1, FIG. 3 is an explanatory diagram of the polygon vertex data, and FIG. 4 is an explanatory diagram of the polygon pixel data.

[0053] (S1) By executing the game program described above, the CPU 30 writes to the RAM 38 the polygon vertex data and a register function that specifies the processing mode. As shown in FIG. 3, the polygon data includes the data for each vertex of the polygon: for example, the three-dimensional coordinates (x, y, z) of each vertex, color data (R, G, B, transparency a), the texture coordinates (Tx, Ty) into the texture buffer memory 14 that stores the texture data serving as the polygon's material, and the blur value (filter value) and velocity vector given by the CPU 30 for each polygon. In some cases, normal vector data (Nx, Ny, Nz) is also given.

[0054] Because this embodiment uses blurring as the example of filtering, a blur value and a velocity vector are given. In the geometry processing unit 32, the data load circuit 3 reads such polygon data from the RAM 38 in sequence and outputs it to the coordinate conversion circuit 4.

[0055] (S2) The coordinate conversion circuit 4 converts coordinates from the coordinate system in the polygon buffer of FIG. 3 to the coordinate system of the three-dimensional space. That is, it places (transforms) objects in the three-dimensional space by applying the matrix information given by the CPU 30 to the vertex data of FIG. 3. Specifically, commands such as translation and rotation of a polygon are given by the CPU 30 as matrix information, and the vertex coordinates and velocity vectors are transformed according to those commands. The coordinate conversion circuit 4 also sets the viewport in the three-dimensional space according to the viewpoint information.
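The matrix-driven placement in step S2 can be sketched with homogeneous 4 × 4 matrices. This is an illustrative assumption about the representation: the document does not specify the matrix layout, and the helper names `mat_vec`, `rotation_z_translation`, and `transform_vertex` are hypothetical. The sketch also assumes that velocity vectors are treated as directions (rotated but not translated).

```python
import math


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rotation_z_translation(angle_rad, tx, ty, tz):
    """Build a matrix that rotates about the z axis, then translates."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_vertex(m, position, velocity):
    """Positions use w = 1 (translated); velocity vectors use w = 0 (rotated only)."""
    p = mat_vec(m, [*position, 1.0])
    v = mat_vec(m, [*velocity, 0.0])
    return p[:3], v[:3]


matrix = rotation_z_translation(math.pi / 2, 10.0, 0.0, 0.0)
pos, vel = transform_vertex(matrix, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(pos, vel)
```

Here a vertex at (1, 0, 0) is rotated a quarter turn and shifted by 10 along x, landing near (10, 1, 0), while its velocity vector is only rotated.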

[0056] (S3) Next, the clipping circuit 5 removes vertices that lie outside the viewport and generates new vertices on the viewport boundary, so that every polygon defined by the vertices fits within the viewport area. This is ordinary clipping.

[0057] The perspective transformation circuit 6 then performs perspective transformation of the vertex data inside the viewport area from three-dimensional coordinates to the two-dimensional coordinates of the display screen. That is, it applies perspective transformation to the three-dimensional coordinates and the velocity vector of each vertex. A Z value representing depth within the display screen is generated at the same time.
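A minimal sketch of a perspective transformation of this kind is shown below, assuming a pinhole model with the viewpoint at the origin looking along +z. The function name `perspective_project`, the `focal_length` parameter, and the screen-centre convention are assumptions for illustration; the document does not give the projection formula.

```python
def perspective_project(vertex, focal_length, screen_w, screen_h):
    """Project a view-space point (x, y, z), z > 0, to pixel coordinates plus a Z value."""
    x, y, z = vertex
    if z <= 0:
        raise ValueError("vertex must lie in front of the viewpoint")
    sx = screen_w / 2 + focal_length * x / z
    sy = screen_h / 2 - focal_length * y / z   # screen y grows downward
    return sx, sy, z                           # z is kept as the depth (Z value)


near = perspective_project((1.0, 1.0, 2.0), 320.0, 640, 480)
far = perspective_project((1.0, 1.0, 8.0), 320.0, 640, 480)
print(near, far)
```

The same offset from the view axis lands farther from the screen centre when the point is near (z = 2) than when it is distant (z = 8), which is the "near objects large, distant objects small" effect.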

[0058] This completes the processing in the geometry processing unit 32. Under pipeline control, each circuit performs its processing polygon by polygon. The processing can of course be implemented in software rather than in circuits.

[0059] (S4) The processing of the renderer 34 is described next. First, the fill circuit 7 computes, from the vertex data converted to the two-dimensional coordinate system, the pixel data inside the polygon area defined by those vertices. As shown in FIG. 4, the pixel data consists of the two-dimensional coordinates (x, y) of each pixel, the Z value, the texture coordinates, the normal vector, the color data, the specific blur value assigned, the velocity vector, and so on. FIG. 4 shows an example in which one polygon consists of pixels 1, 2, and 3. The pixel coordinates (x, y) are obtained by interpolation from the coordinate values of the vertex data. The other attribute data can likewise be computed by interpolation from the vertex data. The subsequent processing is performed pixel by pixel, one after another, under pipeline control.
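The interpolation of per-pixel attributes from vertex data can be sketched with barycentric weights on a triangle. The interpolation method is not specified in the document, so this choice, along with the names `edge` and `interpolate_pixel` and the sample attribute layout (Z value, blur value), is an assumption for illustration.

```python
def edge(a, b, p):
    """Signed area term: positive when p is to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def interpolate_pixel(verts, attrs, px, py):
    """Return interpolated attributes at (px, py), or None if the point is outside."""
    p = (px, py)
    area = edge(verts[0], verts[1], verts[2])
    if area == 0:
        return None                       # degenerate triangle
    w0 = edge(verts[1], verts[2], p) / area
    w1 = edge(verts[2], verts[0], p) / area
    w2 = edge(verts[0], verts[1], p) / area
    if min(w0, w1, w2) < 0:
        return None                       # pixel lies outside the polygon
    # Blend each attribute with the same three weights.
    return [w0 * a0 + w1 * a1 + w2 * a2 for a0, a1, a2 in zip(*attrs)]


verts = [(0, 0), (4, 0), (0, 4)]
attrs = [[10.0, 0.0], [20.0, 1.0], [30.0, 2.0]]   # per vertex: [Z value, blur value]
inside = interpolate_pixel(verts, attrs, 1, 1)
outside = interpolate_pixel(verts, attrs, 5, 5)
print(inside, outside)
```

The same weights apply to any attribute carried by the vertices, such as texture coordinates, color, or velocity vector.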

[0060] (S5) The Z value buffer memory 15 stores, for each pixel position of the display screen, the Z value of the frontmost (nearest) pixel. The Z value comparison circuit 8 performs hidden surface processing by comparing the Z value of the pixel at the same position in the Z value buffer memory 15 with the Z value of the pixel being processed. For example, if the Z value of the pixel being processed is smaller than the Z value in the Z value buffer 15, the pixel being processed lies nearer to the viewer. To display that pixel on the screen, its image data (R, G, B data, etc.) is written to the corresponding pixel position in the frame buffer 17. The image data in the frame buffer 17 is therefore the image data of the pixels whose Z values have been written to the Z value buffer memory 15.
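The Z value comparison in step S5 amounts to a conditional write. The sketch below uses plain nested lists as stand-ins for the Z value buffer and the frame buffer; the function name `draw_pixel` and the initial "infinitely far" Z value are assumptions for illustration.

```python
def draw_pixel(z_buffer, frame_buffer, x, y, z, color):
    """Write color only if the pixel is nearer than what the Z buffer already holds."""
    if z < z_buffer[y][x]:           # smaller Z means nearer to the viewer
        z_buffer[y][x] = z
        frame_buffer[y][x] = color
        return True
    return False                     # hidden behind an earlier pixel: discard


WIDTH, HEIGHT = 4, 3
z_buffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
frame_buffer = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]

results = [
    draw_pixel(z_buffer, frame_buffer, 1, 1, 5.0, (255, 0, 0)),   # first pixel: drawn
    draw_pixel(z_buffer, frame_buffer, 1, 1, 9.0, (0, 255, 0)),   # farther: rejected
    draw_pixel(z_buffer, frame_buffer, 1, 1, 2.0, (0, 0, 255)),   # nearer: overwrites
]
print(results, frame_buffer[1][1], z_buffer[1][1])
```

After the three writes, the frame buffer holds the color of the nearest pixel and the Z buffer holds its Z value, matching the relationship described at the end of [0060].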

[0061] (S6) Next, the texture generation circuit 9 reads the texture data in the texture buffer 14 according to the texture coordinates, which are one of the pixel attributes in FIG. 4, and calculates the texture color of the corresponding pixel. This calculation is needed because the data in the texture buffer 14 and the pixel positions on the display screen are not necessarily in one-to-one correspondence. Texture data is downloaded directly from the program ROM 36 into the texture buffer 14 and stored there. The texture buffer 14 can also be provided in a storage area within the work RAM 38.

[0062] (S7) Next, the luminance calculation circuit 10 calculates the luminance information at the pixel being processed according to the influence of the light sources. This luminance information includes, for example, diffuse light (diffuse color), which depends on the light striking the object, and specular reflected light (specular color), which the object itself gives off.

[0063] (S8) From the pixel data, the blur value generation circuit 11 calculates the following values, which indicate the degree of influence that the pixel being processed exerts on its surrounding pixels. It then writes them to the filter buffer 16 for each pixel, or for each predetermined number of pixels.

[0064]
(1) Difference between the Z value and the depth of field (focus blur value)
(2) Velocity vector (blur value due to motion blur)
(3) Luminance (blur value as a light source)
(4) Translucency blur value (blur value behind a translucent surface)
(5) Blur value specific to the polygon (designated blur value)
Rather than writing these values individually, it is desirable to write a weighting value, derived from these five data, that directly indicates the degree of influence on the surrounding pixels. Which blur values should be stored in the buffer 16 can also be decided according to the trade-off between processing load and display speed.

[0065] (S9) The color modulation circuit 12 obtains the color data of the pixel, in accordance with the luminance information, from the texture color read from the texture buffer 14 and from the diffuse light and specular reflected light obtained by the luminance calculation circuit 10. The texture color is, for example, the color information of the material when that material is in a 100% bright place. The color data of the pixel is obtained by the following equation.

[0066] Color data = (texture color) × (diffuse light) + (specular reflected light)
That is, for a pixel in a bright place, the texture color is expressed as it is, and the specular reflected light is added to it. In addition, the color modulation circuit 12 applies modulation that takes into account effects such as fog. For a pixel in fog that has a large Z value, the fog color is blended in according to the magnitude of the Z value.
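
The color equation of step S9 can be sketched per channel as below. The value ranges (8-bit texture and specular values, diffuse light as a factor from 0.0 to 1.0), the rounding, and the clamp to 255 are assumptions made for the example; the text only gives the equation itself, and the fog blend is not modeled.

```python
def modulate(texture, diffuse, specular):
    """Color data = texture color * diffuse light + specular reflected light.

    texture, specular: per-channel values 0..255 (assumed range).
    diffuse: per-channel factors 0.0..1.0 (assumed range).
    """
    return tuple(
        min(255, round(t * d) + s)   # clamp to 8 bits (assumption)
        for t, d, s in zip(texture, diffuse, specular)
    )


bright = modulate((200, 100, 50), (1.0, 1.0, 1.0), (0, 0, 0))
shaded = modulate((200, 100, 50), (0.5, 0.5, 0.5), (10, 10, 10))
```

With full diffuse light and no specular term, the texture color comes out unchanged, which matches the remark that the texture color is expressed as it is in a bright place.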

[0067] (S10) The blend circuit 13 blends the color data of the pixel being processed with the color data already written in the frame buffer 17, and writes the blended color data to the frame buffer 17. For example, according to the transparency a of the pixel, the color data of a translucent pixel is blended with the color data of the pixel behind it (already written in the frame buffer 17). Because this blended color data has not yet been subjected to image processing (here, blur processing), boundary portions remain sharp.
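
The text does not give the blend equation used by the blend circuit 13. As a minimal sketch, assuming the common linear blend in which a (0.0 to 1.0) is the weight of the translucent front pixel, step S10 could look like this; the function name `blend` and this interpretation of a are hypothetical.

```python
def blend(front, behind, a):
    """Linear blend of a translucent pixel with the pixel behind it.

    a is treated as the weight of the front pixel (assumed convention):
    a = 1.0 keeps only the front color, a = 0.0 keeps only the color
    already stored in the frame buffer.
    """
    return tuple(round(a * s + (1.0 - a) * d) for s, d in zip(front, behind))


half = blend((200, 0, 0), (0, 100, 0), 0.5)
```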

[0068] In this way, the color data of each pixel is written to the frame buffer 17.

[0069] (S11) Next, the convolution integration circuit 18 performs convolution integration on the color data in the frame buffer 17 using the filter values (blur values) in the filter buffer 16, and writes the result to the frame buffer 17.

[0070] (S12) The display control unit 19 reads the processed color data from the frame buffer 17 and displays it on the display device 20.

[0071] ・・Convolution integration circuit・・ Next, the convolution integration circuit 18 of FIG. 1 will be described. FIG. 5 is a block diagram of a convolution integration circuit according to one embodiment of the present invention; FIG. 6 is an explanatory diagram of its reference region, filter, and processing result; FIG. 7 is a block diagram of its addition unit; FIG. 8 is an explanatory diagram of the image data; FIG. 9 is a configuration diagram of its mask circuit; FIG. 10 is an explanatory diagram of the mask operation; and FIG. 11 is an explanatory diagram of its operation.

[0072] First, to simplify the description, the case of applying a 3 × 3 filter g(x, y) to a 4 × 4 region, as shown in FIG. 6, will be described. That is, to obtain a 4 × 4 processing result (generated image) s(x, y), a 6 × 6 original image f(x, y) is referenced and the 3 × 3 filter g(x, y) is applied.
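
As a plain software reference for this 6 × 6 to 4 × 4 case, the result can be computed directly from the sums that appear in the later walk-through (for example S00 = G00·F00 + G10·F10 + ... + G22·F22). The sketch below assumes row-major lists indexed as `f[y][x]` and `g[j][i]`, so that F10 in the text corresponds to `f[0][1]`; the function name `convolve_valid` is hypothetical.

```python
def convolve_valid(f, g):
    """s(x, y) = sum over (i, j) of g(i, j) * f(x + i, y + j), no padding."""
    kh, kw = len(g), len(g[0])
    oh, ow = len(f) - kh + 1, len(f[0]) - kw + 1
    return [
        [sum(g[j][i] * f[y + j][x + i] for j in range(kh) for i in range(kw))
         for x in range(ow)]
        for y in range(oh)
    ]


f = [[x + 6 * y for x in range(6)] for y in range(6)]   # 6 x 6 original image
g = [[1, 1, 1] for _ in range(3)]                        # 3 x 3 filter of ones
s = convolve_valid(f, g)                                 # 4 x 4 generated image
```

Computed this way, every output element re-reads nine input pixels, which is the repeated referencing that the circuit described next is designed to avoid.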

[0073] The convolution integration circuit 18 for this case is shown in FIG. 5. As shown in FIG. 5, 41 and 42 are selectors, which control the flow of data in the pipeline according to a select signal. 43 to 51 are addition units, which perform the operations for the convolution integration. 52 and 53 are flip-flops (FFs) that temporarily hold data in the pipeline.

[0074] 54 is a mask circuit, which generates, from the input filter data G00 to G22, mask signals Mask0 to Mask8 that mask the regions where the operation is invalid. 55 is a normalization circuit, which normalizes the result obtained by the additions.

[0075] Nine addition units 43 to 51 are provided, matching the number of element data G00 to G22 of the 3 × 3 filter (see FIG. 6), and they are basically connected in cascade. Each of the addition units 43 to 51 is supplied with the corresponding filter element data G00 to G22 from the mask circuit 54, described later. The image data F00 to F55 from the source (frame buffer 17) are supplied to the addition units 43 to 51 in parallel.

[0076] Each of the addition units 43 to 51 basically multiplies the supplied filter element data by the image data and adds the product to the output of the preceding addition unit. That is, the addition unit 43 multiplies the filter element data G00 by the input image data and adds the product to its input. Likewise, the addition units 44 to 51 multiply the filter element data G10 to G22 by the image data and add the products to their inputs.

[0077] Accordingly, by supplying image data to each of the addition units 43 to 51, the filtered image data S00 to S33 are obtained by pipeline operation. FIG. 7 is a circuit diagram of such an addition unit, and FIG. 8 is an explanatory diagram of the image data supplied from the source.

[0078] As shown in FIG. 8, color data is used as the image data: the R, G, and B data are 8 bits each, and 1 bit of weight data (W) is provided for the normalization described later. The original color data of one pixel is therefore 25 bits.

[0079] As shown in FIG. 7, each of the addition units 43 to 51 consists of a multiplier 60, adders 61 to 64, and a holding flip-flop 65. When, for example, the element data (Filter) of each filter is a 1-bit digital value of 1 or 0, the multiplier 60 is implemented as an AND gate. The AND gate 60 passes or blocks the image data Src according to the element data.

[0080] The adders 61 to 64 are provided for R, G, B, and W, respectively, and add the R, G, B, and W data from the AND gate 60 to the R, G, B, and W data input from the preceding stage. The sum is temporarily held in the flip-flop 65 and then output.
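
One addition unit of FIG. 7 can be modeled for the 1-bit filter case as a gate followed by four adders. The sketch below is a behavioral model under that assumption; the tuple layout (R, G, B, W) follows FIG. 8, while the function name `addition_unit` is hypothetical and the holding flip-flop 65 is represented simply by the returned value.

```python
def addition_unit(filter_bit, src, prev):
    """Behavioral model of one addition unit for a 1-bit filter element.

    filter_bit: 0 or 1 (the Filter / mask input).
    src:  (R, G, B, W) of the pixel currently supplied from the source.
    prev: (R, G, B, W) arriving from the preceding stage.
    """
    gated = src if filter_bit else (0, 0, 0, 0)        # AND gate 60
    return tuple(p + s for p, s in zip(prev, gated))   # adders 61 to 64


passed = addition_unit(1, (10, 20, 30, 1), (1, 2, 3, 1))
blocked = addition_unit(0, (10, 20, 30, 1), (1, 2, 3, 1))
```

When the filter bit is 0 the unit simply forwards its input, which is the property that later lets the units double as a shift register.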

[0081] Returning to FIG. 5, this embodiment is further devised so that the filtered image data S00 to S33 can be computed by reading (referencing) each of the image data F00 to F55 only once. In a cascade-connected pipeline operation, the corresponding image data must be distributed to the addition unit that multiplies by each filter element. If this distribution is done by providing parallel paths, the number of wiring paths increases, and so does the number of reads.

[0082] To prevent this, this embodiment is devised so that, in synchronization with the sequentially read image data F00 to F55, each of the addition units 43 to 51 performs its operation when the image data corresponding to it is input.

[0083] For this purpose, first, the mask circuit 54 is provided, which generates mask signals that enable or disable the operation of each addition unit. Second, the flip-flops 52 and 53 are provided to hold data in the pipeline, and the selectors 41 and 42 are provided so that the addition units are also used as holding circuits.

[0084] First, the mask circuit 54 will be described with reference to FIGS. 9 and 10. As shown in FIG. 9, the mask circuit has an X counter 70 that counts the image-data read clock CL, an X signal generation circuit 71 that generates a gate signal when the X counter 70 reaches predetermined values, a Y counter 72 that counts the count-up of the X counter 70, and a Y signal generation circuit 73 that generates a Y signal when the Y counter 72 reaches predetermined values.

[0085] The Y shift register 74 is a 3-bit shift register and shifts the input Y signal on each count-up of the X counter 70. The bits of the Y shift register 74 are input to AND gates 75, 76, and 77, respectively. The outputs of the AND gates 75, 76, and 77 are the inputs of three X shift registers 78, 79, and 80. Each of the X shift registers 78, 79, and 80 is a 3-bit shift register and shifts its input signal on the clock CL.

[0086] The bits of the X shift registers 78, 79, and 80 are input to nine AND gates 81 to 89, respectively. These AND gates 81 to 89 correspond to the number of elements of the 3 × 3 filter; they receive the element data G22 to G00, respectively, and output the mask signals Mask8 to Mask0, respectively. The mask signals Mask0 to Mask8 are input to the addition units 43 to 51 of FIG. 5 as the filter signal Filter of FIG. 7.

[0087] This operation will be described with reference to FIG. 10. The X counter 70 counts the clock, running from "0" to "7" and then counting up. The X signal generation circuit 71 sets the gate signal to "1" while the X counter 70 indicates a value from "0" to "3". The Y counter 72 counts the count-up clock of the X counter 70 and runs from "0" to "5". The Y signal generation circuit 73 sets the Y signal to "1" while the Y counter 72 indicates a value from "0" to "3".

[0088] FIG. 10 shows, for each pair of Y and X counter values, the states of the three 3-bit shift registers 78 to 80 as a 3 × 3 matrix. In FIG. 10, the top row of the 3 × 3 matrix shows the 3-bit state of the shift register 78: the left end is Mask0, the center is Mask1, and the right end is Mask2. The middle row shows the 3-bit state of the shift register 79: the left end is Mask3, the center is Mask4, and the right end is Mask5. The bottom row shows the 3-bit state of the shift register 80: the left end is Mask6, the center is Mask7, and the right end is Mask8.
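
The effect of the counters and shift registers can be summarized by a functional model: the unit for filter element G(i, j) is enabled when the X signal delayed by i clocks and the Y signal delayed by j lines are both "1". The sketch below assumes exactly that behavior and the ordering Mask0 to Mask8 = G00, G10, G20, G01, ..., G22; it is not a gate-level description of FIG. 9, and the function names are hypothetical.

```python
def mask_signals(clock, g):
    """Return [Mask0, ..., Mask8] for the given clock number.

    g[j][i] is the 1-bit filter element G(i, j).
    """
    cx, cy = clock % 8, clock // 8          # X counter 0..7, Y counter 0..5
    masks = []
    for k in range(9):
        i, j = k % 3, k // 3
        x_on = 0 <= cx - i <= 3             # X signal ("1" for 0..3), delayed i clocks
        y_on = 0 <= cy - j <= 3             # Y signal ("1" for 0..3), delayed j lines
        masks.append(g[j][i] if (x_on and y_on) else 0)
    return masks


def active_units(clock, g):
    """Reference numerals (43..51) of the addition units that are enabled."""
    return [43 + k for k, m in enumerate(mask_signals(clock, g)) if m]


ones = [[1, 1, 1] for _ in range(3)]
```

With a filter of all ones, this model enables only unit 43 at clock 0, units 44 and 45 at clock 4, no unit at clocks 6 and 7, and units 43 and 46 at clock 8, in line with the clock-by-clock description of FIG. 11.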

[0089] The convolution integration operation of the circuit of FIG. 5 will be described with reference to the time chart of FIG. 11. In FIG. 11, Src is the image data from the source (frame buffer 17), and the columns labeled G00 to G22 show which of the generated-image elements S00 to S33 is held by the addition unit 43 to 51 to which each filter element is supplied. X0 and X1 are the generated-image elements held by the FFs 52 and 53. Sel indicates the state of the selector signal.

[0090] When the selector signal is "1", the selector 41 feeds the X0 data back to the G00 addition unit 43, and the selector 42 feeds the X1 data back to the G01 addition unit 46. When the selector signal is "0", the selector 42 inputs the X0 data to the G01 addition unit 46, the selector 41 inputs "0" data to the G00 addition unit 43, and the X1 data is passed to the G02 addition unit 49.

[0091] In the rows G00 to G22, the portions enclosed by solid lines are masked by the mask circuit 54 so that no operation is performed. In the portions not enclosed by solid lines, each of the addition units 43 to 51 adds the image data Src to its input data when the filter element Gij is "1".

[0092] At clock 0, when the image data F00 is supplied to the addition units 43 to 51, mask signal 0 (FIG. 10) disables the operations of the addition units 44 to 51, that is, all units other than the addition unit 43. Only the addition unit 43 therefore performs the operation for the generated-image element S00 (= G00·F00).

[0093] Similarly, at clock 1, when the image data F10 is supplied to the addition units 43 to 51, mask signal 1 (FIG. 10) disables the operations of the addition units 45 to 51, that is, all units other than the addition units 43 and 44. The addition unit 43 therefore computes the generated-image element S10 (= G00·F10), and the addition unit 44 computes the generated-image element S00 (= G00·F00 + G10·F10).

[0094] Further, at clock 2, when the image data F20 is supplied to the addition units 43 to 51, mask signal 2 (FIG. 10) disables the operations of the addition units 46 to 51, that is, all units other than the addition units 43, 44, and 45. The addition unit 43 therefore computes the generated-image element S20 (= G00·F20), the addition unit 44 computes the generated-image element S10 (= G00·F10 + G10·F20), and the addition unit 45 computes the generated-image element S00 (= G00·F00 + G10·F10 + G20·F20).

[0095] Further, at clock 3, when the image data F30 is supplied to the addition units 43 to 51, mask signal 3 (FIG. 10) disables the operations of the addition units 46 to 51, that is, all units other than the addition units 43, 44, and 45. The addition unit 43 therefore computes the generated-image element S30 (= G00·F30), the addition unit 44 computes the generated-image element S20 (= G00·F20 + G10·F30), and the addition unit 45 computes the generated-image element S10 (= G00·F10 + G10·F20 + G20·F30). The generated-image element S00 (= G00·F00 + G10·F10 + G20·F20) is held in the FF 52 and is fed back to the addition unit 43 by the selector 41.

[0096] At clock 4, when the image data F40 is supplied to the addition units 43 to 51, mask signal 4 (FIG. 10) disables the operations of the addition units 43 and 46 to 51, that is, all units other than the addition units 44 and 45. The addition unit 43 therefore holds the fed-back generated-image element S00, the addition unit 44 computes the generated-image element S30 (= G00·F30 + G10·F40), and the addition unit 45 computes the generated-image element S20 (= G00·F20 + G10·F30 + G20·F40). The generated-image element S10 (= G00·F10 + G10·F20 + G20·F30) is held in the FF 52 and is fed back to the addition unit 43 by the selector 41.

[0097] At clock 5, when the image data F50 is supplied to the addition units 43 to 51, mask signal 5 (FIG. 10) disables the operations of the addition units 43, 44, and 46 to 51, that is, all units other than the addition unit 45. The addition units 43 and 44 therefore hold the fed-back generated-image elements S10 and S00, and the addition unit 45 computes the generated-image element S30 (= G00·F30 + G10·F40 + G20·F50). The generated-image element S20 is held in the FF 52 and is fed back to the addition unit 43 by the selector 41.

[0098] At clocks 6 and 7, no image data is supplied, and all the addition units 43 to 51 are masked. The data held in the addition units 43 to 45 and the FF 52 are therefore shifted to the next stage. As a result, the generated-image element S00 comes to be held in the FF 52; that is, the generated-image elements S00 to S30 are ready to be output to the addition unit 46. In this way, when the six image data of one line of the original image are input serially, the addition units double as a data shift circuit, which keeps the hardware small.

[0099] At clock 8, the image data F01 is supplied to the addition units 43 to 51, and mask signal 8 (FIG. 10) disables the operations of the addition units 44, 45, and 47 to 51, that is, all units other than the addition units 43 and 46. The addition unit 43 therefore computes the generated-image element S01 (= G00·F01), the addition unit 44 holds S30, the addition unit 45 holds S20, and the FF 52 holds S10. The addition unit 46 computes the generated-image element S00 (= G00·F00 + G10·F10 + G20·F20 + G01·F01).

[0100] Likewise, at clocks 9 to 13, the addition units 43 to 48 perform the computation and feedback of the generated-image elements S00 to S31. Then, at clocks 14 and 15, the generated-image elements S00 and S01 return to the FFs 52 and 53.

[0101] At clock 16, the image data F02 is supplied to the addition units 43 to 51, and mask signal 16 (FIG. 10) disables the operations of the addition units 44, 45, 47, 48, 50, and 51, that is, all units other than the addition units 43, 46, and 49. The addition unit 43 therefore computes the generated-image element S02 (= G00·F02), the addition units 44 and 45 hold S31 and S21, and the FF 52 holds S11. The addition unit 46 computes the generated-image element S01 (= G00·F01 + G10·F11 + G20·F21 + G01·F02), and the addition unit 49 computes the generated-image element S00 (= G00·F00 + G10·F10 + G20·F20 + G01·F01 + G11·F11 + G21·F21 + G02·F02).

[0102] In this way, at clock 18, the computation of the generated-image element S00 (= G00·F00 + G10·F10 + G20·F20 + G01·F01 + G11·F11 + G21·F21 + G02·F02 + G12·F12 + G22·F22) is completed; the result is normalized by the normalization circuit 55 and written to the frame buffer 17.

[0103] The same applies to the generated-image elements S10 to S33. As can be seen from the time chart, the convolution operation that applies a 3 × 3 filter to a 4 × 4 region is completed in only 45 cycles. Moreover, the number of memory references is 36, equal to the number of referenced pixels, so none is wasted. That is, each image datum is referenced only once.
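
The clock-by-clock behavior described for FIG. 5 can be checked with a small software model. The sketch below is an interpretation, not the circuit itself: it uses scalar pixels instead of (R, G, B, W), starts all registers at zero, models the mask functionally, and assumes that the selector signal is "1" while the X counter is 4 to 7, a timing inferred from the walk-through rather than stated in the text. All function names are hypothetical.

```python
import random


def direct(f, g):
    """Reference result: s(x, y) = sum of g(i, j) * f(x + i, y + j)."""
    return [[sum(g[j][i] * f[y + j][x + i] for j in range(3) for i in range(3))
             for x in range(4)] for y in range(4)]


def pipeline(f, g):
    """Model of the cascade of units 43..51 with FFs 52/53 and selectors 41/42."""
    u = [0] * 9               # values held by units 43..51 (G00, G10, ..., G22)
    ff52 = ff53 = 0           # X0 and X1
    s = [[None] * 4 for _ in range(4)]
    reads = 0
    for t in range(48):
        cx, cy = t % 8, t // 8            # X counter 0..7, Y counter 0..5
        sel = 1 if cx >= 4 else 0         # assumed selector timing
        src = 0
        if cx < 6:                        # no pixel is read when cx is 6 or 7
            src = f[cy][cx]
            reads += 1
        # Inputs of the nine units, taken from the state of the previous clock.
        inp = [ff52 if sel else 0, u[0], u[1],
               ff53 if sel else ff52, u[3], u[4],
               ff53, u[6], u[7]]
        new_u = []
        for k in range(9):
            i, j = k % 3, k // 3
            enabled = 0 <= cx - i <= 3 and 0 <= cy - j <= 3   # mask signal
            new_u.append(inp[k] + (g[j][i] * src if enabled else 0))
        ff52, ff53, u = u[2], u[5], new_u
        if 2 <= cx <= 5 and 2 <= cy <= 5:  # unit 51 now holds a finished element
            s[cy - 2][cx - 2] = u[8]
    return s, reads


random.seed(0)
f = [[random.randrange(256) for _ in range(6)] for _ in range(6)]
g = [[random.randrange(2) for _ in range(3)] for _ in range(3)]
result, reads = pipeline(f, g)
```

In this model each of the 36 pixels is read exactly once, and the values collected from unit 51 agree with the direct computation, so the first finished element appears at clock 18 as in the description.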

[0104] In addition, because the selectors 41 and 42 are provided to control the flow of data in the pipeline, the addition units can be used as a data shift circuit during the cycles in which they are not needed for the convolution integration. To process the 6 × 6 data, only two FFs need to be provided besides the addition units, so the circuit scale can be greatly reduced. For example, with the 25-bit color data of FIG. 8 described above, the data flowing through the pipeline is accumulated nine times, so 3 bits are added to each of R, G, B, and W, for a total of 37 bits. Each FF is therefore a 37-bit FF, and only 74 bits of FFs need to be added besides the addition units.

[0105] Furthermore, because the mask circuit is provided to disable part of the operations of the pipeline arithmetic unit, the addition units can be used as a data shift circuit, and the circuit scale can be greatly reduced.

[0106] The purpose of the normalization data W mentioned above is as follows. The number of values summed into each of the R, G, and B data varies with the element data of the filter. The normalization data W is therefore attached to each pixel so that this count is accumulated alongside the color data, and a normalized output is obtained by dividing the resulting R, G, and B data by the accumulated normalization data.
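
The normalization step can be sketched as a division of the accumulated R, G, and B sums by the accumulated W count. The use of integer division and the handling of a zero count (returning black) are assumptions made for the example; the text does not specify either, and the function name `normalize` is hypothetical.

```python
def normalize(acc):
    """Divide accumulated (R, G, B) by the accumulated weight W.

    acc: (R_sum, G_sum, B_sum, W_sum) as delivered by the last addition unit.
    """
    r, g, b, w = acc
    if w == 0:                 # no filter element was "1" (assumed handling)
        return (0, 0, 0)
    return (r // w, g // w, b // w)   # integer division (assumption)


# Three pixels of color (100, 50, 10), each with W = 1, were summed.
averaged = normalize((300, 150, 30, 3))
```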

[0107] This filter processing is applied to a specific region or to the entire region of the screen in the frame buffer 17. If a similar circuit is built for the case of applying a 15 × 15 filter to a 16 × 16 region, it can be realized with 15 × 15 addition units and 14 FFs. In that case, the filter processing of the 16 × 16 region is completed in 960 operation cycles and 900 memory accesses. When the processing is applied to the entire VGA region mentioned earlier, the operation takes 115,200 cycles, so the entire region can be filtered far faster than with a CPU or the like.
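
The two configurations given in the text (3 × 3 filter on a 4 × 4 region, 15 × 15 filter on a 16 × 16 region) are consistent with a simple scaling rule: k × k addition units, k − 1 extra FFs, and one memory access per pixel of the (n + k − 1) × (n + k − 1) reference region. The sketch below encodes that rule; the generalization to other sizes is an inference from these two cases, not something the text states, and the cycle counts are left out because the text gives them only for those cases.

```python
def resources(n, k):
    """Hardware and memory-access counts for an n x n region and k x k filter.

    Inferred rule (assumption): one addition unit per filter element,
    one extra FF per filter row except the last, one read per referenced pixel.
    """
    ref = n + k - 1                       # side of the referenced region
    return {
        "addition_units": k * k,
        "extra_ffs": k - 1,
        "memory_accesses": ref * ref,
    }


small = resources(4, 3)      # the FIG. 5 case
large = resources(16, 15)    # the 16 x 16 / 15 x 15 case
```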

[0108] ・・Other embodiments・・ FIG. 12 is a circuit diagram of a convolution operation circuit according to another embodiment of the present invention. This embodiment is obtained by removing the selectors 41 and 42 and the mask circuit 54 from the circuit of the embodiment of FIG. 5, which performs 3 × 3 filter processing; components identical to those shown in FIG. 5 are denoted by the same reference numerals.

[0109] The addition units 43 to 51 are identical to those of FIG. 5. Five FFs 52 are provided at the stage following the addition unit 43, and five FFs 53 are provided at the stage following the addition unit 48.

[0110] FIG. 13 is its time chart, drawn to match the time chart of FIG. 11. In this embodiment, as in FIG. 11, the computation for one line of data of the original image of FIG. 6 finishes in eight cycles. For this purpose, the five FFs 52 and the five FFs 53 are provided, as described above, to delay the operation results. Operations that are not needed for the convolution integration, such as F40·G00 and F50·G00, are also performed, but they can simply be invalidated at a stage (not shown) following the normalization circuit 55. That is, only the valid operation results need to be captured in the buffer.

[0111] In this embodiment, the processing is shown as completing in 45 cycles to match the time chart of FIG. 11, but the operation time can be shortened further. In the example of FIG. 11, the addition units are used as a data shift circuit, so clock periods in which no data is read and no operation is performed (for example, 6, 7, 14, 15, 22, 23, 30, 31, 38, and 39) were provided in order to shift the necessary data; these can be omitted. As for the circuit, each group of five FFs 52 and 53 is replaced with three FFs.

[0112] As a result, those clock periods are removed from the operation cycles of FIG. 13, shortening them by 10 cycles, to 36 cycles. However, four more FFs are required than in the configuration of FIG. 11. The required configuration can therefore be selected according to the desired operation speed and circuit scale.

[0113] FIG. 14 is a circuit diagram of a convolution operation circuit according to yet another embodiment of the present invention. This embodiment is obtained by removing the selectors 41 and 42 from the circuit of the embodiment of FIG. 5, which performs 3 × 3 filter processing; components identical to those shown in FIG. 5 are denoted by the same reference numerals.

[0114] The addition units 43 to 51 are the same as those in FIG. 5. Five FFs 52 are provided at the stage following the addition unit 43, and five FFs 53 are provided at the stage following the addition unit 48. The mask circuit 54 is the same as that in FIG. 5.

[0115] FIG. 15 is a time chart of this circuit, drawn to align with the time chart of FIG. 11. In this embodiment, as in FIG. 11, the operation on one column of data of the original image of FIG. 6 is completed in eight cycles. For this reason, as described above, five FFs 52 and five FFs 53 are provided to delay the operation results. Furthermore, operations not required for the convolution integration, such as F40·G00 and F50·G00, are invalidated by the mask circuit 54, as in FIG. 11. Unneeded data thereby becomes "0", which prevents malfunction. The same operation can also be realized by providing a selector in place of the mask circuit.
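The masking described above can be sketched in Python as follows. This is a minimal model, assuming the mask circuit behaves as a per-cycle gate that passes a pipeline value when an enable flag is set and forces it to 0 otherwise. The function name `mask`, the sample values, and the enable pattern are hypothetical, and the actual circuit of FIG. 9 is not reproduced here.

```python
def mask(value, enable):
    """Pass value through when enable is True; otherwise force it to 0."""
    return value if enable else 0


# Hypothetical partial sums leaving an addition unit over four cycles.
partial_sums = [12, 7, 30, 5]
# Hypothetical enable pattern: the third cycle carries an unneeded product.
enables = [True, True, False, True]

masked = [mask(v, e) for v, e in zip(partial_sums, enables)]
print(masked)  # [12, 7, 0, 5]
```

In this model a selector that chooses between the pipeline value and a constant 0 gives the same result, which is consistent with the remark that a selector can take the place of the mask circuit.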

[0116] In this embodiment, the chart is drawn to match the time chart of FIG. 11, so the processing is shown as completing in 45 cycles, but the operation time can be shortened further. In the example of FIG. 11, the addition units are used as a data shift circuit, and in order to shift the required data, clock cycles in which no data is read and no operation is performed (for example, cycles 6, 7, 14, 15, 22, 23, 30, 31, 38, and 39) are provided. These cycles can be omitted. In terms of the circuit, the five FFs 52 and 53 are replaced by three FFs.

[0117] As a result, the operation is shortened by 10 cycles, to 36 cycles. However, four more FFs are required than in the configuration of FIG. 11. The configuration can therefore be selected according to the required operation speed and circuit scale.

[0118] In addition to the embodiments described above, the present invention can be modified as follows.

[0119] (1) In the embodiments described above, the filter processing was explained using image blurring, but other image processing such as edge enhancement can also be applied.

[0120] (2) The description used an example in which the filter characteristics are set for each pixel, but the filter characteristics can also be set for the image as a whole or for each region.

[0121] (3) The multiplier of the addition unit was described as an AND gate, and the filter element data as a single bit (1/0). However, the filter element data may be two or more bits, and another multiplier may be used.
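Modification (3) can be illustrated with a short Python sketch. It assumes 8-bit display data, models the 1-bit case as a bitwise AND between the data and the coefficient bit fanned out to every data bit, and models the multi-bit case as an ordinary integer multiplication. The names `DATA_BITS`, `and_gate_multiply`, `multiply`, and `addition_unit`, as well as the 8-bit width, are assumptions made for illustration.

```python
DATA_BITS = 8  # assumed width of the display data


def and_gate_multiply(data, coeff_bit):
    """Multiply by a 1-bit coefficient using AND gates only."""
    all_ones = (1 << DATA_BITS) - 1
    gate = all_ones if coeff_bit else 0  # coefficient bit fanned out to every data bit
    return data & gate


def multiply(data, coeff):
    """General multiplier for coefficients of two or more bits."""
    return data * coeff


def addition_unit(pipeline_in, data, coeff, one_bit=True):
    """Add the product of data and coeff to the value arriving from the pipeline."""
    product = and_gate_multiply(data, coeff) if one_bit else multiply(data, coeff)
    return pipeline_in + product


print(addition_unit(10, 200, 1))                 # 210
print(addition_unit(10, 200, 0))                 # 10
print(addition_unit(10, 200, 3, one_bit=False))  # 610
```

With a 1-bit coefficient the product is either the data itself or 0, so the AND gates are sufficient. A wider coefficient needs a true multiplier but allows weighted filters.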

[0122] The present invention has been described above by way of embodiments, but various modifications are possible within the gist of the present invention, and these are not excluded from the scope of the present invention.

[0123]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0124] (1) Because a pipeline arithmetic unit in which adders are connected in cascade is used for the convolution integration processing, the convolution integration operations can be executed in parallel, enabling high-speed convolution integration. In particular, image processing can be executed fast enough to be in time for display.

[0125] (2) The addition units and the pipeline arithmetic unit are configured so that display data, once read, is used efficiently. This greatly reduces the number of memory references and improves the overall operation speed. Specifically, each addition unit of the pipeline arithmetic unit consists of a multiplication circuit, which multiplies the common display data by the corresponding element data of the filter, and an adder. Display data referenced once can thus be used by every element of the filter, which reduces the number of data references.
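The effect described in (2) can be sketched in Python with a simplified one-dimensional model. This is not the 3×3 circuit of FIG. 5. It only assumes a chain of units, one per filter element, in which every unit multiplies the same input sample by its own coefficient and adds the partial sum arriving from the neighboring unit. The names `cascade_convolve` and `direct_convolve`, and the sample data, are hypothetical.

```python
def cascade_convolve(samples, coeffs):
    """1-D convolution using cascaded multiply-add units fed by a shared input."""
    regs = [0] * (len(coeffs) - 1)  # registers between successive addition units
    reads = 0                       # number of times input data is fetched
    out = []
    for x in samples:
        reads += 1                  # the sample is fetched once and shared by all units
        products = [c * x for c in coeffs]  # one multiplier per filter element
        padded = regs + [0]         # the last unit has no upstream partial sum
        sums = [p + r for p, r in zip(products, padded)]  # one adder per unit
        out.append(sums[0])         # the first unit emits a finished result
        regs = sums[1:]             # the remaining partial sums advance one stage
    return out, reads


def direct_convolve(samples, coeffs):
    """Reference version that re-reads the input for every filter element."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


result, reads = cascade_convolve([1, 2, 3, 4], [1, 2, 3])
print(result)                                   # [1, 4, 10, 16]
print(reads)                                    # 4
print(direct_convolve([1, 2, 3, 4], [1, 2, 3]))  # [1, 4, 10, 16]
```

In this model each input sample is fetched once regardless of the number of filter elements, whereas the reference version touches the input once per element for every output.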

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an image processing apparatus according to an embodiment of the present invention.

FIG. 2 is a processing flowchart for the image processing of FIG. 1.

FIG. 3 is an explanatory diagram of polygon data according to an embodiment of the present invention.

FIG. 4 is an explanatory diagram of pixel data according to an embodiment of the present invention.

FIG. 5 is a circuit diagram of the convolution integration circuit of FIG. 1.

FIG. 6 is an explanatory diagram of the image processing of FIG. 5.

FIG. 7 is a configuration diagram of the addition unit of FIG. 5.

FIG. 8 is a configuration diagram of the image data of FIG. 5.

FIG. 9 is a configuration diagram of the mask circuit of FIG. 5.

FIG. 10 is an operation explanatory diagram of the mask circuit of FIG. 9.

FIG. 11 is a time chart of the circuit of FIG. 5.

FIG. 12 is a configuration diagram of a convolution integration circuit according to another embodiment of the present invention.

FIG. 13 is a time chart of the circuit of FIG. 12.

FIG. 14 is a configuration diagram of a convolution integration circuit according to yet another embodiment of the present invention.

FIG. 15 is a time chart of the circuit of FIG. 14.

FIG. 16 is an explanatory diagram of image filter processing for explaining the conventional technique.

[Explanation of symbols]

18: convolution integration circuit
41, 42: selectors
43 to 51: addition units
52, 53: FFs
54: mask circuit
55: normalization circuit
60: multiplier
61 to 64: adders

Claims

[Claims]

1. An image processing apparatus for generating screen display data from image data, comprising: a generation unit that generates display data from each item of image data; and a convolution integration circuit that convolves the display data with a filter whose characteristics are specified by a plurality of elements to create processed screen display data, wherein the convolution integration circuit has a pipeline arithmetic unit in which a plurality of addition units, provided corresponding to the respective elements of the filter and commonly supplied with the display data, are connected in cascade, and each of the addition units has a multiplication circuit that multiplies the display data by element data of the filter, and an adder that adds an input and the multiplication result.

2. The image processing apparatus according to claim 1, wherein the pipeline arithmetic unit further includes a selector for controlling the input of data in the pipeline to the addition unit at the next stage.

3. The image processing apparatus according to claim 1, wherein the pipeline arithmetic unit further includes a selector for controlling a feedback input of data in the pipeline to the addition unit.

4. The image processing apparatus according to claim 1, wherein said pipeline arithmetic unit further includes a holding circuit for temporarily holding data in said pipeline.

5. The image processing apparatus according to claim 4, wherein the holding circuits are provided in a number corresponding to a size of the filter and a size of a processing area of the display data.

6. The image processing apparatus according to claim 1, wherein the convolution integration circuit further includes a mask circuit that invalidates the operation of some of the addition units of the pipeline arithmetic unit in accordance with the reading of the display data.

7. A convolution integration circuit for convolving image data with a filter whose characteristics are specified by a plurality of elements to create processed image data, comprising a pipeline arithmetic unit in which a plurality of addition units, provided corresponding to the respective elements of the filter and commonly supplied with the image data, are connected in cascade, wherein each of the addition units includes a multiplication circuit that multiplies the image data by element data of the filter, and an adder that adds an input and the result of the multiplication.

8. The convolution integration circuit according to claim 7, wherein the pipeline arithmetic unit further includes a selector for controlling the input of data in the pipeline to the addition unit at the next stage.

9. The convolution integration circuit according to claim 7, wherein the pipeline arithmetic unit further includes a selector for controlling a feedback input of data in the pipeline to the addition unit.

10. The convolution integration circuit according to claim 7, wherein the pipeline arithmetic unit further includes a holding circuit for temporarily holding data in the pipeline.

11. The convolution integrator according to claim 10, wherein the holding circuits are provided in a number corresponding to the size of the filter and the size of the processing area of the display data.

12. The convolution integration circuit according to claim 7, further comprising a mask circuit that invalidates the operation of some of the addition units of the pipeline arithmetic unit in accordance with the reading of the display data.

13. An image processing method for generating screen display data from image data, comprising: a generation step of generating display data from each item of image data; and a convolution integration step of convolving the display data with a filter whose characteristics are specified by a plurality of elements to create processed screen display data, wherein the convolution integration step includes: a step of supplying the display data to a pipeline arithmetic unit in which a plurality of addition units, provided corresponding to the respective elements of the filter, are connected in cascade; and a step of, in each of the addition units, multiplying the display data by element data of the filter and then adding an input and the multiplication result.

14. The image processing method according to claim 13, wherein the convolution integration step further includes a step of controlling, by a selector, the input of data in the pipeline to the addition unit at the next stage.

15. The image processing method according to claim 13, wherein the convolution integration step further includes a step of controlling, by a selector, a feedback input of data in the pipeline to the addition unit.

16. The image processing method according to claim 13, wherein the convolution integration step further includes a step of temporarily holding data in the pipeline by a holding circuit.

17. The image processing method according to claim 13, wherein the convolution integration step further includes a step of invalidating, by a mask circuit, the operation of some of the addition units of the pipeline arithmetic unit in accordance with the reading of the display data.