WO1999008204A1 - Data processing device and method - Google Patents
- Publication number
- WO1999008204A1 (PCT/JP1997/002708)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- register
- matrix
- operand
- instruction
- register file
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Definitions
- the present invention relates to a data processing device such as a microprocessor or a microcomputer, and more particularly to a data processing device suitable for executing an image processing application program including two-dimensional discrete cosine transform and two-dimensional inverse discrete cosine transform.
- the data volume will be enormous, and there will be problems such as the need for a large-capacity memory when storing the image data and the long transmission time when transferring the data. Therefore, measures are taken such as compressing the image data before storing it in the memory and decompressing it immediately before use, or compressing the image data before transmission and decompressing it after reception.
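- compression is a combination of the following two methods: (1) the two-dimensional discrete cosine transform and (2) Huffman coding.
- decompression is a combination of the following two methods: (3) Huffman decoding and (4) the two-dimensional inverse discrete cosine transform.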
- the two-dimensional discrete cosine transform (1) is performed on the value group of a two-dimensional block of 8 × 8 pixels. Specifically, a product is obtained by multiplying the value group of the 8 × 8 pixel block by a matrix called the DCT basis. Therefore, the result of the transformation (referred to as the DCT coefficients) is also a value group of a two-dimensional block of 8 × 8 values.
- the transformed DCT coefficients are quantized using a quantization table having a different value for each coefficient position (quantization is the process of replacing every value in a certain interval with a representative value of that interval). In the two-dimensional discrete cosine transform of image data, the lower right part of the transformed block usually has many values close to 0, and most of them become 0 in the quantization process.
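The quantization step described above can be sketched in Python. This is only an illustration under assumptions: the rule "divide by the table entry and round to the nearest integer", the function names, and the sample values are hypothetical; the text only states that each coefficient position has its own quantization value.

```python
def quantize(coeffs, table):
    """Replace each coefficient by the index of the interval it falls into."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    """Map each interval index back to the representative value of its interval."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Hypothetical 2 x 2 corner of a coefficient block; steps get coarser
# toward the lower right, so small coefficients there collapse to 0.
coeffs = [[-415.0, 11.0], [7.0, 3.0]]
table = [[16, 24], [24, 40]]

levels = quantize(coeffs, table)
restored = dequantize(levels, table)
```

With these sample values only the first coefficient survives quantization, which is the property the subsequent entropy coding exploits.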
- the Huffman coding (2) is a process of converting the quantized value group of the 8 × 8 pixel block into a bit stream. The encoding exploits the fact that many values in the 8 × 8 pixel block are 0. In other words, variable-length coding is performed in which a short bit string code is assigned to a signal value having a high appearance probability. As a result, the number of bytes in the bit stream after encoding is about 1/10 of the number of bytes in the data stream before conversion.
- the Huffman decoding (3) is the reverse process of the Huffman coding (2). In other words, it is the process of restoring the bit stream into the value group of the 8 × 8 pixel block.
- the two-dimensional inverse discrete cosine transform (4) is the inverse transform of the two-dimensional discrete cosine transform (1). That is, the value group of the 8 × 8 pixel block is subjected to the inverse processing of the two-dimensional discrete cosine transform (1), and the value group of the original 8 × 8 pixel block is restored. Specifically, the image data is restored by multiplying the value group (DCT coefficients) of the 8 × 8 pixel block decoded by the Huffman decoding process by the DCT basis.
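The transform pair (1) and (4) can be sketched as two matrix products per block. The following Python is a minimal illustration, assuming the orthonormal DCT-II normalization for the basis matrix; the exact DCT basis used in the figures of this document is not reproduced here, and all function names are hypothetical.

```python
import math

N = 8

def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix (row k holds the k-th cosine)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """Forward 2D transform: C * block * C^T."""
    c = dct_basis()
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D transform: C^T * coeffs * C."""
    c = dct_basis()
    return matmul(matmul(transpose(c), coeffs), c)

# Arbitrary sample block; without quantization the round trip is exact
# up to floating-point error.
block = [[(3 * r + 5 * c) % 17 for c in range(N)] for r in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)
max_err = max(abs(restored[r][c] - block[r][c])
              for r in range(N) for c in range(N))
```

Each 2D transform is therefore two 8 × 8 matrix products, which is the workload the rest of the document sets out to accelerate.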
- because of the quantization performed in the two-dimensional discrete cosine transform (1), the value group of the 8 × 8 pixel block before the two-dimensional discrete cosine transform and the value group of the 8 × 8 pixel block after the two-dimensional inverse discrete cosine transform (4) do not match exactly. In other words, the compression and decompression processing is irreversible. However, unless the quantization is extremely coarse, the difference between the original image and the restored image can hardly be discerned by the human eye, so there is no practical problem. As described above, image compression has the advantage of reducing the required number of bytes of image data to about 1/10 and improving the storage efficiency and transfer efficiency by about 10 times.
- if the required number of instructions is simply scaled to a 640 × 480 pixel color image, the number of instructions required for each color is 4,800 times that of the above processing (640 × 480 / 64 = 4,800 blocks), and the total number of instructions is 14,400 times.
- conventional microprocessors need to execute about 28.8M to 43.2M instructions (M means one million [mega]) per image to encode and decode image data.
- a microprocessor operating at 100 MHz requires a processing time of 288 to 432 msec per image. At such a processing speed, even if still images are displayed in succession, only 2 to 4 images can be processed per second, which makes it difficult to produce a moving-image effect by displaying still images continuously.
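The figures above can be checked with a few lines of arithmetic. The sketch below assumes three color components (implied by 14,400 = 3 × 4,800) and one instruction per clock cycle (implied by 288 msec at 100 MHz); the per-block instruction counts are derived from the stated totals rather than quoted from the text.

```python
BLOCK_PIXELS = 8 * 8
blocks_per_color = (640 * 480) // BLOCK_PIXELS    # 4800 blocks per color
blocks_per_image = blocks_per_color * 3           # 14400, three colors assumed

# Per-block cost implied by the 28.8M to 43.2M totals (derived values).
low_per_block = 28_800_000 // blocks_per_image    # 2000 instructions
high_per_block = 43_200_000 // blocks_per_image   # 3000 instructions

CLOCK_HZ = 100_000_000                            # 100 MHz, 1 instruction/cycle assumed
low_ms = 28_800_000 * 1000 // CLOCK_HZ            # 288 msec
high_ms = 43_200_000 * 1000 // CLOCK_HZ           # 432 msec

images_per_sec_best = 1000 / low_ms
images_per_sec_worst = 1000 / high_ms
```

The resulting rate of roughly 2.3 to 3.5 images per second is consistent with the "2 to 4 per second" stated above.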
- An object of the present invention is to provide a data processing device capable of executing two-dimensional discrete cosine transform and two-dimensional inverse discrete cosine transform at high speed.
- Another object of the present invention is to provide the basic mechanism (minimum hardware conditions) of a data processing device necessary for executing a two-dimensional discrete cosine transform or a two-dimensional inverse discrete cosine transform at high speed and, in order to utilize it effectively, an instruction format and a method of controlling the basic mechanism by the instruction.
- multiply-accumulators for linearly transforming a vector having four elements, equal in number to the elements of the vector, are used as the basic mechanism for speeding up the two-dimensional discrete cosine transform and the two-dimensional inverse discrete cosine transform.
- a first register file for storing the 4 × 4 matrix of the linear transformation, a second register file for storing the four-element vector to be subjected to the linear transformation, and a third register file for storing the result of the linear transformation are prepared.
- two types of matrix operation instructions, a first type and a second type, are prepared as operation instructions for the matrix of the first register file and the vector of the second register file. When the first type of matrix operation instruction is executed, the arithmetic circuit is controlled so that the reading direction of each value from the first register file is set to the row direction, and when the second type of matrix operation instruction is executed, the reading direction is set to the column direction.
- FIG. 1 is a block diagram showing an embodiment of a microprocessor suitable for applying the present invention.
- FIG. 2 is a block diagram showing a specific example of a central processing unit (CPU) constituting a microprocessor.
- FIG. 3 is a diagram showing an example of an arithmetic circuit constituting a coprocessor (FPU) suitable for efficiently executing the TRV and TRVT instructions.
- FIG. 4 is a diagram showing an embodiment of the product-sum unit in the arithmetic circuit of the coprocessor.
- FIG. 5 is a diagram showing an embodiment of the register unit of the coprocessor.
- FIG. 6 is a diagram showing a matrix related to the inverse discrete cosine transform and its submatrix decomposition.
- FIG. 7 is a diagram showing a matrix subjected to the inverse discrete cosine transform and its decomposition into submatrices.
- Fig. 8 is a diagram showing the definition formula of the two-dimensional inverse discrete cosine transform, and decomposing it into sub-matrices to expand the definition formula.
- FIG. 9 is a diagram showing a procedure for calculating (Equation 3-10) in FIG. 8.
- FIG. 10 is a diagram showing the concept of a register file.
- FIG. 11 is a diagram showing the instruction format of the TRV instruction and the TRVT instruction.
- FIG. 12 is a diagram showing arithmetic expressions for explaining the instruction functions of the TRV instruction and the TRVT instruction.
- FIG. 13 is a diagram showing an arrangement of data stored in the memory when performing the inverse discrete cosine transform as viewed from a conversion program.
- Figure 14 is a diagram showing the definition formula of the two-dimensional discrete cosine transform and the expansion of the definition formula by decomposing it into sub-matrices.
- FIG. 15 is a diagram illustrating another configuration example of the register unit of the coprocessor capable of executing the TRV instruction and the TRVT instruction.
- FIG. 1 shows a block diagram of a microprocessor suitable for applying the present invention.
- 1 is a central processing unit (hereinafter, referred to as a CPU)
- 2 is a coprocessor (hereinafter, referred to as an FPU) that performs operations such as matrix multiplication and floating-point operation in place of the CPU 1
- 3 is a peripheral circuit.
- 4 is a memory management unit (MMU) that converts the address signal output from the CPU 1 onto the bus 8a to manage virtual memory
- 5 is an address conversion circuit consisting of an address conversion table for converting a logical address into a physical address.
- Reference numeral 6 denotes a high-speed cache memory for storing the program data frequently used by the CPU 1
- reference numeral 7 denotes a circuit that, in accordance with the address signal output from the CPU 1 to the bus and a predetermined replacement algorithm, transfers data in the external main memory (such as a hard disk storage device, not shown) to the cache memory 6 in predetermined block units, or discards or writes back data in the cache memory 6 that is no longer needed.
- the cache memory 6 and the external main memory are accessed by the physical address signal converted in the address conversion table 5.
- separately from the logical address bus 8a and the data bus 9a, which transmit the logical address signal and the data signal output from the CPU 1, a physical address bus 8b for transmitting the physical address signal converted by the address conversion table 5 and a data bus 9b for transferring data between the cache memory 6 and the external main memory are provided.
- An external bus interface circuit 10 for interfacing signals between the buses 8b and 9b and the external bus is provided.
- peripheral circuits such as a serial communication interface circuit 11 for serial communication, a real-time clock circuit 12 having functions such as keeping the current time and a calendar, and a timer circuit 13 for giving a timer function to the CPU 1 are connected to a peripheral address bus 8c and a peripheral data bus 9c.
- reference numeral 14 denotes a bus controller that controls the path states of the buses 8b and 9b on the physical address side and the peripheral buses 8c and 9c.
- a clock generation circuit, which includes a PLL (Phase-Locked Loop) circuit, generates the clock signals required for the operation of the CPU 1 and each circuit block inside the chip; 16 is a watchdog timer for detecting hardware errors; 17 is an I/O port that connects the peripheral buses 8c and 9c to the external bus via the external bus interface circuit 10; and 18 is a break controller that provides a function to stop program execution at any point (instruction or address) to support system debugging during user system development.
- FIG. 2 shows a specific configuration example of the CPU 1.
- reference numeral 20 denotes a program counter indicating the address of an instruction to be executed
- 21 denotes a 32-bit instruction register for holding an instruction code fetched from the cache memory 6 or an external main memory via a data bus 9a.
- 22 is an instruction decoder that decodes the instruction code fetched into the instruction register 21 to generate a control signal
- 23 is an instruction execution circuit comprising general-purpose registers REG1 to REGn that hold data before and after operation, an adder/subtractor ALU for performing address operation, data addition/subtraction, and logical operation, a barrel shifter SFT for performing data bit shift, an address output register ADR, a data input/output register DTR, etc.
- Arithmetic buses BUS 1, 2, and 3 are provided in the instruction execution circuit 23.
- the arithmetic buses BUS 1, 2, and 3 enable connection among the registers REG1 to REGn, ADR, and DTR, the adder/subtractor ALU, and the barrel shifter SFT, and the gates GT1 to GTm provided between each register or arithmetic unit and the buses are sequentially controlled by the control signals CS1 to CSi output from the instruction decoder 22.
- if the CPU 1 determines that the instruction fetched into the instruction register 21 is a dedicated instruction for the FPU (coprocessor) 2, it leaves the execution of the instruction to the FPU 2 and itself shifts to a standby state or to execution of the next instruction.
- 24 is a control register consisting of registers such as a status register SR that reflects the internal control status, etc., a register SSR that saves the contents of the status register SR when an exception occurs, a register SPC that saves the contents of the program counter 20 when an exception occurs, a base address register GBR that stores the base address in indirect addressing mode, and a vector address register VBR that stores vector addresses for exception processing and interrupt processing.
- the status of each bit is read / written by the output from the instruction decoder 22, and the execution contents of the instruction are controlled according to the status of a predetermined bit in the control register 24.
- FIG. 3 shows a specific configuration example of the FPU 2. As shown in FIG. 3, the FPU 2 is composed of an arithmetic unit 900 and an arithmetic control unit 990 for controlling the arithmetic unit 900 according to an instruction. The arithmetic unit 900 comprises a register section 901, four multiply-accumulators 910, 911, 912, and 913 capable of performing a 4 × 4 matrix product, four latch circuits 920, 921, 922, and 923 corresponding to the respective multiply-accumulators, and a latch circuit 924 common to the multiply-accumulators.
- the arithmetic control unit 990 includes an instruction register and an instruction decoder similar to the CPU 1, and the instruction fetched into the instruction register is a dedicated instruction of its own (the first matrix operation instruction and the second matrix operation instruction).
- FIG. 4 shows a configuration example of the accumulators 910 to 913.
- Each accumulator comprises a multiplier 960, an adder 961 and a temporary register 962.
- the multiplier 960 performs an operation for multiplying 16-bit data supplied from the signal lines 940 and 944.
- the adder 961 calculates the sum of the operation result of the multiplier 960 and the data held in the temporary register 962, and the resulting value is stored in the temporary register 962 to update its content.
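The behavior of one multiply-accumulator can be modeled in a few lines. This is a sketch only: the class name is hypothetical, and the 16-bit operand width and number format of the hardware are ignored (plain Python integers are assumed).

```python
class MultiplyAccumulator:
    """Model of one unit: multiplier 960, adder 961, temporary register 962."""

    def __init__(self):
        self.temp = 0  # temporary register 962

    def clear(self):
        """Set the initial value 0 in the temporary register."""
        self.temp = 0

    def step(self, a, b):
        """Multiply the two inputs and add the product to the held value."""
        product = a * b                  # multiplier 960
        self.temp = self.temp + product  # adder 961 updates register 962
        return self.temp


mac = MultiplyAccumulator()
mac.clear()
for a, b in zip([1, 2, 3, 4], [10, 20, 30, 40]):
    mac.step(a, b)
result = mac.temp
```

Four such steps yield one element of a 4 × 4 matrix-vector product; running four units side by side yields the whole result vector in the same four steps.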
- FIG. 5 shows a specific configuration of the register section 901.
- the register section 901 is composed of four register files 500, 501, 502, and 503 and selectors 550, 551, 552, 553, and 554 corresponding to the latch circuits 920, 921, 922, 923, and 924.
- Each of the register files 500, 501, 502, and 503 has 16 registers, and these registers are divided into four subfiles.
- the register file 500 has subfiles 510, 511, 512, and 513, each consisting of four registers.
- registers 0 to 15 can be identified by a register number code consisting of a 4-bit binary number.
- Reference numeral 936 denotes a signal line for supplying a control signal for giving write permission to the register file
- reference numeral 950 denotes a common signal line for supplying write data to the register file.
- the upper 2 bits of the 4-bit register number code for specifying the register via the signal lines 930, 931, and 932 are input to the subfile.
- in the subfile 510, the data read from the register specified by the selection signal supplied via the signal line 930 is sent to the selector 550, where the data of the register file designated via the signal line 934 is selected.
- the data read from the register corresponding to the selection signal (upper 2 bits of the register number code) supplied via the signal line 931 is sent to the selector 516, where the register data corresponding to the selection signal (lower 2 bits of the register number code) supplied via the signal line 933 is selected.
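The two-stage selection of a single register described above can be sketched as follows. The sketch assumes that subfile k holds registers k, k + 4, k + 8, and k + 12, which is consistent with the upper 2 bits selecting a register inside every subfile and the lower 2 bits selecting one subfile; the function names are hypothetical.

```python
def decode(reg):
    """Split a 4-bit register number into (position in subfile, subfile index)."""
    assert 0 <= reg <= 15
    return reg >> 2, reg & 0b11

def read_register(subfiles, reg):
    """Two-stage read: every subfile outputs one value, then one is selected."""
    position, subfile = decode(reg)
    candidates = [sf[position] for sf in subfiles]  # upper 2 bits (line 931)
    return candidates[subfile]                      # lower 2 bits (line 933)

# One register file with recognizable contents: register r holds 100 + r.
regs = [100 + r for r in range(16)]
# Assumed layout: subfile k holds registers k, k + 4, k + 8, k + 12.
subfiles = [[regs[4 * p + k] for p in range(4)] for k in range(4)]

value = read_register(subfiles, 6)  # register 6: position 1 of subfile 2
```

Under this layout a single 2-bit code applied to all four subfiles delivers four consecutive registers at once, which is how the matrix operands are fetched.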
- the write data sent via the signal line 950 is written to the register corresponding to the selection signal supplied via the signal line 932 if the signal on the signal line 936 permits writing.
Properties of matrix products
- an 8 × 8 matrix M is defined by (Equation 1) in FIG. 6.
- This matrix M is a matrix related to the inverse discrete cosine transform, and it can be obtained by interchanging some rows and columns of the matrix of the discrete cosine transform.
- the present invention will be described using this matrix M itself. Now, it is assumed that the 4 × 4 matrices A and C are defined by (Equation 2) and (Equation 3) in FIG. 6. Then, the matrix M can be expressed as (Equation 4) in FIG. 6.
- an 8 × 8 matrix X is defined by (Equation 2-1) in FIG. 7.
- This matrix X is divided into four, and the submatrices X1, X2, X3, and X4 are defined by (Equation 2-2), (Equation 2-3), (Equation 2-4), and (Equation 2-5). Then, the matrix X can be expressed by (Equation 2-6).
- FIG. 8 is a diagram illustrating the definition expression of the two-dimensional inverse discrete cosine transform and a calculation method when it is decomposed into 4 × 4 submatrices.
- (Equation 3-1) is the definition of the two-dimensional inverse discrete cosine transform.
- (Equation 3-2) expresses the 8 × 8 matrices appearing in (Equation 3-1) by decomposing them into 4 × 4 submatrices.
- the matrix M in (Equation 3-1) is replaced by the right side of (Equation 4), and the matrix X in (Equation 3-1) is replaced by the right side of (Equation 2-6).
- when (Equation 3-2) is expanded, it becomes (Equation 3-3), (Equation 3-4), and (Equation 3-5).
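The idea behind this decomposition, that an 8 × 8 product can be computed entirely from 4 × 4 products, can be checked with a generic sketch. The code below does not reproduce the specific structure of M from (Equation 4), since the figures are not available here; the block order (top left, top right, bottom left, bottom right), the sample matrices, and all names are assumptions for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split(m8):
    """Split an 8 x 8 matrix into four 4 x 4 blocks (assumed order: TL, TR, BL, BR)."""
    tl = [row[:4] for row in m8[:4]]
    tr = [row[4:] for row in m8[:4]]
    bl = [row[:4] for row in m8[4:]]
    br = [row[4:] for row in m8[4:]]
    return tl, tr, bl, br

def block_product(m8, x8):
    """8 x 8 product built only from 4 x 4 products and additions."""
    m1, m2, m3, m4 = split(m8)
    x1, x2, x3, x4 = split(x8)
    tl = add(matmul(m1, x1), matmul(m2, x3))
    tr = add(matmul(m1, x2), matmul(m2, x4))
    bl = add(matmul(m3, x1), matmul(m4, x3))
    br = add(matmul(m3, x2), matmul(m4, x4))
    return [l + r for l, r in zip(tl, tr)] + [l + r for l, r in zip(bl, br)]

# Arbitrary integer test matrices so the comparison is exact.
m8 = [[(r * 3 + c) % 7 - 3 for c in range(8)] for r in range(8)]
x8 = [[(r + 2 * c) % 5 for c in range(8)] for r in range(8)]
same = block_product(m8, x8) == matmul(m8, x8)
```

Because every term is a 4 × 4 product, hardware that multiplies a 4 × 4 matrix by four-element vectors suffices for the whole 8 × 8 transform.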
- FIG. 9 is a diagram for explaining the processing.
- the individual elements of the 4 × 4 matrices T1, T2, T3, and T4 appearing in (Equation 3-10) in FIG. 8 are given symbols according to (Equation 4-1) to (Equation 4-4).
- a 4 × 16 matrix T is defined by (Equation 4-5).
- a 4 × 4 constant matrix B is defined by (Equation 4-6).
- 1 × t10 + 1 × t20 + 1 × t30 + 1 × t40 = t10 + t20 + t30 + t40
- the processor to which the present invention is applied has at least three sets of register files composed of 16 registers.
- FIG. 10 shows a case where there are four sets of register files RFL0, RFL1, RFL2, and RFL3. Note that the register numbers added to the left side of FIG. 10 are those shown in the subfiles 510 to 513 in FIG. 5, and represent the correspondence between the register numbers in the register files of FIG. 5 and FIG. 10.
- the processor to which the present invention is applied has two types of instructions for performing a matrix product of a 4 × 4 matrix and a vector having 4 elements. The instructions are written in the form TRV m, n, s, d and TRVT m, n, s, d.
- the instruction format consists, for example, as shown in FIG. 11, of an instruction code field ICF in which an instruction code is stored and four fields OPF1, OPF2, OPF3, and OPF4 in which the operands m, s, d, and n are stored. Of these operands, m, s, and d are numbers that specify a register file. Also, n is a number that specifies a register in the register file, and is a multiple of 4 (0, 4, 8, 12). In other words, n is a 4-bit code in which the lower 2 bits specify the subfile in FIG. 5 (and are always set to 0) and the upper 2 bits specify the register in the subfile. Next, the function of the type-1 matrix operation instruction TRV will be described.
- the function of the TRV instruction is defined by (Equation 5-1) in FIG. 12. That is, the value group of the 16 registers in the register file m is regarded as a 4 × 4 matrix, and the value group of the registers n, n + 1, n + 2, n + 3 in the register file s is regarded as a four-element vector; the matrix is multiplied by the vector, and the result is stored in the registers n, n + 1, n + 2, n + 3 in the register file d. In other words, the vector operands are taken one from each of the four subfiles in FIG. 5, and the operation results are stored one in each of the four subfiles. In the register file of FIG. 10, the vector is read from four consecutive registers and the results are stored in the corresponding four registers; therefore, the following instruction:
- the matrix operation instruction of the second kind, TRVT m, n, s, d, regards the 16 registers in the register file m as a 4 × 4 matrix and the registers t, 4 + t, 8 + t, 12 + t in the register file s as a four-element vector, multiplies the matrix by the vector, and stores the result in the registers n, n + 1, n + 2, n + 3 in the register file d.
- in this case, the vector to be operated on takes the values of the four registers in one of the four subfiles in FIG. 5, and the operation results are distributed to and stored in the corresponding registers of the four subfiles.
- in the register file of FIG. 10, the values of every fourth register out of the 16 registers are read out as the vector operand, and the results are stored in four consecutive registers of the specified register file (the first of which has a number that is a multiple of four).
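The difference between the two instructions can be made concrete with a small register-level model. This is a sketch under stated assumptions, not the definition in FIG. 12: the index formula (matrix element 4i + k multiplies vector element i) and the relation t = n / 4 are inferred from the execution steps described in this document, and the column-by-column storage of matrices in a register file, like all function names, is an assumption made for the demonstration.

```python
def trv(files, m, n, s, d):
    """TRV: vector taken from consecutive registers n .. n+3 of file s."""
    mat, src = files[m], files[s]
    out = [sum(mat[4 * i + k] * src[n + i] for i in range(4)) for k in range(4)]
    files[d][n:n + 4] = out

def trvt(files, m, n, s, d):
    """TRVT: vector taken from registers t, 4+t, 8+t, 12+t with t = n // 4 (inferred)."""
    t = n // 4
    mat, src = files[m], files[s]
    out = [sum(mat[4 * i + k] * src[4 * i + t] for i in range(4)) for k in range(4)]
    files[d][n:n + 4] = out

def to_regs(mat):
    """Store a 4 x 4 matrix column by column in 16 registers (assumed layout)."""
    return [mat[r][c] for c in range(4) for r in range(4)]

def from_regs(regs):
    return [[regs[4 * c + r] for c in range(4)] for r in range(4)]

A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
S = [[1, 0, 2, 1], [3, 1, 0, 0], [0, 2, 1, 1], [1, 1, 1, 0]]
files = [to_regs(A), to_regs(S), [0] * 16, [0] * 16]

for n in (0, 4, 8, 12):
    trv(files, 0, n, 1, 2)    # file 2 accumulates A * S, one column per instruction
    trvt(files, 0, n, 1, 3)   # file 3 accumulates A * S^T, with no explicit transpose
```

Under this layout, four TRV instructions multiply by the stored matrix and four TRVT instructions multiply by its transpose, so the transposition needed between the row pass and the column pass of a two-dimensional transform costs no extra data movement.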
- the instruction indicated by LD4 is an instruction to load four data items from the address (in this embodiment, an address in the main memory) separated from the base address b by the displacement value disp into the register group n, n + 1, n + 2, n + 3 in the register file d.
- the instruction indicated by ST is an instruction to store the register n in the register file s to the address (in this embodiment, an address in the main memory) separated from the base address b by the displacement value disp.
Instruction execution procedure
- control unit 990 first sends “00” as a 2-bit binary code to the subfiles 510 to 513 of each of the register files 500 to 503 via the signal line 930.
- the subfiles 510 to 513 send the contents of register 0 to selector 550, the contents of register 1 to selector 551, the contents of register 2 to selector 552, and the contents of register 3 to selector 553, respectively.
- control unit 990 sends number m designating the register file to signal line 934.
- the data specified by m passes through the selectors 550 to 553 and is latched by the latches 920, 921, 922, 923 (that is, the contents of registers 0, 1, 2, and 3 of the register file m are latched by the latches 920, 921, 922, 923).
- control unit 990 sends the upper 2 bits of the register number n, given as a 4-bit code, to each register file via the signal line 931.
- each register file sends the contents of the register corresponding to the upper two bits of n to the selector 516.
- control unit 990 sends the lower two bits of n to selector 516 via signal line 933. Then, the selected data passes through the selector 516 and is sent to the selector 554.
- control unit 990 sends number s specifying the register file to signal line 935. Then, the data specified by s passes through the selector 554 and is latched by the latch 924.
- the control unit 990 controls the selector 963 via the signal line 935 so that the initial value “0” is set in the temporary register 962.
- the contents of the latches 920, 921, 922, 923 are sent to the corresponding multiply-accumulators 910, 911, 912, 913, respectively, and the content of the latch 924 is sent to all four multiply-accumulators 910, 911, 912, 913. Then, the first multiply-accumulate operation is performed in each of the multiply-accumulators 910 to 913, and the result is set in the temporary register 962 or the like.
- the binary code “01” is sent to each of the register files 500 to 503 via the signal line 930 at the same time.
- each register file sends the contents of register 4 to selector 550, the contents of register 5 to selector 551, the contents of register 6 to selector 552, and the contents of register 7 to selector 553, respectively.
- control unit 990 sends number m designating the register file to signal line 934.
- the data specified by m passes through the selectors 550 to 553 and is latched by the latches 920, 921, 922, and 923.
- the control unit 990 sends the upper 2 bits of the value n + 1 obtained by incrementing (+1) the 4-bit register number n via the signal line 931 to each of the register files 500 to 503.
- each register file sends the contents of the register corresponding to the upper two bits of n + 1 to the selector 516.
- control unit 990 sends the lower two bits of n + 1 to selector 516 via signal line 933. Then, the selected data passes through the selector 516 and is sent to the selector 554. Then, control unit 990 sends number s designating the register file to signal line 935. Then, the data specified by s passes through the selector 554 and is latched by the latch 924.
- the operation of the third step is the same as the operation of the second step, except that the register numbers 4, 5, 6, 7 become 8, 9, 10, 11 and the 4-bit register number n + 1 becomes n + 2.
- the operation of the fourth step is the same as the operation of the second step, except that the register numbers 4, 5, 6, 7 become 12, 13, 14, 15 and the 4-bit register number n + 1 becomes n + 3.
- the fifth step is a step of writing back the four values latched in the multiply-accumulators 910, 911, 912, 913 to the register section 901.
- the value latched by the accumulator 910 or the like is sent to the register unit 901 via the signal line 920 or the like.
- the control unit 990 sends the upper 2 bits of the 4-bit register number n to each subfile via the signal line 932. Further, writing to the register file corresponding to the operand "d" is permitted via the signal line 936.
- the four values are then set in the four subfiles of the register file specified by "d". With the above operation, the operation defined by (Equation 5-1) is performed.
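The five steps above can be traced in a short simulation that follows the latches and multiply-accumulators rather than the mathematical definition. It is a sketch only: register files are modeled as flat lists of 16 integers, the function name and sample values are hypothetical, and the four units that operate in parallel in hardware are visited in a loop.

```python
def execute_trv(files, m, n, s, d):
    """Step-by-step model of TRV m, n, s, d on register files of 16 values."""
    acc = [0, 0, 0, 0]                  # temporary registers 962, initial value 0
    for step in range(4):               # steps 1 to 4
        base = 4 * step                 # code "00", "01", "10", "11" on line 930
        latch_920_923 = files[m][base:base + 4]   # registers 4*step .. 4*step+3 of file m
        latch_924 = files[s][n + step]            # register n + step of file s
        for k in range(4):              # units 910 to 913 (parallel in hardware)
            acc[k] += latch_920_923[k] * latch_924
    for k in range(4):                  # step 5: write back to registers n .. n+3 of file d
        files[d][n + k] = acc[k]

# Hypothetical contents: file 0 holds 0..15, file 1 holds the vector (1, 2, 0, 0)
# in registers 4..7, and file 2 receives the result.
files = [list(range(16)), [0] * 16, [0] * 16]
files[1][4:8] = [1, 2, 0, 0]
execute_trv(files, 0, 4, 1, 2)
result = files[2][4:8]
```

Each of the four result values needs four multiply-accumulate operations, but because the four units work side by side the whole matrix-vector product completes in four arithmetic steps plus the write-back.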
- next, the type-2 matrix operation instruction TRVT m, n, s, d will be described.
- control unit 990 first sends a binary code “00” to each register file via the signal line 930.
- each register file sends the contents of register 0 to selector 550, the contents of register 1 to selector 551, the contents of register 2 to selector 552, and the contents of register 3 to selector 553, respectively.
- control unit 990 sends number m designating the register file to signal line 934.
- the data specified by m passes through the selectors 550 to 553 and is latched by the latches 920, 921, 922, and 923.
- the control unit 990 sends the lower 2 bits of the 4-bit register number n to each register file via the signal line 931 (note that the upper 2 bits are used in the TRV instruction). In response, each register file sends to register 516 the contents of the register corresponding to the lower two bits of n.
- control unit 990 sends the upper two bits of register number n to selector 516 via signal line 933. Then, the selected data passes through the selector 516 and is sent to the selector 554. , And the control unit 990 specifies the registration file. Number s to signal line 935. Then, the data specified by s passes through the selector 554 and is latched by the latch 924. At the same time, the control unit 990 controls the selector 963 via the signal line 935 so that the initial value “0” is set in the temporary register 962.
- each register file selects the contents of register 4 to selector 5 50, the contents of register 5 to selector 551, the contents of register 6 to selector 552, and the contents of register 7 respectively. Send to child 553.
- control unit 990 sends number m designating the registration file to signal line 934. Then, the data specified by m passes through selectors 550 to 553 and is latched by latches 920, 921, 922, and 923. Also, the lower two bits of register number n + 1 are sent to each register file via signal line 931. In response, the individual register file sends to register 516 the contents of the register corresponding to the lower two bits of register number n + 1. Further, the upper two bits of the register number n + 1 are sent to the selector 516 via the signal line 933. The selected data then passes through selector 516 and is sent to selector 554. Then, control unit 990 sends number s designating the register file to signal line 935. Then, the data specified by s passes through the selector 554 and is latched by the latch 924.
- the operation of the third step is the same as that of the second step, except that register numbers 4, 5, 6, and 7 become 8, 9, 10, and 11, and the 4-bit register number n + 1 becomes n + 2.
- the operation of the fourth step is the same as that of the second step, except that register numbers 4, 5, 6, and 7 become 12, 13, 14, and 15, and the 4-bit register number n + 1 becomes n + 3.
- the fifth step concerns the four products latched in accumulators 910, 911, 912, and 913.
- the value latched by the accumulator 910 or the like is sent to the register section 901 via the signal line 920 or the like.
- the control unit 990 sends the upper two bits of the register number n to each subfile via the signal line 932. Further, writing to the register file corresponding to the number d is permitted via the signal line 936. Then the four values will be set in the four subfiles of the register file with number d. With the above operation, the operation defined by (Equation 5-4) is performed.
Data required for inverse discrete cosine transform
- data to be converted (DCT coefficients) is required, which is stored in storage areas indicated by X1T, X2T, X3T, and X4T in the main memory in FIG.
- the constant matrices (corresponding to the DCT basis) required for the transformation relate to (Equation 2), (Equation 3), and (Equation 4-6), and each is stored in its indicated storage area. The conversion result is then stored in the storage area indicated by Y in FIG.
Inverse discrete cosine transform
- the inverse discrete cosine transform may be performed by sequentially performing (Equation 3-6), (Equation 3-7), (Equation 3-8), (Equation 3-9), and (Equation 3-10) in FIG.
- the inverse discrete cosine transform is mainly described as an example.
- the present invention can be applied to the discrete cosine transform.
- Figure 14 shows the definition of the discrete cosine transform and the expanded form obtained by decomposing that definition into submatrices. In the end, it suffices to calculate (Equation 6-6) through (Equation 6-10) in sequence. The number of executed instructions can be significantly reduced by making effective use of the TRV and TRVT instructions described in the above embodiment.
- the TRV instruction and the TRVT instruction according to the present invention can be executed by a coprocessor having a register section composed of at least three register files 500, 501, and 502 as shown in FIG. and an arithmetic circuit as shown in FIG.
- the discrete cosine transform and the inverse discrete cosine transform by the coprocessor have been described.
- a coprocessor having an arithmetic circuit configured as shown in FIG. can perform an operation.
- a configuration in which a coprocessor (second processor) 2 for performing matrix operations and floating-point operations is provided separately from a central processing unit (first processor) 1 has been described. It is also possible to configure the system so that the functions of both processors are realized by one processor.
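The stepwise operation described earlier (four registers of register file m latched per step, one register n, n + 1, n + 2, n + 3 of register file s per step, and four accumulators starting from 0) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented circuit: the function name `accumulate_steps` and the flat 16-element lists are hypothetical, the first step is assumed to read registers 0 to 3 by analogy with the later steps, and the subfile addressing and the write-back to register file d are not modeled.

```python
def accumulate_steps(matrix_regs, vector_regs, n):
    """Model the four multiply-accumulate steps on plain Python lists.

    matrix_regs: 16 values standing in for registers 0..15 of register file m.
    vector_regs: 16 values standing in for registers 0..15 of register file s.
    n:           number of the first vector register; n..n+3 are read in turn.
    """
    if len(matrix_regs) != 16 or len(vector_regs) != 16:
        raise ValueError("each register file is modeled as 16 registers")
    if not 0 <= n <= 12:
        raise ValueError("n + 3 must stay within register numbers 0..15")

    # Stands in for the four accumulators, which start from the value 0.
    acc = [0, 0, 0, 0]
    for step in range(4):
        # Assumption: step 0 reads registers 0..3; later steps read
        # 4..7, 8..11, and 12..15, as in the description.
        group = matrix_regs[4 * step:4 * step + 4]
        # One vector element per step: register n, n + 1, n + 2, n + 3.
        scalar = vector_regs[n + step]
        for lane in range(4):
            acc[lane] += group[lane] * scalar
    return acc


if __name__ == "__main__":
    # Hypothetical test data: ones where (register % 4) == (register // 4).
    identity_like = [1 if r % 4 == r // 4 else 0 for r in range(16)]
    vector = list(range(10, 26))
    print(accumulate_steps(identity_like, vector, 0))
```

In this model, each of the four lanes accumulates one element of a 4 x 4 matrix-vector product, which is why four steps are enough to produce all four result values at once.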
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Complex Calculations (AREA)
Abstract
An arithmetic circuit comprising product-sum units whose number corresponds to the number of elements used to linearly transform four-element vectors, a first group of register files for storing 4 x 4 linear-transformation matrices, a second group of register files for storing the four-element vectors to be linearly transformed, and a third group of register files for storing the results of the linear transformation together provide, as a basic mechanism, an improvement in the processing speed of two-dimensional discrete cosine transforms or two-dimensional inverse discrete cosine transforms. In addition, two kinds of matrix operation instructions, class 1 and class 2, are defined as operation instructions on the matrices of the first group of register files and the vectors of the second group, and the arithmetic circuit is controlled so that the direction in which values are read from the first register files corresponds to the row direction when a class 1 matrix operation instruction is executed and to the column direction when a class 2 matrix operation instruction is executed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP1997/002708 WO1999008204A1 (fr) | 1997-08-05 | 1997-08-05 | Dispositif et procede de traitement de donnees |
TW086116630A TW379301B (en) | 1997-08-05 | 1997-11-07 | Data processor and data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP1997/002708 WO1999008204A1 (fr) | 1997-08-05 | 1997-08-05 | Dispositif et procede de traitement de donnees |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999008204A1 true WO1999008204A1 (fr) | 1999-02-18 |
Family
ID=14180933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1997/002708 WO1999008204A1 (fr) | 1997-08-05 | 1997-08-05 | Dispositif et procede de traitement de donnees |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW379301B (fr) |
WO (1) | WO1999008204A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011002908A (ja) * | 2009-06-16 | 2011-01-06 | Fujitsu Semiconductor Ltd | プロセッサ及び情報処理システム |
USRE48845E1 (en) | 2002-04-01 | 2021-12-07 | Broadcom Corporation | Video decoding system supporting multiple standards |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI455020B (zh) * | 2012-01-31 | 2014-10-01 | Mstar Semiconductor Inc | 資料包裝裝置及資料包裝方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH02235174A (ja) * | 1989-02-10 | 1990-09-18 | Intel Corp | バス・マトリツクス |
JPH0320864A (ja) * | 1989-05-20 | 1991-01-29 | Fujitsu Ltd | ファジィ計算用ベクトル命令セットを有する情報処理装置 |
JPH04280368A (ja) * | 1991-03-08 | 1992-10-06 | Fujitsu Ltd | Dctマトリクス演算回路 |
JPH0540776A (ja) * | 1991-08-02 | 1993-02-19 | Fujitsu Ltd | 二次元dctマトリクス演算回路 |
1997
- 1997-08-05 WO PCT/JP1997/002708 patent/WO1999008204A1/fr active Application Filing
- 1997-11-07 TW TW086116630A patent/TW379301B/zh not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
TW379301B (en) | 2000-01-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): CN JP KR SG US
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE
 | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
 | NENP | Non-entry into the national phase | Ref country code: KR
 | 122 | Ep: pct application non-entry in european phase |