WO1992003777A1 - Register scoreboard unit for block transfers in computer systems - Google Patents
- Publication number
- WO1992003777A1 (application PCT/US1991/005885)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scoreboard
- register
- unit
- bits
- register file
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
Definitions
- This invention relates to data processing systems and, more particularly, to register scoreboarding schemes used therein.
- Register scoreboarding has been used since the mid 1960's to prevent the functional unit of a pipelined data processor from operating on the contents of a register file that has not yet been loaded with data.
- A register scoreboard unit contains one bit for each register of a register file to be scoreboarded.
- Individual register scoreboard bits may be set by either a data load operation (a memory-to-register-file operation) or an execution operation.
- A register scoreboard bit is typically set upon the initiation of an operand decode/register file fetch instruction, and is reset when the register file location is reloaded. If a functional unit attempts to operate on a register file location whose corresponding scoreboard bit is set, the instruction pipeline will stall until the location is reloaded.
- Prior register scoreboard units could support only single load transfers. More recently, simple power-of-two word load transfers (1, 2, 4) were also supported by the use of a revised register scoreboard unit. Both of these scoreboard unit architectures are inadequate to support a more general Block Transfer instruction which is capable of loading an arbitrary number of words from external memory to the register file, or vice versa.
- The objective of the present invention is to provide a scoreboard unit architecture that will support both block transfer load and store operations, as well as those of the single transfer load and store type. Such an architecture will permit block transfer data load and store operations and will allow execution unit or program control unit instructions to operate in parallel, thus permitting a high-speed data processor to execute multiple instructions during a single machine clock cycle.
- First, an instruction is fetched by the instruction fetch unit (IFU) of a central processor unit.
- Second, the instruction operational code and function code fields are used to decode the instruction, and register file (RF) access is performed as directed by two source field instruction operands. If the issued instruction is not blocked by a scoreboard bit hit, then the destination location or locations (if a Block Transfer Instruction) of the issued instruction are sampled during the low state of the master clock, and at the end of this cycle, all register scoreboard bits associated with the decoded instruction destination locations are set. Third, the register file (RF) scoreboard bits are reset as the corresponding loads occur.
- The loads can be the result of the completion of an execution unit (EU) instruction, a program control unit (PCU) instruction, or a single or block transfer load instruction. Any instruction issued that requests a register that has a pending load (scoreboard bit set) will stall until the corresponding scoreboard bit is reset.
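- The set, stall, and reset behavior described above can be illustrated with a minimal software sketch. This is only a behavioral model under stated assumptions, not the hardware of the invention: the class name `Scoreboard`, its method names, and the use of a Python integer as the bit array are all hypothetical.

```python
class Scoreboard:
    """Hypothetical behavioral model: one pending-load bit per register."""

    def __init__(self, num_regs=64):
        self.num_regs = num_regs
        self.bits = 0  # bit i set => register i has a pending load

    def must_stall(self, sources):
        """True if any source register still has a pending load."""
        return any((self.bits >> r) & 1 for r in sources)

    def issue(self, sources, destinations):
        """Try to issue; on success mark every destination as pending."""
        if self.must_stall(sources):
            return False  # instruction stalls, no bits are set
        for r in destinations:
            self.bits |= 1 << r
        return True

    def load_complete(self, reg):
        """Reset the bit once the register has actually been written."""
        self.bits &= ~(1 << reg)


sb = Scoreboard()
# A block load into r8..r11 marks four destinations at once.
issued_block = sb.issue(sources=[1, 2], destinations=range(8, 12))
# An instruction that reads r9 is blocked while the load is pending.
blocked = not sb.issue(sources=[9, 3], destinations=[20])
sb.load_complete(9)
# After r9 arrives, the same instruction can issue.
retried = sb.issue(sources=[9, 3], destinations=[20])
```

- In this sketch only the source registers are tested before issue, which mirrors the description of the scoreboard test; whether destinations are also tested is not stated here and is left out.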
- Single or block transfer instructions can occur in parallel with an execution unit instruction or program control instruction (e.g., branch, condition code, branch and link).
- A block transfer instruction, in combination with a block transfer register scoreboard unit (BTRSU), permits the register file to be operated as a double-buffered memory, with execution occurring in parallel with the loading and storing of data. Hence, multiple instructions can be processed during a single machine cycle.
- A thermometer decoder unit generates a bar-graph-type output for a binary input.
- One of the two decoder units (unit A) generates a bar graph pattern from the least significant bit (LSB) to the most significant bit (MSB).
- Unit B generates a bar graph pattern from MSB to LSB.
- An AND operation is performed on the outputs from the A and B thermometer decoder units, with the result being used to set the appropriate scoreboard bits at the end of a cycle. In this manner, N number of scoreboard bits can be set starting from the initial register file destination address.
- The number of 32-bit words or 64-bit double words transferred is established by the COUNT field of the block transfer instruction format.
- The thermometer decoder units (TDUs) are driven by thermometer control units (TCUs).
- The first, or RD field, TCU is used to generate the binary input to the A TDU, which sets all bits greater than or equal to the value of the first location of the Block Transfer Load or Single Cycle Load instruction.
- The vector TB is used to set all bits that are less than or equal to the value of the last destination register location. Both 32-bit word transfers and 64-bit double word transfers are supported by left-shifting the count field of the instruction one place to generate COUNT if a double word single transfer or double word block transfer is issued.
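- The way two thermometer patterns combine into a contiguous run of scoreboard bits can be sketched as follows. The function names are hypothetical, and the sketch assumes that the count gives the number of transfers and that the destination registers run from the first location upward; the exact hardware encoding of the control vectors is not modeled.

```python
NUM_REGS = 64  # 64 scoreboard bits in the preferred embodiment


def thermometer_a(first):
    """Bar graph with every bit >= `first` set."""
    all_ones = (1 << NUM_REGS) - 1
    return all_ones & ~((1 << first) - 1)


def thermometer_b(last):
    """Bar graph with every bit <= `last` set."""
    return (1 << (last + 1)) - 1


def block_set_mask(rd, count, doubleword=False):
    """Scoreboard bits to set for a transfer starting at register `rd`.

    Assumption: for doubleword transfers the count is shifted left one
    place so that it is expressed in 32-bit words.
    """
    words = count << 1 if doubleword else count
    last = rd + words - 1
    if not (0 <= rd <= last < NUM_REGS):
        raise ValueError("transfer does not fit in the register file")
    # The AND of the two bar graphs leaves exactly bits rd..last set.
    return thermometer_a(rd) & thermometer_b(last)


mask_words = block_set_mask(8, 4)                     # r8..r11
mask_doubles = block_set_mask(8, 2, doubleword=True)  # same four words
```

- Both calls select the same four registers, because two doublewords occupy four 32-bit register locations.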
- Figure 1 is a functional block diagram of the microprocessor chip that contains the block transfer register scoreboard unit;
- Figure 2 is a block diagram of the four instruction formats executed by the microprocessor and supported by the block transfer register scoreboard unit;
- Figure 3 is a functional block diagram of the block transfer register scoreboard unit;
- Figure 4 is a detailed functional block diagram of the two 64-bit thermometer decoders which are referenced in Figure 1;
- Figure 5 is a logic diagram of a 32-bit STHERM32 block, from which the 64-bit thermometer decoder units are constructed;
- Figure 6 is a logic diagram of an STHERM module which is used to construct the STHERM32 blocks;
- Figure 7 is a logic diagram of the scoreboard test cell used to generate the 64-bit scoreboard test array;
- Figure 8 is a logic diagram of the scoreboard register cell used to generate the 64-bit scoreboard register array;
- Figure 9 is a logic diagram of the scoreboard reset cell used to generate the 64-bit reset array;
- Figure 10 is a logic diagram of the RD field thermometer Control Block referenced in Figure 1;
- Figure 11 is a logic diagram of the count field thermometer Control Block referenced in Figure 1.
- A preferred microprocessor architecture contains seven functional units: a bus unit (BUSU) 11; a program control unit (PCU) 12; an instruction fetch unit (IFU) 13; a register file (RF) 14; an execution unit (EXU) 15; a block transfer register scoreboard unit (BTRSU) 16; and a data load store unit (DLSU) 17.
- PCU 12 is responsible for generating a next instruction address (NIA) and generates clock control for the microprocessor pipeline.
- An NIA is generated as the result of one of the following instructions: a continue operation, a conditional branch, a conditional branch and link, a jump to register, or a jump immediate.
- IFU 13 consists of a set-associative instruction cache (not shown) and an instruction load unit (not shown) that is responsible for loading the instruction cache if the NIA is not contained in one of the instruction cache sets.
- RF 14 has three read ports [(RS1), (RS2), and IDS] and two write ports [(RD) and IDL].
- The parentheses around the port names RS1, RS2 and RD indicate that these correspond to particular fields within multiple instruction formats. The multiple instruction formats will be explained below. Thus, for every machine cycle, three reads and two writes can be performed.
- The (RS1) and (RS2) read ports and the (RD) write port are dedicated to EXU 15. The remaining read and write ports are dedicated to DLSU 17.
- DLSU 17 is responsible for generating all data memory load and store addresses, DA, as well as the corresponding register file load and store addresses, RA.
- BUSU 11 arbitrates bus requests between IFU 13 and DLSU 17, and selects either an instruction address, IA, from IFU 13, or a data address, DA, from DLSU 17.
- BUSU 11 generates all external memory addresses and timing for DRAM and SRAM memory, ADC, as well as I/O devices, and supports a bidirectional external data bus, D.
- BTRSU 16 monitors issued machine instructions and, according to one of four instruction formats outlined in Figure 2, tests the register scoreboard to establish whether an issued instruction can continue or must be stalled due to a pending load operation.
- A scoreboard signal (SCOREBRD) 20 indicates to DLSU 17 that a scoreboard hit condition (i.e., a load is pending) exists for the issued instruction.
- BTRSU 16 can be reset by either a write port 2 address, WP2, from DLSU 17, or by a write port 1 address, WP1, from EXU 15.
- PCU 12 can be stalled by either an instruction buffer stall condition (IBSTALL signal) or a data load store stall condition (DLSTALL signal).
- The data load store stall condition is generated by a logical OR of a scoreboard hit from BTRSU 16 (SCOREBRD signal) and the condition that a subsequent load or store instruction has been issued prior to the completion of the previous load or store instruction.
- The I-type format 21 is the primary instruction format for single transfer load or store operations performed by the data load store unit.
- This format comprises an operational code (OPCODE) field, an RS1 field, an RD field, and an IMMEDIATE field.
- Both word and double word register file locations can be loaded by an I-type format instruction.
- For a store instruction in the I-type format, on the other hand, the contents of the register pointed to by the RS1 field are added to the contents of the IMMEDIATE field to provide the store memory address.
- The register file address of the value to be stored is specified by the RD field. Both word and double word register file locations can be stored.
- The J-type instruction format 22 generates a destination register word address of 63 for jump and link or conditional branch and link instructions.
- The control signal corresponding to this condition is LDR63C.
- The R-type instruction format 23 is the principal format for execution-unit-type instructions, which predominantly access two source operands indicated by the RS1 and RS2 fields and load to a single destination register RD.
- The RS1, RS2 and RD fields of this format are either word or double word registers.
- The R-type instruction format is also used by the data load store unit for instructions of the block transfer load and store types.
- For a block transfer load, the COUNT field within the R-type format designates the number of words to be loaded from external memory to the register file.
- The initial source address in the external memory is designated by the RS1 field of the R-type format, the initial destination address within the register file is designated by the RD field, and the memory address increment is designated by the RS2 field.
- For a block transfer store, the COUNT field within the R-type format designates the number of words to be transferred from the register file to external memory.
- The initial destination address within the external memory is given by the RS1 field value, the initial source address within the register file by the RD field value, and the memory address increment is designated by the RS2 field.
- Register file transfer locations are sequential. There are both word and doubleword block load and store instructions.
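- The addressing pattern of a block transfer load can be sketched as below. This is an illustrative model only: the function name and parameters are hypothetical, and it assumes that the initial memory address and the memory address increment have already been obtained from the two source fields, with the register file locations advancing by one per word.

```python
def block_load_plan(mem_base, mem_increment, rd, count):
    """Return (memory_address, register_number) pairs, one per word.

    mem_base      -- initial source address in external memory
    mem_increment -- memory address increment applied per transfer
    rd            -- initial destination register in the register file
    count         -- number of words to load
    """
    return [(mem_base + i * mem_increment, rd + i) for i in range(count)]


# Three words starting at 0x1000 with a stride of 4 land in r8, r9, r10.
plan = block_load_plan(0x1000, 4, rd=8, count=3)
```

- A block transfer store would use the same pairs with the direction reversed, reading from the sequential register locations and writing to the strided memory addresses.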
- The L-type instruction format 24 is used only for a load upper immediate instruction.
- The load upper immediate instruction left-shifts the 20-bit IMMEDIATE field of the L-type instruction and zero-fills the lower-order bits so that the most significant bits of a 32-bit word can be loaded into the upper end of the register.
- The control signal RS1DC is used to signal that the destination of the left-shifted IMMEDIATE field is given by the RS1 field.
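- The effect of the load upper immediate instruction on the data can be shown with a short sketch. The function name is hypothetical, and the shift distance of 12 is inferred from placing a 20-bit field in the upper end of a 32-bit word rather than quoted from the specification.

```python
def load_upper_immediate(imm20):
    """Place a 20-bit immediate in the upper 20 bits of a 32-bit word."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("immediate must fit in 20 bits")
    # 32 - 20 = 12 low-order bits are zero-filled.
    return (imm20 << 12) & 0xFFFFFFFF


value = load_upper_immediate(0xABCDE)
```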
- The architecture of the block transfer register scoreboard unit is depicted in Figure 3.
- The unit contains first and second 64-bit thermometer decoders (designated decoder A 31 and decoder B 32, respectively); an RD field thermometer control unit 33, which generates control vector bits TA5B and TA4-TA0 for thermometer decoder A 31; a count field thermometer control unit 34, which generates control vector bits TB5B and TB4-TB0 for thermometer decoder B 32; a 64-bit NAND array 35; a 64-bit scoreboard register array 36; a scoreboard reset array 37; a scoreboard test array 38; and a 64-bit OR unit 39.
- The outputs from thermometer decoder A are designated X0 through X63; the outputs from thermometer decoder B are designated Y0 through Y63. These outputs are paired as inputs to 64-bit NAND array 35.
- The outputs from 64-bit NAND array 35 are designated SB0B through SB63B, and serve as scoreboard bit set inputs to scoreboard register array 36.
- Scoreboard bit reset inputs to scoreboard register array 36 are generated by scoreboard reset array 37 and are designated SBRS0 through SBRS63.
- The test outputs from scoreboard register array 36 are designated SBT0 through SBT63, and serve as inputs to scoreboard test array 38.
- 64-bit OR unit 39 receives inputs SBH0 through SBH63 from scoreboard test array 38, and produces the scoreboard hit output signal SCOREBRD.
- Figure 4 provides more detail of the structure of thermometer decoder A 31 and thermometer decoder B 32.
- Each of the thermometer decoders is a mirror image of the other. Except for the mirror image relationship, decoder A 31 is identical to decoder B 32.
- A pair of 32-bit STHERM blocks 41, as well as an array of OR gates 42 and AND gates 43, make up each of the 64-bit thermometer decoders (whether decoder A 31 or decoder B 32).
- Figure 5 is a logic diagram of a 32-bit STHERM32 block, multiples of which are used to construct thermometer decoder A 31 and thermometer decoder B 32.
- Each STHERM block contains four STHERM macros 51.
- Figure 6 depicts the logic diagram of an STHERM macro 51.
- The following truth table results for different inputs to A2, A1, and A0:
- The scoreboard test cell logic diagram is shown in Figure 7. This cell is replicated in array format to create the scoreboard test array 38.
- Two control bits, RS1FENC and RS2FENC, enable the RS1 and RS2 fields of the instruction according to the issued instruction's format.
- The scoreboard register output bit for each position (SBTX) is tested against the RS1 and RS2 fields. If a match occurs, this indicates that a pending load condition exists for either the RS1 or RS2 operand, and that the instruction must stall and wait for the load to occur before proceeding.
- The scoreboard hit signal SCOREBRD is generated by a 64-bit OR of the entire scoreboard test array.
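- The per-position test and the final 64-bit OR can be sketched in software as follows. The function and parameter names are hypothetical, the two enable flags stand in for the two source-field enable control bits, and doubleword source operands are not modeled.

```python
def scoreboard_hit(sb_bits, src1, src2, src1_enable, src2_enable):
    """Return True if an enabled source field names a pending register."""
    hits = 0
    for x in range(64):
        pending = (sb_bits >> x) & 1
        # A cell matches when an enabled source field equals its position.
        match = (src1_enable and src1 == x) or (src2_enable and src2 == x)
        if pending and match:
            hits |= 1 << x  # per-cell hit output
    return hits != 0  # 64-bit OR of all cell outputs


pending_r5 = 1 << 5
hit_both_enabled = scoreboard_hit(pending_r5, 5, 7, True, True)
hit_src1_disabled = scoreboard_hit(pending_r5, 5, 7, False, True)
```

- Disabling a source field matters for formats that do not carry that field, since the bits in that position of the instruction would otherwise be misread as a register number.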
- A logic diagram of a scoreboard register cell is depicted in Figure 8. This cell is replicated in array format to create the scoreboard register array 36. It will be noted that the output from the NAND array is sampled on the negative phase of the clock cycle, which, in turn, enables the positive-edge-triggered register. The positive-edge-triggered register is thus enabled so that a scoreboard bit is set at the end of a current clock cycle. This prevents a possible lockup condition, which would otherwise have to be checked for by the compiler or assembler.
- Figure 9 depicts the scoreboard reset cell logic diagram. Sixty-four of these cells comprise the scoreboard reset array 37. Doubleword control bits DWRP2 and DWRP1 ensure that two bits are correctly reset for doubleword values, in accordance with write port load address bits WP1A5-0 and WP2A5-0.
- Write port 1 address bits WP1A5-0 correspond to the write address bits that come from the execution unit, and are simply the pipelined destination bits from the instruction RD field.
- Write port 2 address bits WP2A5-0 are generated by the data load store unit and correspond to the destination address bits from a single or block transfer load operation from memory to the register file. As the bus unit supports multiple banks of DRAM, as well as SRAM, directly, the DLAT control signal is used to indicate the arrival of the data word from memory. Load address values that arrive from write port 1 are always valid.
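- The reset behavior of the two write ports can be sketched as below. All names are hypothetical. The sketch assumes that a doubleword occupies an aligned even/odd pair of register locations, which is one plausible reading of "two bits are correctly reset" and is not confirmed by the text; it also models the data-arrival signal as a simple flag that gates write port 2.

```python
def pair_mask(addr, doubleword):
    """Bits to clear for one 6-bit write port address (0..63)."""
    mask = 1 << addr
    if doubleword:
        # Assumption: the second word is the other half of an aligned pair.
        mask |= 1 << (addr ^ 1)
    return mask


def reset_mask(wp1=None, dw1=False, wp2=None, dw2=False, dlat=False):
    """Combine the reset requests of both write ports for one cycle."""
    mask = 0
    if wp1 is not None:           # write port 1 addresses are always valid
        mask |= pair_mask(wp1, dw1)
    if wp2 is not None and dlat:  # write port 2 waits for data arrival
        mask |= pair_mask(wp2, dw2)
    return mask


def apply_reset(sb_bits, mask):
    """Clear the selected scoreboard bits."""
    return sb_bits & ~mask


after = apply_reset(0xFF, reset_mask(wp1=0, dw1=True))
```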
- thermometer control unit which is used to control 64-bit thermometer decoder A, is detailed in Figure
- When LDR63C = 1, the vector represented by the control bits TA5B and TA4-TA0 is set to 011111, which, in turn, sets register scoreboard bit 63.
- The LDR63C control bit is used to indicate the destination address for the jump and link, and the branch and link-type instructions.
- The RD field instruction bits (I19-I14) are used to indicate the destination address.
- The control bit RS1DC selects the RS1 field (instruction bits I25-I20) as the destination address, which is used for L-type instruction formats.
- MAX is the number of register file locations (63 for the preferred embodiment).
- RX is the load destination field.
- COUNT is the number of transfers (in units of 32-bit words) to be performed.
- The vector TB is used to set all bits less than or equal to the last destination register location of thermometer decoder B.
- The control inputs to the count thermometer control unit indicate whether a single or block transfer word (32-bit) or double word (64-bit) load instruction is to take place. Block transfer operations are signaled by the BLKTC control bit. Doubleword transfers are signaled by either of the DWRP2C and DWEXC control inputs.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Disclosed is a block transfer register scoreboard unit (16) for computer systems which not only minimizes no-operation (NOP) instructions but also allows the processor's register file (14) to operate as a double-buffered memory, with the processor's execution unit (15) processing one block of registers in the register file (14) while the data load store unit (17) performs a memory-to-register-file transfer operation. The scoreboard unit architecture supports block transfer load operations as well as those of the single transfer load type. Such an architecture permits block transfer data load and store operations, and allows execution unit or program control unit instructions to be handled in parallel, thus enabling a high-speed data processor to execute multiple instructions during a single machine clock cycle. The scoreboard unit (16) is compact enough to be implemented on a microprocessor chip.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US56988590A | 1990-08-17 | 1990-08-17 | |
US569,885 | 1990-08-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1992003777A1 (fr) | 1992-03-05 |
Family
ID=24277295
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1991/005885 WO1992003777A1 (fr) | Register scoreboard unit for block transfers in computer systems | 1990-08-17 | 1991-08-19 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1992003777A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4760518A (en) * | 1986-02-28 | 1988-07-26 | Scientific Computer Systems Corporation | Bi-directional databus system for supporting superposition of vector and scalar operations in a computer |
US4891753A (en) * | 1986-11-26 | 1990-01-02 | Intel Corporation | Register scorboarding on a microprocessor chip |
US4893233A (en) * | 1988-04-18 | 1990-01-09 | Motorola, Inc. | Method and apparatus for dynamically controlling each stage of a multi-stage pipelined data unit |
-
1991
- 1991-08-19 WO PCT/US1991/005885 patent/WO1992003777A1/fr unknown
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6610828B1 (en) | 1996-05-24 | 2003-08-26 | Syngenta Limited | Heliothis ecdysone receptor |
US7183061B2 (en) | 1996-05-24 | 2007-02-27 | Syngenta Limited | Method of expressing Heliothis ecdysone receptor fusion protein |
DE102016117588A1 (de) * | 2016-09-19 | 2018-03-22 | Infineon Technologies Ag | Prozessoranordnung und Verfahren zum Betreiben einer Prozessoranordnung |
DE102016117588B4 (de) | 2016-09-19 | 2024-09-26 | Infineon Technologies Ag | Prozessoranordnung und Verfahren zum Betreiben einer Prozessoranordnung |
US20220405348A1 (en) * | 2021-06-17 | 2022-12-22 | International Business Machines Corporation | Reformatting of tensors to provide sub-tensors |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7882332B1 (en) | Memory mapped register file | |
US5922066A (en) | Multifunction data aligner in wide data width processor | |
US5687336A (en) | Stack push/pop tracking and pairing in a pipelined processor | |
US6571328B2 (en) | Method and apparatus for obtaining a scalar value directly from a vector register | |
US5185872A (en) | System for executing different cycle instructions by selectively bypassing scoreboard register and canceling the execution of conditionally issued instruction if needed resources are busy | |
US5991531A (en) | Scalable width vector processor architecture for efficient emulation | |
US6374346B1 (en) | Processor with conditional execution of every instruction | |
WO1996012228A1 (fr) | Redundant mapping tables | |
EP1261914B1 (fr) | Processing architecture with array bounds checking capability | |
EP0823083A1 (fr) | System for performing arithmetic operations in single or double precision mode | |
EP2267596B1 (fr) | Processor core for processing instructions of different formats | |
US8909904B2 (en) | Combined byte-permute and bit shift unit | |
US5752271A (en) | Method and apparatus for using double precision addressable registers for single precision data | |
US7107302B1 (en) | Finite impulse response filter algorithm for implementation on digital signal processor having dual execution units | |
JP2001501001A (ja) | Input operand control in a data processing system | |
US6820189B1 (en) | Computation core executing multiple operation DSP instructions and micro-controller instructions of shorter length without performing switch operation | |
WO1992003777A1 (fr) | Register scoreboard unit for block transfers in computer systems | |
JP2001504956A (ja) | Data processing system register control | |
US6859872B1 (en) | Digital signal processor computation core with pipeline having memory access stages and multiply accumulate stages positioned for efficient operation | |
EP0992893B1 (fr) | Vérification de parallélisme d'instructions | |
US11775310B2 (en) | Data processing system having distrubuted registers | |
US5729729A (en) | System for fast trap generation by creation of possible trap masks from early trap indicators and selecting one mask using late trap indicators | |
JP2001501329A (ja) | Data processing apparatus registers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): DE JP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |