
WO1992003777A1 - Block transfer register scoreboard for data processing systems - Google Patents


Info

Publication number
WO1992003777A1
Authority
WO
WIPO (PCT)
Prior art keywords
scoreboard
register
unit
bits
register file
Application number
PCT/US1991/005885
Other languages
French (fr)
Inventor
James H. Hesson
Original Assignee
Micron Technology, Inc.
Application filed by Micron Technology, Inc.
Publication of WO1992003777A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding

Definitions

  • The architecture of the block transfer register scoreboard unit is depicted in Figure 3.
  • The unit contains first and second 64-bit thermometer decoders (designated decoder A 31 and decoder B 32, respectively); an RD field thermometer control unit 33, which generates control vector bits TA5B and TA4-TA0 for thermometer decoder A 31; a count field thermometer control unit 34, which generates control vector bits TB5B and TB4-TB0 for thermometer decoder B 32; a 64-bit NAND array 35; a 64-bit scoreboard register array 36; a scoreboard reset array 37; a scoreboard test array 38; and a 64-bit OR unit 39.
  • The outputs from thermometer decoder A are designated X0 through X63; the outputs from thermometer decoder B are designated Y0 through Y63. These outputs are paired (Xn with Yn) as inputs to 64-bit NAND array 35.
  • The outputs from 64-bit NAND array 35 are designated SB0B through SB63B and serve as scoreboard bit set inputs to scoreboard register array 36.
  • Scoreboard bit reset inputs to scoreboard register array 36 are generated by scoreboard reset array 37 and are designated SBRS0 through SBRS63.
  • The test outputs from scoreboard register array 36 are designated SBT0 through SBT63 and serve as inputs to scoreboard test array 38.
  • 64-bit OR unit 39 receives inputs SBH0 through SBH63 from scoreboard test array 38 and produces the scoreboard hit output signal SCOREBRD.
  • Figure 4 provides more detail of the structure of thermometer decoder A 31 and thermometer decoder B 32.
  • Each of the thermometer decoders is a mirror image of the other. Except for the mirror image relationship, decoder A 31 is identical to decoder B 32.
  • A pair of 32-bit STHERM blocks 41, as well as an array of OR gates 42 and AND gates 43, make up each of the 64-bit thermometer decoders (whether decoder A 31 or decoder B 32).
  • FIG. 5 is a logic diagram of a 32-bit STHERM32 block, multiples of which are used to construct thermometer decoder A 31 and thermometer decoder B 32.
  • Each STHERM block contains four STHERM macros 51.
  • Figure 6 depicts the logic diagram of an STHERM macro 51.
  • The following truth table results for different inputs to A2, A1, and A0:
  • The scoreboard test cell logic diagram is shown in Figure 7. This cell is replicated in array format to create the scoreboard test array 38.
  • Two control bits, RS1FENC and RS2FENC, enable the RS1 and RS2 fields of the instruction according to the issued instruction's format.
  • The scoreboard register output bit for each position (SBTX) is tested against the RS1 and RS2 fields. A match indicates that a load is pending for the RS1 or RS2 operand, so the instruction must stall and wait for the load to occur before proceeding.
  • The scoreboard hit signal SCOREBRD is generated by a 64-bit OR of the entire scoreboard test array.
  • A logic diagram of a scoreboard register cell is depicted in Figure 8. This cell is replicated in array format to create the scoreboard register array. The output from the NAND array is sampled on the negative phase of the clock cycle, which in turn enables the positive-edge-triggered register, so that a scoreboard bit is set at the end of the current clock cycle. This prevents a possible lockup condition that would otherwise have to be checked for by the compiler or assembler.
  • Figure 9 depicts the scoreboard reset cell logic diagram. Sixty-four of these cells comprise the scoreboard reset array 37. Doubleword control bits DWRP2 and DWRP1 ensure that two bits are correctly reset for doubleword values, in accordance with write port load address bits WP1A5-0 and WP2A5-0.
  • Write port 1 address bits WP1A5-0 correspond to the write address bits that come from the execution unit, and are simply the pipelined destination bits from the instruction RD field.
  • Write port 2 address bits WP2A5-0 are generated by the data load store unit and correspond to the destination address bits from a single or block transfer load operation from memory to the register file. As the bus unit supports multiple banks of DRAM, as well as SRAM, directly, the DLAT control signal is used to indicate the arrival of the data word from memory. Load address values that arrive from write port 1 are always valid.
  • The RD field thermometer control unit, which is used to control 64-bit thermometer decoder A, is detailed in Figure 10.
  • When LDR63C is 1, the vector represented by the control bits TA5B and TA4-TA0 is set to 011111, which in turn sets register scoreboard bit 63.
  • The LDR63C control bit is used to indicate the destination address for the jump and link and the branch and link-type instructions.
  • The RD field instruction bits (I19-I14) are used to indicate the destination address.
  • The control bit RS1DC selects the RS1 field (instruction bits I25-I20) as the destination address, which is used for L-type instruction formats.
  • The count field thermometer control unit computes TB = MAX - (RX + COUNT - 1), where:
  • MAX is the highest register file location (63 for the preferred embodiment, whose register file has 64 locations);
  • RX is the load destination field; and
  • COUNT is the number of transfers (in 32-bit word units) to be performed.
  • The vector TB, applied to thermometer decoder B, is used to set all bits less than or equal to the last destination register location.
  • The control inputs to the count field thermometer control unit indicate whether a single or block transfer word (32-bit) or doubleword (64-bit) load instruction is to take place. Block transfer operations are signaled by the BLKTC control bit. Doubleword transfers are signaled by either of the control inputs DWRP2C and DWEXC.
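The Figure 3 datapath described in the items above can be sketched behaviorally in Python: the decoder outputs X and Y feed the active-low NAND array (SBnB), the register array holds SBT0..SBT63, the reset array produces SBRSn from write-port addresses, and the test array plus the 64-way OR yields SCOREBRD. This is a minimal sketch, not the disclosed circuit. Signal names follow the text, but timing is collapsed to one update per clock, a doubleword is assumed to occupy locations n and n+1, and a set is assumed to take priority over a simultaneous reset. All function and class names are hypothetical.

```python
MAX = 63
NBITS = MAX + 1


def decoder_outputs(ta: int, tb: int) -> tuple[list[int], list[int]]:
    """X0..X63 (decoder A: i >= TA) and Y0..Y63 (decoder B: i <= MAX - TB)."""
    x = [int(i >= ta) for i in range(NBITS)]
    y = [int(i <= MAX - tb) for i in range(NBITS)]
    return x, y


def nand_array(x: list[int], y: list[int]) -> list[int]:
    """SB0B..SB63B: active-low set inputs, 0 wherever Xn and Yn are both 1."""
    return [1 - (a & b) for a, b in zip(x, y)]


def reset_array(writes: list[tuple[int, bool]]) -> list[int]:
    """SBRS0..SBRS63 from (write-port address, doubleword) pairs."""
    sbrs = [0] * NBITS
    for addr, doubleword in writes:
        sbrs[addr] = 1
        if doubleword:
            sbrs[addr + 1] = 1  # assumption: a doubleword occupies addr and addr + 1
    return sbrs


class ScoreboardRegisterArray:
    def __init__(self) -> None:
        self.sbt = [0] * NBITS  # SBT0..SBT63

    def clock(self, sbb: list[int], sbrs: list[int]) -> None:
        """End-of-cycle update; a set wins over a simultaneous reset (assumption)."""
        for i in range(NBITS):
            if sbb[i] == 0:
                self.sbt[i] = 1
            elif sbrs[i]:
                self.sbt[i] = 0


def scorebrd(sbt: list[int], rs1: int, rs2: int, rs1fenc: bool, rs2fenc: bool) -> bool:
    """Test array (SBH0..SBH63) followed by the 64-bit OR unit."""
    sbh = [
        sbt[i] == 1 and ((rs1fenc and i == rs1) or (rs2fenc and i == rs2))
        for i in range(NBITS)
    ]
    return any(sbh)


no_set, no_reset = [1] * NBITS, [0] * NBITS
sb_array = ScoreboardRegisterArray()
x, y = decoder_outputs(ta=20, tb=MAX - 23)       # destinations r20..r23
sb_array.clock(nand_array(x, y), no_reset)
print(scorebrd(sb_array.sbt, rs1=22, rs2=0, rs1fenc=True, rs2fenc=False))
sb_array.clock(no_set, reset_array([(22, True)]))  # doubleword lands in r22, r23
print(scorebrd(sb_array.sbt, rs1=22, rs2=23, rs1fenc=True, rs2fenc=True))
```

The first check reports a hit (r22 is still pending); after the doubleword reset clears r22 and r23, the second check reports no hit, while r20 and r21 remain marked.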

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A block transfer register scoreboard unit (16) for data processing systems that not only minimizes no-operation (NOP) instructions, but also permits the processor's register file (14) to be operated as a double-buffered memory, with the processor's execution unit (15) processing one block of registers in the register file (14) while the data load-store unit (17) performs a memory-to-register file transfer operation. The scoreboard unit architecture supports block transfer load operations as well as single transfer loads. Such an architecture permits block transfer data load and store operations to proceed in parallel with execution unit or program control unit instructions, thus permitting a high-speed data processor to execute multiple instructions during a single machine clock cycle. The scoreboard unit (16) is sufficiently compact to enable implementation on a microprocessor chip.

Description

BLOCK TRANSFER REGISTER SCOREBOARD FOR DATA PROCESSING SYSTEMS
Field of the Invention
This invention relates to data processing systems and, more particularly, to register scoreboarding schemes used therein.
Background of the Invention
Pipelined data processing systems have become increasingly popular due to their ability to decrease machine instruction cycle time by partitioning instruction tasks among multiple functional units, thus permitting multiple instructions to be executed simultaneously at different stages within the machine. Register scoreboarding has been used since the mid-1960s to prevent the functional unit of a pipelined data processor from operating on the contents of a register file location that has not yet been loaded with data. Typically, a register scoreboard unit contains one bit for each register of the register file to be scoreboarded. Because scoreboarding is performed for destination registers, individual register scoreboard bits may be set by either a data load operation (memory-to-register file operation) or an execution operation. A register scoreboard bit is typically set upon the initiation of an operand decode/register file fetch instruction and is reset when the register file location is reloaded. If a functional unit attempts to operate on a register file location whose corresponding scoreboard bit is set, the instruction pipeline stalls until the location is reloaded.
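As a minimal illustration of the classic scheme just described (one bit per register, set when an instruction claims a destination, cleared when that register is reloaded), the following Python sketch models a scoreboard behaviorally. The class and method names are hypothetical, and the model ignores pipeline timing; it is not the circuit disclosed in this application.

```python
class Scoreboard:
    """One pending-load bit per register (hypothetical behavioral model)."""

    def __init__(self, num_registers: int = 64) -> None:
        self.pending = [False] * num_registers

    def can_issue(self, sources: list[int]) -> bool:
        # Stall if any source register is still waiting to be reloaded.
        return not any(self.pending[r] for r in sources)

    def issue(self, sources: list[int], dest: int) -> bool:
        """Issue if unblocked, marking the destination register as pending."""
        if not self.can_issue(sources):
            return False  # stall: the caller retries on a later cycle
        self.pending[dest] = True
        return True

    def load_complete(self, dest: int) -> None:
        # Reloading the register clears its scoreboard bit.
        self.pending[dest] = False


sb = Scoreboard()
assert sb.issue(sources=[1, 2], dest=3)      # r3 now has a pending load
assert not sb.issue(sources=[3, 4], dest=5)  # reads r3, so the pipeline stalls
sb.load_complete(3)
assert sb.issue(sources=[3, 4], dest=5)      # r3 reloaded, instruction proceeds
```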
Prior register scoreboard units could support only single load transfers. More recently, a revised register scoreboard unit also supported simple power-of-two word load transfers (1, 2, or 4 words). Neither scoreboard unit architecture is adequate to support a more general Block Transfer instruction, which is capable of loading an arbitrary number of words from external memory to the register file, or vice versa.
Summary of the Invention
The objective of the present invention is to provide a scoreboard unit architecture that will support both block transfer load and store operations, as well as those of the single transfer load and store type. Such an architecture will permit block transfer data load and store operations and will allow execution unit or program control unit instructions to operate in parallel, thus permitting a high-speed data processor to execute multiple instructions during a single machine clock cycle.
The aforementioned objective is accomplished as follows. First, an instruction is fetched by the instruction fetch unit (IFU) of a central processor unit. Second, the instruction's operational code and function code fields are used to decode the instruction, and register file (RF) access is performed as directed by the two source field instruction operands. If the issued instruction is not blocked by a scoreboard bit hit, the destination location (or locations, for a Block Transfer instruction) of the issued instruction is sampled during the low state of the master clock, and at the end of this cycle all register scoreboard bits associated with the decoded destination locations are set. Third, the register file (RF) scoreboard bits are reset as the corresponding loads occur. The loads can result from the completion of an execution unit (EU) instruction, a program control unit (PCU) instruction, or a single or block transfer load instruction. Any issued instruction that requests a register with a pending load (scoreboard bit set) stalls until the corresponding scoreboard bit is reset.
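The three steps above can be sketched behaviorally in Python by holding all 64 scoreboard bits in one integer, so that a block transfer claims a run of consecutive destinations in a single operation. This is a minimal sketch: the names `BlockScoreboard` and `range_mask` are hypothetical, and the destination bits are assumed to take effect immediately at the end of the issue cycle.

```python
MAX_REG = 63  # highest register index in a 64-entry register file


def range_mask(first: int, count: int) -> int:
    """Mask with `count` consecutive bits set, starting at register `first`."""
    if count < 1 or first + count - 1 > MAX_REG:
        raise ValueError("destination range exceeds the register file")
    return ((1 << count) - 1) << first


class BlockScoreboard:
    def __init__(self) -> None:
        self.bits = 0  # bit i set => a load to register i is pending

    def hit(self, sources: list[int]) -> bool:
        """Scoreboard hit: some source register still has a pending load."""
        return any((self.bits >> r) & 1 for r in sources)

    def issue(self, sources: list[int], first_dest: int, count: int = 1) -> bool:
        """Step 2: if not blocked, set every destination bit at the end of the cycle."""
        if self.hit(sources):
            return False  # stall; the instruction is retried next cycle
        self.bits |= range_mask(first_dest, count)
        return True

    def load(self, reg: int) -> None:
        """Step 3: an arriving load (EU, PCU, or load/store unit) clears its bit."""
        self.bits &= ~(1 << reg)


sb = BlockScoreboard()
assert sb.issue(sources=[1], first_dest=16, count=8)  # block load to r16..r23
for reg in range(16, 23):
    sb.load(reg)                                       # first seven words arrive
assert not sb.issue(sources=[23], first_dest=2)        # r23 still pending: stall
sb.load(23)
assert sb.issue(sources=[23], first_dest=2)            # now proceeds
```

An instruction that needs only the last word of the block waits exactly until that word arrives, while registers outside r16..r23 are never blocked.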
Single or block transfer instructions can occur in parallel with an execution unit instruction or a program control instruction (e.g., branch, condition code, branch and link). A block transfer instruction, in combination with a block transfer register scoreboard unit (BTRSU), permits the register file to be operated as a double-buffered memory, with execution occurring in parallel with the loading and storing of data. Hence, multiple instructions can be processed during a single machine cycle.
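To see why double buffering the register file helps, the following hypothetical Python model schedules a sequence of equal-size blocks, each of which must be loaded into the register file and then processed. It assumes fixed load and compute times per block, a single load path, and that a buffer can be refilled only after its previous contents have been processed. With `buffers=1` it approximates serial operation; with `buffers=2` it approximates the double-buffered use described above. The cycle counts are illustrative, not figures from the application.

```python
def completion_time(blocks: int, load: int, compute: int, buffers: int) -> int:
    """Cycle at which the last block finishes processing (hypothetical model)."""
    load_end: list[int] = []
    compute_end: list[int] = []
    for k in range(blocks):
        # Loads are serialized on the single load/store path.
        start = load_end[k - 1] if k else 0
        # A buffer is reused only after the block it held has been processed.
        if k >= buffers:
            start = max(start, compute_end[k - buffers])
        load_end.append(start + load)
        # Processing waits for its data and for the previous block's processing.
        begin = max(load_end[k], compute_end[k - 1] if k else 0)
        compute_end.append(begin + compute)
    return compute_end[-1]


print(completion_time(4, load=16, compute=16, buffers=1))  # serial
print(completion_time(4, load=16, compute=16, buffers=2))  # double-buffered
```

With these numbers the serial schedule finishes at cycle 128 and the double-buffered one at cycle 80, because every block after the first is loaded while its predecessor is being processed.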
The setting of bits within the BTRSU is performed by a unique decoding scheme that uses two thermometer-type decoders. A thermometer decoder generates a bar-graph-type output for a binary input. One of the two decoder units (unit A) generates a bar graph pattern from the least significant bit (LSB) toward the most significant bit (MSB), while the other decoder unit (unit B) generates a bar graph pattern from the MSB toward the LSB. An AND operation is performed on the outputs of the A and B thermometer decoder units, and the result is used to set the appropriate scoreboard bits at the end of a cycle. In this manner, N scoreboard bits can be set starting from the initial register file destination address. The number of 32-bit words or 64-bit doublewords transferred is established by the COUNT field of the block transfer instruction format.
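A behavioral Python sketch of the two-decoder scheme: decoder A sets every bit at or above its input, decoder B is its mirror image, and the bitwise AND of the two bar graphs leaves exactly the run of scoreboard bits to set. It assumes a 64-entry scoreboard and feeds decoder B the value MAX minus the last destination location, matching the TB quantity computed by the count field control unit. The function names are hypothetical.

```python
MAX = 63  # highest scoreboard bit (64 registers)


def decoder_a(t: int) -> list[int]:
    """Thermometer decode: bit i is 1 when i >= t."""
    return [int(i >= t) for i in range(MAX + 1)]


def decoder_b(t: int) -> list[int]:
    """Mirror image of decoder A: bit i is 1 when MAX - i >= t, i.e. i <= MAX - t."""
    return decoder_a(t)[::-1]


def scoreboard_set_bits(first: int, last: int) -> list[int]:
    """AND the two bar graphs to select registers first..last inclusive."""
    x = decoder_a(first)
    y = decoder_b(MAX - last)
    return [a & b for a, b in zip(x, y)]


bits = scoreboard_set_bits(first=10, last=13)
print([i for i, b in enumerate(bits) if b])
```

The printed list is `[10, 11, 12, 13]`: decoder A contributes bits 10 through 63, decoder B bits 0 through 13, and only their overlap survives the AND.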
The binary inputs to the A and B thermometer decoder units (TDUs) are set by two corresponding thermometer control units (TCUs). The first, or RD field, TCU generates the binary input to the A TDU, which sets all bits greater than or equal to the value of the first destination location of the Block Transfer Load or Single Cycle Load instruction. The second control unit computes the quantity TB = MAX - (RX + COUNT - 1), where MAX is the highest register file location (63 for the preferred embodiment, whose register file has 64 locations), RX is the load destination field, and COUNT is the number of transfers (in units of 32-bit words) to be performed. The vector TB is used to set all bits that are less than or equal to the value of the last destination register location. Both 32-bit word transfers and 64-bit doubleword transfers are supported: if a doubleword single transfer or doubleword block transfer is issued, the count field of the instruction is left-shifted one place to generate COUNT.
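The two control computations can be worked through with a small Python helper that returns the inputs for decoders A and B. This is a sketch under stated assumptions: the helper name is hypothetical, and a single (non-block) transfer is assumed to move one word, or two words after the doubleword shift.

```python
MAX = 63  # highest register file location


def thermometer_controls(rx: int, count_field: int, block: bool, doubleword: bool) -> tuple[int, int]:
    """Return (TA, TB) for decoders A and B (hypothetical helper)."""
    # Assumption: a single transfer moves one word unless it is a doubleword.
    count = count_field if block else 1
    if doubleword:
        count <<= 1  # left-shift one place: each doubleword occupies two word locations
    last = rx + count - 1
    if count < 1 or last > MAX:
        raise ValueError("transfer does not fit in the register file")
    ta = rx          # decoder A: set bits >= first destination
    tb = MAX - last  # TB = MAX - (RX + COUNT - 1)
    return ta, tb


# Block load of three doublewords starting at register 40.
print(thermometer_controls(40, 3, block=True, doubleword=True))
```

This prints `(40, 18)`: COUNT becomes 6, the last destination is register 45, and TB = 63 - 45 = 18, so decoder B marks bits 0 through 45 and the AND with decoder A leaves bits 40 through 45.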
Brief Description of the Drawings
Figure 1 is a functional block diagram of the microprocessor chip that contains the block transfer register scoreboard unit;
Figure 2 is a block diagram of the four instruction formats executed by the microprocessor and supported by the block transfer register scoreboard unit;
Figure 3 is a functional block diagram of the block transfer register scoreboard unit;
Figure 4 is a detailed functional block diagram of the two 64-bit thermometer decoders which are referenced in Figure 1;
Figure 5 is a logic diagram of a 32-bit STHERM32 block, from which the 64-bit thermometer decoder units are constructed;
Figure 6 is a logic diagram of an STHERM module which is used to construct the STHERM32 blocks;
Figure 7 is a logic diagram of the scoreboard test cell used to generate the 64-bit scoreboard test array;
Figure 8 is a logic diagram of the scoreboard register cell used to generate the 64-bit scoreboard register array;
Figure 9 is a logic diagram of the scoreboard reset cell used to generate the 64-bit reset array;
Figure 10 is a logic diagram of the RD field thermometer Control Block referenced in Figure 1; and
Figure 11 is a logic diagram of the count field thermometer Control Block referenced in Figure 1.
Preferred Embodiment of the Invention
Referring now to Figure 1, a preferred microprocessor architecture contains seven functional units: a bus unit (BUSU) 11; a program control unit (PCU) 12; an instruction fetch unit (IFU) 13; a register file (RF) 14; an execution unit (EXU) 15; a block transfer register scoreboard unit (BTRSU) 16; and a data load store unit (DLSU) 17. PCU 12 is responsible for generating a next instruction address (NIA) and generates clock control for the microprocessor pipeline. An NIA is generated as the result of one of the following instructions: a continue operation, a conditional branch, a conditional branch and link, a jump to register, or a jump immediate. IFU 13 consists of a set associative instruction cache (not shown) and an instruction load unit (not shown) that is responsible for loading the instruction cache if the NIA is not contained in one of the instruction cache sets. RF 14 has three read ports [(RS1), (RS2), and IDS] and two write ports [(RD) and IDL]. The parentheses around the port names RS1, RS2 and RD indicate that these correspond to particular fields within multiple instruction formats, which are explained below. Thus, three reads and two writes can be performed in every machine cycle. The (RS1) and (RS2) read ports and the (RD) write port are dedicated to EXU 15. The remaining read and write ports are dedicated to DLSU 17. DLSU 17 is responsible for generating all data memory load and store values, DA, as well as the corresponding register file load and store addresses, RA. BUSU 11 arbitrates bus requests between IFU 13 and DLSU 17 and selects either an instruction address, IA, from IFU 13, or a data address, DA, from DLSU 17. BUSU 11 generates all external memory addresses and timing for DRAM and SRAM memory, ADC, as well as I/O devices, and supports a bidirectional external data bus, D.
BTRSU 16 monitors issued machine instructions and, according to one of the four instruction formats outlined in Figure 2, tests the register scoreboard to establish whether an issued instruction can continue or must be stalled due to a pending load operation. A scoreboard signal (SCOREBRD) 20 indicates to DLSU 17 that a scoreboard hit condition (i.e., a pending load) exists for the issued instruction. Scoreboard bits in BTRSU 16 can be reset by either a write port 2 address, WP2, from DLSU 17, or a write port 1 address, WP1, from EXU 15. PCU 12 can be stalled by either an instruction buffer stall condition (IBSTALL signal) or a data load store stall condition (DLSTALL signal). The data load store stall condition is generated by a logical OR of a scoreboard hit from BTRSU 16 (SCOREBRD signal) and the condition that a subsequent load or store instruction has been issued before the previous load or store instruction has completed.
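The stall conditions in this paragraph reduce to two OR gates. The Python sketch below encodes them as boolean functions and prints the DLSTALL truth table. Forming the second DLSTALL term as "load/store issued" AND "previous load/store still busy" is an assumption about how that condition would be derived, and the function names are hypothetical.

```python
from itertools import product


def dlstall(scorebrd: bool, ls_issued: bool, prev_ls_busy: bool) -> bool:
    """DLSTALL = SCOREBRD OR (load/store issued while the previous one is incomplete)."""
    return scorebrd or (ls_issued and prev_ls_busy)


def pcu_stall(ibstall: bool, dlstall_signal: bool) -> bool:
    """The PCU stalls on an instruction buffer stall or a data load/store stall."""
    return ibstall or dlstall_signal


print("SCOREBRD ISSUED BUSY | DLSTALL")
for sb, issued, busy in product((False, True), repeat=3):
    print(f"{int(sb):8} {int(issued):6} {int(busy):4} | {int(dlstall(sb, issued, busy))}")
```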
Referring now to Figure 2, four possible instruction formats are depicted for use with the preferred microprocessor architecture: an I-type format; a J-type format; an R-type format; and an L-type format. Each of these formats is explained below. The I-type format 21 is the primary instruction format for single transfer load or store operations performed by the Data Load Store Unit. This format comprises an operational code (OPCODE) field, an RS1 field, an RD field, and an IMMEDIATE field. For a load instruction in the I-type format, the contents of the register file register pointed to by the RS1 field are added to the IMMEDIATE field to generate the memory address whose value is loaded into the register file location specified by the RD field. Both word and doubleword register file locations can be loaded by an I-type format instruction. For a store instruction in the I-type format, on the other hand, the contents of the register pointed to by the RS1 field are added to the contents of the IMMEDIATE field to provide the store memory address. The register file location whose value is to be stored is specified by the RD field. Both word and doubleword register file locations can be stored.
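A small Python sketch of the I-type address arithmetic, and of which register file locations a load writes (and therefore which scoreboard bits it would set). The field widths are not given here, so the IMMEDIATE is taken as an already sign-extended integer, addresses are assumed to wrap at 32 bits, and a doubleword is assumed to occupy RD and RD+1. The helper names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def effective_address(regs: list[int], rs1: int, imm: int) -> int:
    """I-type load/store address: contents of register RS1 plus IMMEDIATE (32-bit wrap assumed)."""
    return (regs[rs1] + imm) & MASK32


def itype_load_destinations(rd: int, doubleword: bool) -> list[int]:
    """Register file locations an I-type load writes (doubleword assumed to use RD and RD+1)."""
    if doubleword:
        if rd + 1 > 63:
            raise ValueError("doubleword destination runs past register 63")
        return [rd, rd + 1]
    return [rd]


regs = [0] * 64
regs[5] = 0x1000
print(hex(effective_address(regs, rs1=5, imm=-4)))      # address for a load or store
print(itype_load_destinations(rd=10, doubleword=True))  # scoreboard bits to set
```

This prints `0xffc` and `[10, 11]`. For a store the same address arithmetic applies, but RD names the register being read rather than a destination.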
The J-type instruction format 22 generates a destination register word address of 63 for jump and link or conditional branch and link instructions. The control signal corresponding to this condition is LDR63C.
The R-type instruction format 23 is the principal format for execution unit instructions, which predominantly access two source operands indicated by the RS1 and RS2 fields and write a single destination register indicated by the RD field. The RS1, RS2 and RD fields of this format name either word or doubleword registers. The R-type format is also used by the data load store unit for block transfer load and store instructions. When a block transfer load instruction is issued, the COUNT field designates the number of words to be loaded from external memory into the register file. The initial source address in external memory is held in the register designated by the RS1 field, the initial destination location within the register file is designated by the RD field, and the memory address increment is held in the register designated by the RS2 field. When a block transfer store instruction is issued, the COUNT field designates the number of words to be transferred from the register file to external memory. The initial destination address in external memory is held in the register designated by the RS1 field, the initial source location within the register file is designated by the RD field, and the memory address increment is held in the register designated by the RS2 field. Register file transfer locations are sequential. There are both word and doubleword block load and store instructions.
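The sequence of transfers generated by a word block transfer can be sketched as follows. This is an illustrative Python model, not the data load store unit's logic: the helper name is hypothetical, registers are a plain list, and addresses are assumed to wrap at 32 bits.

```python
def block_transfer_plan(regs: list[int], rd: int, rs1: int, rs2: int,
                        count: int) -> list[tuple[int, int]]:
    """(register, memory address) pairs for a word block transfer.

    RS1 holds the initial memory address, RS2 holds the address increment,
    and register file locations run sequentially from RD.
    """
    base, step = regs[rs1], regs[rs2]
    return [(rd + i, (base + i * step) & 0xFFFFFFFF) for i in range(count)]


regs = [0] * 64
regs[1], regs[2] = 0x1000, 4
print(block_transfer_plan(regs, rd=10, rs1=1, rs2=2, count=3))
# [(10, 4096), (11, 4100), (12, 4104)]
```

The same pairing applies to a block transfer store; only the direction of each transfer is reversed.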
The L-type instruction format 24 is used only for the load upper immediate instruction. This instruction left-shifts the 20-bit IMMEDIATE field and zero-fills the lower-order bits, so that the most significant bits of a 32-bit word can be loaded into the upper end of a register. The control signal RS1DC indicates that the destination of the left-shifted IMMEDIATE field is given by the RS1 field.
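Placing a 20-bit field at the top of a 32-bit word implies a left shift of 32 - 20 = 12 bits. The Python sketch below shows that arithmetic; the helper name is hypothetical, and the follow-up OR that supplies the low 12 bits is an assumed usage pattern rather than an instruction described here.

```python
def load_upper_immediate(imm20: int) -> int:
    """Place a 20-bit IMMEDIATE in bits 31..12 of a 32-bit word and zero-fill bits 11..0."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("IMMEDIATE must fit in 20 bits")
    return imm20 << 12


upper = load_upper_immediate(0xABCDE)
print(hex(upper))            # 0xabcde000
print(hex(upper | 0x123))    # 0xabcde123 (low bits supplied by a later instruction)
```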
The architecture of the block transfer register scoreboard unit is depicted in Figure 3. The unit contains first and second 64-bit thermometer decoders (designated decoder A 31 and decoder B 32, respectively); an RD field thermometer control unit 33, which generates control vector bits TA5B and TA4-TA0 for thermometer decoder A 31; a count field thermometer control unit 34, which generates control vector bits TB5B and TB4-TB0 for thermometer decoder B 32; a 64-bit NAND array 35; a 64-bit scoreboard register array 36; a scoreboard reset array 37; a scoreboard test array 38; and a 64-bit OR unit 39. The outputs from thermometer decoder A are designated X0 through X63; the outputs from thermometer decoder B are designated Y0 through Y63. These outputs are paired as inputs to 64-bit NAND array 35. The outputs from 64-bit NAND array 35 are designated SB0B through SB63B, and serve as scoreboard bit set inputs to scoreboard register array 36. Scoreboard bit reset inputs to scoreboard register array 36 are generated by scoreboard reset array 37 and are designated SBRS0 through SBRS63. The test outputs from scoreboard register array 36 are designated SBT0 through SBT63, and serve as inputs to scoreboard test array 38. 64-bit OR unit 39 receives inputs SBH0 through SBH63 from scoreboard test array 38, and produces the scoreboard hit output signal SCOREBRD.
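The data path of Figure 3 can be followed with a small behavioral Python model. This is a sketch under simplifying assumptions: the decoders take the first and last register index directly instead of the encoded TA/TB control vectors, the per-operand enable bits of the test array are omitted, and the class and method names are hypothetical.

```python
NREGS = 64  # register file word locations / scoreboard bits


def decoder_a(first: int) -> list[int]:
    """Bar-graph X: X[j] = 1 for every location j >= first."""
    return [1 if j >= first else 0 for j in range(NREGS)]


def decoder_b(last: int) -> list[int]:
    """Mirror-image bar-graph Y: Y[j] = 1 for every location j <= last."""
    return [1 if j <= last else 0 for j in range(NREGS)]


def nand_array(x: list[int], y: list[int]) -> list[int]:
    """Active-low scoreboard set bits SBjB: 0 where both bar graphs overlap."""
    return [0 if (xj and yj) else 1 for xj, yj in zip(x, y)]


class Scoreboard:
    def __init__(self) -> None:
        self.bits = [0] * NREGS  # scoreboard register array outputs SBT0..SBT63

    def set_range(self, first: int, last: int) -> None:
        for j, sb_b in enumerate(nand_array(decoder_a(first), decoder_b(last))):
            if sb_b == 0:
                self.bits[j] = 1

    def reset(self, reg: int) -> None:
        self.bits[reg] = 0

    def hit(self, sources: list[int]) -> bool:
        """SCOREBRD: OR over all test cells, i.e. any source register still pending."""
        return any(self.bits[r] for r in sources)


sb = Scoreboard()
sb.set_range(10, 13)                   # block load into registers 10..13
print(sb.hit([12]), sb.hit([9, 14]))   # True False
sb.reset(12)                           # word for register 12 has arrived
print(sb.hit([12]))                    # False
```

Intersecting two opposite bar graphs is what lets a single cycle mark an arbitrary contiguous range of registers, which is the point of using thermometer decoders for block transfers.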
Figure 4 shows the structure of thermometer decoder A 31 and thermometer decoder B 32 in more detail. The two decoders are identical except that each is the mirror image of the other. Each 64-bit thermometer decoder (whether decoder A 31 or decoder B 32) is built from a pair of 32-bit STHERM blocks 41 together with an array of OR gates 42 and AND gates 43.
Figure 5 is a logic diagram of a 32-bit STHERM32 block, two of which are used to construct each of thermometer decoder A 31 and thermometer decoder B 32. Each STHERM32 block contains four STHERM macros 51.
Figure 6 depicts the logic diagram of an STHERM macro 51. The following truth table gives the macro outputs for each combination of inputs A2, A1, and A0:
[Figure 6 truth table: STHERM macro outputs for inputs A2, A1, A0 (image not reproduced)]
Using this truth table, wider truth tables may be generated for the 32-bit STHERM32 block depicted in Figure 5 and for 64-bit thermometer decoders A and B shown in Figures 3 and 4.
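Because the macro truth table is not reproduced, the Python sketch below assumes a polarity (output i is 1 when i is greater than or equal to the input value) and shows how 8-output macros can be composed into 32- and 64-output thermometer decoders. It models behavior only, not the OR/AND gate arrangement of Figure 4, and all function names are hypothetical.

```python
def stherm_macro(a2: int, a1: int, a0: int) -> list[int]:
    """Assumed 3-input thermometer macro: output i is 1 when i >= the input value."""
    n = (a2 << 2) | (a1 << 1) | a0
    return [1 if i >= n else 0 for i in range(8)]


def stherm32(value: int) -> list[int]:
    """32-output block: bits 4..3 pick one of four macros, bits 2..0 drive it."""
    upper, lower = value >> 3, value & 0b111
    out: list[int] = []
    for block in range(4):
        if block < upper:
            out += [0] * 8  # entirely below the threshold
        elif block == upper:
            out += stherm_macro((lower >> 2) & 1, (lower >> 1) & 1, lower & 1)
        else:
            out += [1] * 8  # entirely above the threshold
    return out


def therm64(value: int) -> list[int]:
    """64-output decoder from two 32-output blocks, selected by bit 5."""
    if value >> 5 == 0:
        return stherm32(value & 0x1F) + [1] * 32
    return [0] * 32 + stherm32(value & 0x1F)


def therm64_mirror(value: int) -> list[int]:
    """Mirror-image decoder, as used for decoder B."""
    return therm64(value)[::-1]
```

With this polarity, driving the mirror-image decoder with 63 minus the last destination location yields ones exactly at locations 0 through that last location, matching how TB is described for decoder B in the count thermometer control unit.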
The scoreboard test cell logic diagram is shown in Figure 7. This cell is replicated in array format to create the scoreboard test array 38. Two control bits, RS1FENC and RS2FENC, enable the RS1 and RS2 fields of the instruction according to the issued instruction's format. The scoreboard register output bit for each position (SBTX) is tested against the RS1 and RS2 fields. A match indicates that a pending load exists for either the RS1 or the RS2 operand, and that the instruction must stall until the load completes. The scoreboard hit signal SCOREBRD is generated by a 64-bit OR of the outputs of the entire scoreboard test array.
A logic diagram of a scoreboard register cell is depicted in Figure 8. This cell is replicated in array format to create the scoreboard register array 36. The output from the NAND array is sampled on the negative phase of the clock cycle and, in turn, enables the positive-edge-triggered register, so that a scoreboard bit is set at the end of the current clock cycle. Because the new bit becomes visible only after the test for the current instruction has been made, an instruction whose source and destination registers are identical does not stall on its own pending load. This prevents a possible lockup condition, which would otherwise have to be checked for by the compiler or assembler.
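The lockup argument can be made concrete with a two-phase Python model. This sketch abstracts the latch and edge-triggered register of Figure 8 into two method calls; the class and the `issue_load` helper are hypothetical. The test in each cycle reads the registered state from before the cycle, and the set requested in that cycle lands only at the rising edge.

```python
class ScoreboardRegisterArray:
    """Set requests latched during a cycle take effect only at the next rising clock edge."""

    def __init__(self, n: int = 64) -> None:
        self.sbt = [0] * n               # registered outputs seen by the test array
        self._latched: set[int] = set()

    def latch_set(self, regs: list[int]) -> None:
        """Negative clock phase: capture the NAND-array set requests."""
        self._latched.update(regs)

    def rising_edge(self) -> None:
        """End of cycle: enabled registers capture a 1."""
        for r in self._latched:
            self.sbt[r] = 1
        self._latched.clear()


def issue_load(sb: ScoreboardRegisterArray, rs1: int, rd: int) -> bool:
    """One cycle for a load: test RS1 first, then schedule RD. Returns True on stall."""
    stall = bool(sb.sbt[rs1])            # compares against last cycle's scoreboard state
    if not stall:
        sb.latch_set([rd])
    sb.rising_edge()
    return stall


sb = ScoreboardRegisterArray()
print(issue_load(sb, rs1=5, rd=5))       # False: same source and destination, no self-stall
print(issue_load(sb, rs1=5, rd=6))       # True: register 5 is now pending
```

If the set were instead visible within the same cycle, the first load would see its own destination bit on its RS1 operand and stall indefinitely, since the load it waits for could never issue.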
Figure 9 depicts the scoreboard reset cell logic diagram. Sixty-four of these cells make up the scoreboard reset array 37. Doubleword control bits DWRP1 and DWRP2 ensure that two bits are correctly reset for doubleword values, in accordance with write port address bits WP1A5-0 and WP2A5-0. Write port 1 address bits WP1A5-0 are the write address bits that come from the execution unit; they are simply the pipelined destination bits from the instruction RD field. Write port 2 address bits WP2A5-0 are generated by the data load store unit and are the destination address bits of a single or block transfer load from memory to the register file. Because the bus unit directly supports multiple banks of DRAM as well as SRAM, the DLAT control signal is used to indicate the arrival of each data word from memory. Addresses arriving on write port 1, by contrast, are always valid.
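The two write-port reset paths can be sketched in Python as below. The helper names are hypothetical, the master reset path (MRESET) is omitted, and the assumption that a doubleword clears the addressed register and the next one is made for illustration only.

```python
from typing import Optional


def reset_targets(addr: int, doubleword: bool) -> set[int]:
    """Bits cleared for one write-port address; a doubleword is assumed to cover addr and addr+1."""
    return {addr, addr + 1} if doubleword else {addr}


def apply_resets(bits: list[int],
                 wp1: Optional[int] = None, dwrp1: bool = False,
                 wp2: Optional[int] = None, dwrp2: bool = False,
                 dlat: bool = False) -> None:
    cleared: set[int] = set()
    if wp1 is not None:               # write port 1 (execution unit): always valid
        cleared |= reset_targets(wp1, dwrp1)
    if wp2 is not None and dlat:      # write port 2 (load/store unit): only when DLAT flags arriving data
        cleared |= reset_targets(wp2, dwrp2)
    for r in cleared:
        bits[r] = 0


bits = [1] * 64
apply_resets(bits, wp2=10, dwrp2=True, dlat=False)  # data not yet arrived: nothing cleared
apply_resets(bits, wp2=10, dwrp2=True, dlat=True)   # clears bits 10 and 11
apply_resets(bits, wp1=3)                           # clears bit 3
```

Gating only write port 2 with DLAT reflects the variable arrival time of memory data, whereas execution unit results arrive on a fixed pipeline schedule.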
The logic for each of the control bits generated by the RD field thermometer control unit, which controls 64-bit thermometer decoder A, is detailed in Figure 10. If LDR63C = 1, the vector formed by control bits TA5B and TA4-TA0 is set to 011111, which in turn sets scoreboard bit 63. The LDR63C control bit indicates the destination address for jump-and-link and branch-and-link instructions. For most other instructions, the RD field instruction bits (I19-I14) indicate the destination address. The control bit RS1DC selects the RS1 field (instruction bits I25-I20) as the destination address, which is used for the L-type instruction format. The scoreboard enabling signal LDOPC is asserted whenever a load operation is issued. When LDOPC = 0, the RD field thermometer control unit and the count field thermometer control unit default to values for which no scoreboard bits are set.
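The destination selection of the RD field thermometer control unit can be sketched in Python. The sketch assumes TA5B is the complement of bit 5 of the selected start location, which is consistent with the 011111 encoding for register 63; the priority of LDR63C over RS1DC and the function names are assumptions, and returning None stands in for the LDOPC = 0 default.

```python
from typing import Optional, Tuple


def rd_thermometer_control(ldopc: bool, ldr63c: bool, rs1dc: bool,
                           rd_field: int, rs1_field: int) -> Optional[Tuple[int, int]]:
    """Return (TA5B, TA4..TA0) for decoder A, or None when no scoreboard bits are set."""
    if not ldopc:
        return None                  # no load issued: nothing to scoreboard
    if ldr63c:
        start = 63                   # jump/branch-and-link writes register 63
    elif rs1dc:
        start = rs1_field            # L-type: RS1 field names the destination
    else:
        start = rd_field
    ta5b = ((start >> 5) & 1) ^ 1    # bit 5 travels in complemented form
    return ta5b, start & 0x1F


def as_vector(ta: Tuple[int, int]) -> str:
    ta5b, low = ta
    return f"{ta5b}{low:05b}"


print(as_vector(rd_thermometer_control(True, True, False, 7, 9)))  # 011111
```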
The logic circuitry for each of the control bits generated by the count field thermometer control unit, which controls 64-bit thermometer decoder B, is detailed in Figure 11. The count thermometer control unit computes the quantity TB = MAX - (RX + COUNT - 1), where MAX is the highest register file location (63 in the preferred embodiment, whose register file has 64 word locations), RX is the load destination field, and COUNT is the number of transfers to be performed, in 32-bit word units. Thermometer decoder B uses TB to set all bits at or below the last destination register location. The control inputs to the count thermometer control unit indicate whether a single or block transfer word (32-bit) or doubleword (64-bit) load instruction is to take place. Block transfer operations are signaled by the BLKTC control bit; doubleword transfers are signaled by either the DWRP2C or the DWEXC control input.
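The TB arithmetic can be checked with a short Python sketch. It models only the arithmetic, not the half-adder and carry-select adder circuitry or the TB5B/TB4-TB0 encoding; the function name is hypothetical, and treating COUNT as doubleword units that are left-shifted into word units for doubleword transfers follows the fourth multiplexer described in claim 3.

```python
MAX = 63  # highest register file word location in the preferred embodiment


def count_thermometer_control(rx: int, count: int, blkt: bool,
                              doubleword: bool) -> tuple[int, int]:
    """Return (TB, last) where TB = MAX - (RX + COUNT - 1), COUNT in 32-bit words."""
    words = count if blkt else 1     # single transfers move one unit
    if doubleword:
        words <<= 1                  # a doubleword is two 32-bit words
    last = rx + words - 1            # last destination register location
    if not 0 <= rx <= last <= MAX:
        raise ValueError("transfer does not fit in the register file")
    return MAX - last, last


tb, last = count_thermometer_control(rx=10, count=4, blkt=True, doubleword=False)
print(tb, last)                      # 50 13
```

With decoder A marking locations at or above RX and decoder B marking locations at or below MAX - TB, the overlap is exactly RX through the last destination location.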
Although only a single embodiment of the invention has been described herein, it will be apparent to those skilled in the art that modifications may be made thereto without departing from the spirit and the scope of the invention as claimed. For example, a similar scoreboarding scheme may be implemented in a data processing system having a central processor unit comprised of discrete, interconnected functional units, as opposed to functional units being incorporated within a single microprocessor chip.

Claims

1. A register scoreboard unit (16) for a pipelined data processing system that supports block transfers of data between main memory and a register file (14) via a data load store unit (17), as well as multiple source and single destination transfers between the register file (14) and an execution unit (15), said register file (14) having a fixed number of register file word locations, said scoreboard unit comprising:
first and second thermometer control units (33 and 34, respectively);
a first thermometer decoder (31) which generates a first bar-graph output pattern (X63-X0) in response to a first set of binary control input bits (TA5B and TA4-TA0) generated by said first thermometer control unit (33);
a second thermometer decoder (32) which generates a second bar-graph output pattern (Y0-Y63), which is the reverse of said first bar-graph output pattern (X63-X0), in response to a second set of binary control input bits (TB5B and TB4-TB0) generated by said second thermometer control unit (34);
means for combining said first and second bar-graph output patterns, such that the resultant pattern represents pending loads to the register file (14) as a sequence of scoreboard bits (SB63B-SB0B);
a scoreboard register array (36) for storing said sequence of scoreboard bits (SB63B-SB0B);
means for testing said sequence of scoreboard bits (SB63B-SB0B) within said scoreboard register array (36) during a single machine cycle;
means for setting said sequence of scoreboard bits within said scoreboard register array (36) during the same single machine cycle during which said sequence of bits is tested, such that identical source and destination register locations will not create a data processing system lock-up condition; and
means for resetting bits within said scoreboard register array (36) when pending loads to the register file locations associated with those bits are complete.
2. The register scoreboard unit of Claim 1, wherein said first thermometer control unit (33) comprises a collection of primitive gates (101), a first multiplexer (102), a first multiplexer control input (RS1DC), a jump and link control input (LDR63C), and an input for a scoreboard enabling signal (LDOPC).
3. The register scoreboard unit of Claim 1, wherein said second thermometer control unit comprises a collection of primitive gates (111), a second multiplexer (112) which selects an instruction destination field specific to a particular instruction, a third multiplexer (113) which selects a count value for both block and single-word transfers, a fourth multiplexer (114) which left-shifts the count value from the third multiplexer (113) whenever a doubleword transfer instruction is issued, an array of half-adders (115) and a carry-select adder (116) which calculate said second set of binary control input bits (TB5B and TB4-TB0), and an input for said scoreboard enabling signal (LDOPC).
4. The register scoreboard unit of Claim 1, wherein said data processing system utilizes multiple instruction formats (21-24), only one of which is associated with a particular operational code (OPCODE).
5. The register scoreboard unit of Claim 4, wherein said first set of binary control input bits (TA5B and TA4-TA0) and said second set of binary control input bits (TB5B and TB4-TB0) are generated as a function of an OPCODE and particular field values specific to that OPCODE's instruction format.
6. The register scoreboard unit of Claim 1, wherein each of said thermometer decoders comprises multiple low-level decode blocks (STHERM32), each of which generates a low-level bar-graph output pattern.
7. The register scoreboard unit of Claim 4, wherein the total number of output bits from a thermometer decoder is substantially equal to the number of register file word locations.
8. The register scoreboard unit of Claim 1, wherein said means for combining comprises a NAND array (35).
9. The register scoreboard unit of Claim 1, wherein said means for setting comprises an array of scoreboard register cells (36), each of which comprises a latch transparent to an inverted clock input signal (LE), having an inverted data input bit (SBRXB) from said NAND array (35) and a reset input (R), said latch producing an output which feeds an enable input (EN) of a positive-edge-triggered D-type register (82), which also has a clock input (MCLK), a data input (VDD), a reset input (R), and a scoreboard register array test signal output (SBTX).
10. The register scoreboard unit of Claim 9, wherein said means for testing comprises an array of test cells (38), each of which comprises a set of primitive gates (71) having input bits which correspond to the RS1 and RS2 source operands (I20-I25 and I8-I13, respectively), register field 1 and register field 2 enabling inputs (RS1FENC and RS2FENC, respectively), and a scoreboard register array test signal input (SBTX); and an OR gate (39) having inputs (SBH63-SBH0), which correspond to the output of each of the cells in the test cell array (38), and a scoreboard hit output (SCOREBRD).
11. The register scoreboard unit of Claim 1, wherein said means for resetting comprises an array of scoreboard reset cells (37), each of which comprises three reset paths: the first reset path being a master reset signal (MRESET); the second reset path comprising an address decoder (91) associated with said execution unit (15) having as inputs write port 1 address bits (WP1A5-0), a doubleword enable control bit (DWRP1), and an inverted master clock enable input (MCLK*); and the third reset path comprising an address decoder (92) associated with the data load store unit (17) having as inputs write port 2 address bits (WP2A5-0), a doubleword enable control bit (DWRP2), and a data load address transfer enable signal (DLAT).
12. The register scoreboard unit of Claim 4, wherein a block transfer load instruction format (23) contains at least one operational code field (OPCODE), a first source field (RS1) which points to a location in the register file (14) containing an initial main memory source address, a second source field (RS2) which points to a location in the register file containing a memory address increment for the block transfer load operation, a COUNT field which indicates the number of words or doublewords to be transferred, and a destination field (RD) which indicates the initial destination address location within the register file.
13. The register scoreboard unit of Claim 4, wherein a block transfer store instruction format contains at least one operational code field (OPCODE), a first source field (RS1) which points to a location in the register file (14) containing an initial main memory destination address, a second source field (RS2) which points to a location in the register file containing a memory address increment for the block transfer store operation, a COUNT field which indicates the number of words or doublewords to be transferred, and a destination field (RD) which indicates the initial source address location within the register file (14).
PCT/US1991/005885 1990-08-17 1991-08-19 Block transfer register scoreboard for data processing systems WO1992003777A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US56988590A 1990-08-17 1990-08-17
US569,885 1990-08-17

Publications (1)

Publication Number Publication Date
WO1992003777A1 (en) 1992-03-05

Family

ID=24277295

Country Status (1)

Country Link
WO (1) WO1992003777A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4760518A (en) * 1986-02-28 1988-07-26 Scientific Computer Systems Corporation Bi-directional databus system for supporting superposition of vector and scalar operations in a computer
US4891753A (en) * 1986-11-26 1990-01-02 Intel Corporation Register scorboarding on a microprocessor chip
US4893233A (en) * 1988-04-18 1990-01-09 Motorola, Inc. Method and apparatus for dynamically controlling each stage of a multi-stage pipelined data unit

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6610828B1 (en) 1996-05-24 2003-08-26 Syngenta Limited Heliothis ecdysone receptor
US7183061B2 (en) 1996-05-24 2007-02-27 Syngenta Limited Method of expressing Heliothis ecdysone receptor fusion protein
DE102016117588A1 (en) * 2016-09-19 2018-03-22 Infineon Technologies Ag Processor arrangement and method for operating a processor arrangement
DE102016117588B4 (en) 2016-09-19 2024-09-26 Infineon Technologies Ag Processor arrangement and method for operating a processor arrangement
US20220405348A1 (en) * 2021-06-17 2022-12-22 International Business Machines Corporation Reformatting of tensors to provide sub-tensors

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): DE JP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642