CN111538534B - A multi-instruction out-of-order emission method and processor based on instruction withering - Google Patents

Info

Publication number
CN111538534B
Authority
CN
China
Prior art keywords
instruction
circuit
age
withering
wake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010264562.2A
Other languages
Chinese (zh)
Other versions
CN111538534A (en)
Inventor
虞致国
马晓杰
魏敬和
顾晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202010264562.2A
Priority to PCT/CN2020/098961 (WO2021203560A1)
Publication of CN111538534A
Application granted
Publication of CN111538534B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a multi-instruction out-of-order issue method and a processor based on instruction withering, and belongs to the field of processor design. The invention abandons the lengthy arbitration structure of the traditional issue architecture and adds an instruction withering circuit. An instruction age array represents how long each instruction has been stored in the CPU, with one additional wake-up state bit, and instructions that exceed the withering threshold are placed in a settling pool so that the CPU can issue them directly. The circuit structures of the instruction request circuit, the instruction allocation circuit, the wake-up circuit and others are improved, which effectively improves the timing of the critical path in a multi-issue processor. When instructions are woken up, wake-up is delayed for instructions with a short execution period and advanced for instructions with a long execution period, so that instructions can execute back to back. This meets the requirements of a high performance-to-power ratio, low latency, and high IPC in modern superscalar out-of-order processors, and solves the prior-art problem that delay increases as the number of issue-queue entries increases.

Description

A multi-instruction out-of-order issue method and processor based on instruction withering

Technical field

The invention relates to a multi-instruction out-of-order issue method and a processor based on instruction withering, and belongs to the field of processor design.

Background

In the more than ten years since Dennard scaling ended, single-core CPU performance has improved particularly slowly. Against this background, it is necessary to revisit the core microarchitecture to obtain high single-core performance.

Among the many structures of a CPU, the instruction issue architecture is one of the most important for achieving high performance. It schedules instructions for execution by selecting, in every cycle, instructions from those waiting in the issue queue and issuing them. To achieve high performance, the issue architecture must deliver high IPC (instructions per clock, the number of instructions executed per cycle) at low latency. Low latency is an important design consideration because the issue architecture lies on a timing-critical path of the processor, so its delay has a significant impact on the CPU clock frequency.

The traditional multi-instruction out-of-order issue architecture uses an arbitration circuit to select the instructions that can be issued. Its advantage is that the oldest instruction can be selected accurately for issue, which preserves the efficiency of the processor pipeline; however, as the number of issue-queue entries grows, the delay of the arbitration circuit grows with it.

In modern processors, the issue queue is often designed with many entries in pursuit of high IPC. This makes the delay of the arbitration circuit significant, so that the instruction issue circuit becomes a critical path in the processor and a bottleneck for its clock frequency.

Given these requirements and challenges, there is an urgent need for a multi-instruction out-of-order issue architecture based on instruction withering that targets low latency and high IPC.

In the multi-instruction out-of-order issue architecture designed in the present invention, instruction ages can be distinguished effectively and the impact on pipeline efficiency is kept as small as possible, while the delay of the timing path does not increase with the number of issue-queue entries. This keeps the delay as small as possible in processors with a large number of entries and supports raising the processor clock frequency.

Summary of the invention

To solve the problem that, in the current approach of selecting issuable instructions through an arbitration circuit, the delay of the arbitration circuit increases as the number of issue-queue entries grows, the present invention provides a multi-instruction out-of-order issue method and a processor based on instruction withering.

A multi-instruction out-of-order issue method, in which an instruction withering circuit is added to the out-of-order instruction issue architecture of a processor to store newly allocated instructions in the issue queue and to perform the withering operation on the instructions in the issue queue; the method comprises:

setting the highest bit of the instruction age corresponding to each instruction in the instruction withering circuit as the wake-up state bit of the instruction, the remaining bits of the instruction age representing the intrinsic age of the instruction; the wake-up state bit indicates whether the corresponding instruction has been woken up, so that in the issue queue the instruction age of an awakened instruction is greater than that of a non-awakened instruction;

setting a withering threshold; when the instruction age of an instruction exceeds the withering threshold, the instruction age array triggers a withering signal that causes the instruction to wither; a withered instruction can be selected at random for issue without arbitration, realizing out-of-order issue of multiple instructions;

the issue order of the instructions in the issue queue is determined by instruction age and wake-up state.
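
The effect of placing the wake-up state bit above the intrinsic age can be illustrated with a small Python sketch. The field width `AGE_BITS` and the helper names `pack_age` and `is_withered` are assumptions made for illustration and do not appear in the patent.

```python
AGE_BITS = 4  # assumed width of the intrinsic-age field (hypothetical value)


def pack_age(awake: bool, intrinsic_age: int, age_bits: int = AGE_BITS) -> int:
    """Place the wake-up state bit directly above the intrinsic-age bits."""
    if not 0 <= intrinsic_age < (1 << age_bits):
        raise ValueError("intrinsic age does not fit in the age field")
    return (int(awake) << age_bits) | intrinsic_age


def is_withered(age_word: int, threshold: int) -> bool:
    """An instruction withers once its full age word exceeds the threshold."""
    return age_word > threshold


sleeping_old = pack_age(False, 15)  # oldest possible non-awakened entry
awake_young = pack_age(True, 0)     # youngest possible awakened entry
```

With this layout a single unsigned comparison ranks every awakened instruction above every non-awakened one, which is the ordering the method relies on. How the threshold value relates to the wake-up bit is not stated in this passage.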

Optionally, when waking up instructions, the method delays the wake-up for instructions with a short execution period and advances the wake-up for instructions with a long execution period, so that instructions can execute back to back.

Optionally, when waking up instructions, after the earlier of two sequentially ordered instructions has been issued, the processor waits until the earlier instruction has finished executing before waking up the later instruction.

Optionally, the out-of-order instruction issue architecture further includes an instruction allocation circuit, an adder-like instruction request circuit, and a dynamic delayed wake-up circuit;

the instruction allocation circuit allocates the multiple instructions sent from the physical registers to idle entries in the issue queue;

the adder-like instruction request circuit counts the total number of entry idle signals in the issue queue and encodes that number with a special code; if the encoded total of idle signals is less than the identically encoded instruction issue width, it sends an instruction request signal to the physical register file;

the dynamic delayed wake-up circuit sends a wake-up signal when the source register number of an instruction waiting to be issued equals the destination register number of an issued instruction; at the same time, the wake-up circuit identifies the execution period of the instruction waiting to be issued through an instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, so that instructions can execute back to back.

Optionally, the instruction withering circuit includes an instruction age array, an issue queue, a withering threshold adjuster, a settling pool, and a global age feature extraction circuit;

the instruction age array represents the instruction age of each instruction in the issue queue and whether it has been woken up;

the issue queue stores the instructions sent from the physical registers; the issue queue is designed as a non-compacting structure, that is, when an entry becomes idle after the physical register numbers of its instruction have been issued, the other entries are not shifted; besides temporarily holding the physical register numbers of the current instruction, each entry records the wake-up state of the current instruction and whether the entry is idle;

the withering threshold adjuster dynamically adjusts and outputs the withering threshold according to the number of idle entries in the settling pool and the age values of the instructions still remaining in the issue queue;

the settling pool stores withered instructions that satisfy the withering condition;

the global age feature extraction circuit computes the global age features.
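
As an illustration of what the global age feature extraction circuit might compute, the sketch below derives the mean and variance of the ages currently held in the age array, the two quantities (μ and σ) over which the threshold expression is defined. The function name `global_age_features` is hypothetical, and the threshold formula itself is not modelled because it is not reproduced in this text.

```python
from statistics import fmean, pvariance


def global_age_features(ages):
    """Return (mean, variance) of the instruction ages in the age array."""
    if not ages:
        # Assumed convention for an empty issue queue.
        return 0.0, 0.0
    mu = fmean(ages)
    # Population variance, since every entry of the array is observed.
    return mu, pvariance(ages, mu)
```

For example, the ages `[2, 4, 4, 4, 5, 5, 7, 9]` have mean 5 and variance 4.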

Optionally, the input of the withering threshold adjuster is the age of each instruction in the instruction age array, and its output is the withering threshold x, namely:

[equation for the withering threshold x missing in source]

where σ is the variance of the instruction age, μ is the expectation of the instruction age, and α is the adjustment coefficient, which satisfies [constraint on α missing in source].

Optionally, the adder-like instruction request circuit includes an adder-like layer followed by log2(n/2) layers of shift logic, where n is the number of entries in the issue queue.

Optionally, the dynamic delayed wake-up circuit consists of a comparator, an instruction execution discrimination circuit, and registers; the inputs of the wake-up circuit are the source register number of the instruction waiting to be issued and the destination register number of the issued instruction; the comparator checks whether the two register numbers are equal and, if so, a wake-up signal is sent; at the same time, the wake-up circuit identifies the execution period of the instruction waiting to be issued through the instruction execution discrimination circuit and outputs its cycle count, and the registers hold the pending wake-up signal for that number of cycles, thereby adjusting the order of the wake-up signals.
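
The comparator-plus-register arrangement can be sketched as a small cycle-level model in Python. The class name `DelayedWakeup`, its methods, and the `max_delay` parameter are hypothetical; the cycle count is passed in directly and stands in for the output of the instruction execution discrimination circuit, whose internals are not described here.

```python
from collections import deque


class DelayedWakeup:
    """Comparator plus a shift register that delays the wake-up signal."""

    def __init__(self, max_delay: int):
        # One slot per cycle of delay; slot 0 is delivered on the next tick.
        self.pipe = deque([False] * (max_delay + 1), maxlen=max_delay + 1)

    def match(self, src_reg: int, issued_dest_reg: int, delay_cycles: int) -> None:
        """Schedule a wake-up if the two register numbers are equal."""
        if src_reg == issued_dest_reg:
            self.pipe[delay_cycles] = True

    def tick(self) -> bool:
        """Advance one cycle and return the wake-up signal for this cycle."""
        signal = self.pipe.popleft()
        self.pipe.append(False)
        return signal
```

With `DelayedWakeup(3)` and `match(5, 5, 2)`, the first two calls to `tick()` return `False` and the third returns `True`, so the wake-up arrives two cycles later than an undelayed one would.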

The present application also provides a processor whose out-of-order instruction issue architecture includes an instruction allocation circuit, an instruction withering circuit, an adder-like instruction request circuit, and a dynamic delayed wake-up circuit;

the instruction allocation circuit allocates the multiple instructions sent from the physical registers to idle entries in the issue queue;

the instruction withering circuit stores newly allocated instructions in the issue queue and performs the withering operation on the instructions in the issue queue according to the instruction age of each instruction; a withered instruction can be selected at random for issue without arbitration;

the adder-like instruction request circuit counts the total number of entry idle signals in the issue queue and encodes that number with a special code; if the encoded total of idle signals is less than the identically encoded instruction issue width, it sends an instruction request signal to the physical register file;

the dynamic delayed wake-up circuit sends a wake-up signal when the source register number of an instruction waiting to be issued equals the destination register number of an issued instruction; at the same time, the wake-up circuit identifies the execution period of the instruction waiting to be issued through an instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, so that instructions can execute back to back.

Optionally, the highest bit of the instruction age of each instruction is set as the wake-up state bit of the instruction, and the remaining bits represent the intrinsic age of the instruction; the wake-up state bit indicates whether the corresponding instruction has been woken up, so that in the issue queue the instruction age of an awakened instruction is greater than that of a non-awakened instruction.

The beneficial effects of the present invention are:

The present application abandons the lengthy arbitration structure of the traditional issue architecture and adds an instruction withering circuit. An instruction age array represents how long each instruction has been stored in the CPU, with one additional wake-up state bit; instructions that have exceeded the withering threshold are placed in a settling pool so that the CPU can issue them directly. The circuit structures of the instruction request circuit, the instruction allocation circuit, the wake-up circuit and others are also improved, which effectively improves the timing of multi-instruction issue, a critical path in the processor. When waking up instructions, wake-up is delayed for instructions with a short execution period and advanced for instructions with a long execution period, so that instructions can execute back to back. This meets the requirements of a high performance-to-power ratio, low latency, and high IPC in modern superscalar out-of-order processors, and solves the prior-art problem that delay grows as the number of issue-queue entries grows.

Description of drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the overall composition of the multi-instruction out-of-order issue architecture based on instruction withering of the present invention.

FIG. 2 is a schematic diagram of the composition of the instruction withering circuit of the present invention.

FIG. 3 is a schematic diagram of the composition of the instruction allocation circuit of the present invention.

FIG. 4 is a schematic diagram of the composition of the adder-like instruction request circuit of the present invention.

FIG. 5 is a schematic diagram of the composition of the dynamic delayed wake-up circuit of the present invention.

FIG. 6 is a schematic diagram of the pipeline after the wake-up circuit has adjusted the wake-up order.

Detailed description

To make the purpose, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Embodiment 1:

This embodiment provides a processor. FIG. 1 shows the overall composition of the processor's multi-instruction out-of-order issue architecture, which includes an instruction allocation circuit, an instruction withering circuit, an adder-like instruction request circuit, and a dynamic delayed wake-up circuit.

The instruction allocation circuit allocates register-renamed instructions to entries of the instruction issue queue. The issue queue contains multiple entries, each holding one instruction waiting to be issued; when the issue queue has an idle entry, it accepts an instruction allocated by the allocation circuit.

Every instruction that has just entered an entry is in the non-awakened state. If the source register number of an instruction equals the destination register number of an issued instruction, the wake-up circuit wakes that instruction up. The instructions in all entries are withered by the instruction withering circuit, and every instruction that has completed withering is eventually issued. This realizes out-of-order issue of multiple instructions and can be used in a superscalar out-of-order issue processor.

FIG. 2 shows the composition of the instruction withering circuit, which includes an instruction age array, an issue queue, a withering threshold adjuster, a settling pool, and a global age feature extraction circuit.

A newly allocated instruction coming from the instruction allocation circuit enters the instruction withering circuit and is stored in an idle issue-queue entry; at the same time, the corresponding instruction age in the instruction age array is initialized to a random value between 0 and 1.

Whenever an instruction is issued, an age increment signal is sent to the age array, and the age of each instruction in the issue queue that has not been issued is increased by 1.

The withering threshold adjuster adjusts and outputs the withering threshold according to the idle-entry information of the settling pool and the global age feature. If an instruction age in the instruction age array is greater than the withering threshold, the instruction age array outputs a withering signal. The instruction that receives the withering signal performs the withering operation and moves from the issue queue into the settling pool; the corresponding issue-queue entry is set to the idle state and waits for a newly allocated instruction.

Withered instructions in the settling pool can be issued without arbitration.
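
The ageing and withering steps described above can be sketched as one cycle of a simplified software model. The function name `step`, the dictionary representation of the age array, and the rule of filling the settling pool oldest-first up to `pool_capacity` are assumptions for illustration; the threshold is taken as a given input rather than computed by the adjuster.

```python
def step(queue_ages, pool, issued_count, threshold, pool_capacity):
    """One cycle of the age array: age the waiting entries, then wither.

    queue_ages: dict mapping issue-queue entry index -> instruction age
    pool: list of entry indices whose instructions sit in the settling pool
    """
    # Each issued instruction raises the age of every instruction still waiting.
    for entry in queue_ages:
        queue_ages[entry] += issued_count

    # Entries whose age exceeds the threshold move to the settling pool.
    # Assumption: oldest first, and only while the pool has free slots.
    for entry in sorted(queue_ages, key=queue_ages.get, reverse=True):
        if queue_ages[entry] > threshold and len(pool) < pool_capacity:
            pool.append(entry)
            del queue_ages[entry]  # the issue-queue entry becomes idle

    return queue_ages, pool
```

Starting from ages `{0: 3, 1: 7, 2: 5}`, one issued instruction, a threshold of 5, and a pool of two slots, entries 1 and 2 wither and entry 0 stays in the queue with age 4.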

The inputs of the withering threshold adjuster are the idle-entry information of the settling pool and the global age feature, whose value is output by the global age feature extraction circuit. The adjuster adjusts and outputs the withering threshold according to the number of idle entries in the settling pool and the ages of all current instructions.

The input of the withering threshold adjuster is the age of each instruction in the instruction age array, and its output is the withering threshold x, namely:

[equation for the withering threshold x missing in source]

where α satisfies [constraint on α missing in source], σ is the variance of the instruction age, and μ is the expectation of the instruction age. Its characteristic value is derived as follows:

A modern processor can process hundreds of millions of instructions per second, and the initial age is a random value between 0 and 1. With such a large sample, the ages in the processor can be treated as continuous and, according to the law of large numbers, as following a normal distribution:

[equation (1) missing in source]

where σ is the variance of the instruction age and μ is the expectation of the instruction age.

Construct the function g(x):

[equation (2) missing in source]

Rewriting (2):

[equation (3) missing in source]

Taking the first derivative of (3):

[equation (4) missing in source]

Taking the second derivative of (3):

[equation (5) missing in source]

Setting (4) to zero gives

[equation (6) missing in source]

Substituting (6) into (5) and requiring [condition missing in source] gives

[equation missing in source]

For the ages withered with x as the threshold to be as large as possible, and the impact on pipeline efficiency as small as possible, we should have

[equation missing in source]

Taking the lowest threshold, the constraint on the adjustment coefficient α is

[equation missing in source]

In summary,

[equation missing in source]

where α satisfies [constraint on α missing in source].

The instruction age array is essentially an array of counters. Each counter has [bit width missing in source] bits in total and represents the instruction age of the corresponding instruction; the low [bit width missing in source] bits are the age count bits and the highest bit is the wake-up state bit.

Whenever a newly allocated instruction enters the issue queue of the instruction withering circuit, the corresponding instruction age is set to zero;

whenever an instruction is issued, the instruction age of each instruction that has not been issued is increased by 1;

whenever an instruction in the issue queue is woken up, the wake-up state bit of its instruction age is set to 1; if the instruction age of an instruction is greater than the withering threshold, a withering signal is output to the issue queue. Here n is the number of entries in the issue queue and s is the instruction issue width.

The issue queue contains n entries; each entry stores an instruction waiting to be issued and an entry idle bit.

The settling pool is an instruction queue with far fewer entries than the instruction issue queue. It stores withered instructions that satisfy the withering condition, and these instructions can be issued directly without arbitration.

FIG. 3 shows the composition of the instruction allocation circuit, which allocates the multiple instructions sent from the physical registers to idle entries in the issue queue.

The instruction allocation circuit contains s entry number selection circuits, where s is the instruction issue width and n is the number of entries in the issue queue. The inputs of each entry number selection circuit are the idle signals of n/s entries of the issue queue in the instruction withering circuit, together with the corresponding entry numbers. The selection circuit chooses an entry number according to which input idle signals are valid: if several idle signals are valid, it selects the entry number of the first valid one; if no idle signal is valid, it outputs the maximum value representable in the data bits, indicating that no entry was selected. The entry number output by the selection circuit is compared with this upper limit; the valid signal is set to 1 if they are equal and to 0 otherwise. Each instruction to be allocated that is input to the allocation circuit is written into the corresponding entry according to the entry number and the valid signal.

The entry number selection circuit is built from an array of selectors, as shown in FIG. 2. The first column of selectors takes the entry numbers as input; because the first idle entry must be selected, each selector chooses according to the idle signal of the smaller entry number. The entry-number inputs of the second layer are the outputs of the first selection layer, again steered by the idle signal of the smaller entry number, and so on, for log2(n) selection layers in total. The result of the log2(n)-th selection layer is passed to an all-empty entry selector, whose select signal is the select signal of the log2(n)-th layer and whose data inputs are that layer's result and the upper-limit value. If the select signal is 0, the upper-limit value is output as the final entry number; otherwise the result of the log2(n)-th selection layer is output as the final entry number. Here n is the number of entries.
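
The selector tree can be sketched in Python as a sequence of pairwise 2:1 selections followed by the final all-empty check. The function name `select_first_free` is hypothetical, the input length is assumed to be a power of two, and the sentinel is assumed to be the all-ones value of an index field one bit wider than log2(n), since the text does not give the data width.

```python
def select_first_free(free_bits):
    """Return the number of the first idle entry, or the all-ones sentinel.

    free_bits: list of 0/1 idle signals, one per entry handled by this circuit.
    """
    n = len(free_bits)
    if n == 0 or n & (n - 1):
        raise ValueError("number of entries must be a power of two")
    sentinel = 2 * n - 1  # assumed upper-limit value meaning "no entry selected"

    # Layer 0 candidates: (idle signal, entry number) for every entry.
    layer = [(bool(f), i) for i, f in enumerate(free_bits)]
    while len(layer) > 1:
        nxt = []
        for k in range(0, len(layer), 2):
            (lo_free, lo_idx), (hi_free, hi_idx) = layer[k], layer[k + 1]
            # 2:1 selector steered by the idle signal of the smaller entry number.
            nxt.append((lo_free or hi_free, lo_idx if lo_free else hi_idx))
        layer = nxt

    any_free, index = layer[0]
    # Final selector: fall back to the upper-limit value when nothing is idle.
    return index if any_free else sentinel
```

For eight entries the loop runs three times, matching the log2(n) selection layers, and the depth grows logarithmically rather than linearly with the number of entries.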

Figure 4 shows the structure of the instruction request circuit. The instruction request circuit counts the total number of asserted entry idle signals and represents that count with a special encoding. If the encoded idle total is smaller than the instruction issue width under the same encoding, it sends an instruction request signal to the physical register file. The circuit consists of two parts: an adder-like layer and the subsequent log2(n/2) shift logic layers.

The adder-like layer is made up of adder-like computing units. To count the idle signals, the idle-signal sequence of the entries is fed into the adder-like layer, which computes the number of idle signals and outputs it in the special encoding. The output of the adder-like layer is fed into the subsequent log2(n/2) shift logic layers, which output the final count. This count is compared with the instruction issue width, expressed in the same special encoding, to determine whether an instruction request signal needs to be sent.

Specifically, when the idle signals are counted, the idle-signal sequence of the entries is fed into the adder-like layer. Each adder-like unit takes two bits of the idle-signal sequence as input, computes their AND and their XOR, and then compares the two results:

If they are equal and the AND result is 1, the unit outputs "01", the code for 1, indicating that the sum of its two binary inputs is 1;

If they are equal and the AND result is 0, the unit outputs "10", the code for 0, indicating that the sum of its two binary inputs is 0;

If they are not equal, the unit outputs "00", the code for 2, indicating that the sum of its two binary inputs is 2;

The encoding is n bits wide.
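A small Python model of one adder-like unit may help fix the encoding. It takes the code table from the text (sum 0 gives "10", sum 1 gives "01", sum 2 gives "00"). How the sum is derived from the AND and XOR results is an assumption based on ordinary Boolean logic (AND is 1 only when both inputs are 1, XOR is 1 only when exactly one input is 1), not a transcription of the case list. The names `CODES` and `adder_like_unit` are hypothetical.

```python
# Two-bit codes quoted in the text: the single 1 moves right as the sum grows.
CODES = {0: "10", 1: "01", 2: "00"}


def adder_like_unit(a, b):
    """Encode the sum of two idle bits (each 0 or 1) as a two-bit code."""
    and_bit = a & b
    xor_bit = a ^ b
    if and_bit:        # both inputs are 1 (assumed to mean a sum of 2)
        total = 2
    elif xor_bit:      # exactly one input is 1 (assumed to mean a sum of 1)
        total = 1
    else:              # both inputs are 0
        total = 0
    return CODES[total]
```

Read this way, the code is a one-hot value whose 1 shifts right by one position per idle signal counted, which is what lets the later layers add counts with right shifters.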

The subsequent log2(n/2) shift logic layers are built from right shifters. The output of the adder-like layer is fed into these layers, and the result is compared with the instruction issue width, expressed in the same special encoding, to determine whether an instruction request signal needs to be sent. This proceeds as follows:

Each right shifter takes the output of one adder-like unit as the data to be shifted and the output of another adder-like unit as the shift amount, and shifts the data right by k bits, where k is the decimal value that the shift-amount code represents.

For example, if the data to be shifted is "01" and the shift amount is "00", then under the encoding rule above "01" is shifted right by 2 bits.
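The shifter step can be sketched on bit strings. The sketch assumes that a code's value is the position of its single 1 counted from the left, and that an all-zero code stands for the code width; this is consistent with the two-bit codes "10", "01", and "00" standing for 0, 1, and 2. The function names are hypothetical, and any widening of codes between layers is not modelled.

```python
def shift_amount(code):
    """Decode a code: index of its '1' from the left, or the width if all zeros."""
    idx = code.find("1")
    return len(code) if idx < 0 else idx


def shift_right(data, amount_code):
    """Shift `data` right by the value of `amount_code`, keeping the same width."""
    k = shift_amount(amount_code)
    width = len(data)
    # Prepend k zeros, then drop the bits that fall off the right-hand end.
    return ("0" * k + data)[:width]
```

Shifting "01" by "00" (that is, by 2) gives "00", as in the example in the text; shifting "01" by "01" also gives "00", which is the code for 1 + 1 = 2.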

Figure 5 shows the structure of the wake-up circuit. The wake-up circuit consists of a comparator, an instruction execution discrimination circuit, and a register.

The inputs of the wake-up circuit are the source register numbers of the instructions waiting to issue and the destination register number of an issued instruction. The comparator checks whether a source register number of a waiting instruction equals the destination register number of the issued instruction, and if so a wake-up signal is sent. At the same time, the instruction execution discrimination circuit identifies the execution cycles of the instruction to be issued and outputs its cycle count, and the register holds the pending wake-up signal according to that cycle count. This reorders the wake-up signals: instructions with short execution cycles are woken later and instructions with long execution cycles are woken earlier, so that instructions in the pipeline execute back to back and pipeline efficiency improves.

Figure 6 shows the pipeline after wake-up adjustment. Instruction A needs three execution cycles, and instructions B, C, and D each need one. With the wake-up order adjusted by the wake-up circuit, the wake-up of instruction D is delayed by two cycles relative to instruction A, so that the two back-to-back instructions B and C can be inserted between A and D. All four instructions then execute back to back with no delay bubbles, which improves pipeline efficiency.
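The comparator-plus-delay behaviour can be illustrated with a short Python sketch. It assumes that the wake-up of a dependent instruction is held back by the producer's execution cycles minus one; that figure is chosen only because it reproduces the two-cycle delay of the A/D example, and the text does not state the rule in this form. The names `Waiting` and `wakeup_events`, and the register names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Waiting:
    """An instruction waiting in the issue queue (hypothetical model)."""
    name: str
    src_regs: tuple


def wakeup_events(issued_dest, issue_cycle, exec_cycles, waiting):
    """Return {instruction name: cycle at which its wake-up signal is released}.

    issued_dest -- destination register number of the instruction just issued
    exec_cycles -- execution cycles reported by the discrimination circuit
    """
    delay = max(exec_cycles - 1, 0)   # assumed delay rule, see lead-in
    return {
        w.name: issue_cycle + delay
        for w in waiting
        if issued_dest in w.src_regs   # comparator: source equals destination
    }
```

If A issues at cycle 0, needs three cycles, and writes a register that only D reads, D's wake-up is released at cycle 2, leaving room for the independent instructions B and C in between.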

Embodiment two

This embodiment provides a multi-instruction out-of-order issue method based on instruction withering, used in the processor described in Embodiment one. The issue architecture of that processor is a non-data-capture architecture: the CPU reads the physical register file only after the issue stage, and every entry in the issue queue stores physical register numbers. The method comprises:

S1: when the physical register file receives an instruction request signal from the instruction request circuit, it outputs suitable instructions to the instruction allocation circuit.

S2: the instruction allocation circuit allocates the instructions output by the physical register file to entries of the issue queue:

The instruction allocation circuit contains s entry-number selection circuits. Each selection circuit takes as input the idle-signal sequence of n/s entries of the issue queue in the instruction withering circuit, together with the corresponding issue-queue entry numbers, and selects an entry number according to which idle signals are asserted. If several idle signals are asserted, it selects the entry number of the first asserted one; if none is asserted, it outputs the maximum value representable in the data width, indicating that no entry was selected.

The entry number output by the selection circuit is compared with this upper limit: if they are equal the valid signal is set to 1, otherwise it is set to 0. Each instruction presented to the allocation circuit is written into the corresponding entry according to the entry number and the valid signal.

Here s is the instruction issue width and n is the number of entries in the issue queue.

An instruction newly allocated by the instruction allocation circuit enters the instruction withering circuit and is stored in an idle issue-queue entry. At the same time, the corresponding instruction age in the instruction age array is initialized to a random value between 0 and 1.

S3: each time the issue queue in the instruction withering circuit accepts a new instruction, the instruction age in the age array corresponding to that instruction's entry is set to zero. Each time the instruction withering circuit issues an instruction, the ages of the instructions still in the issue queue are incremented by one. The most significant bit of an instruction's age is the instruction's wake-up status bit, and the remaining bits represent its intrinsic age. After an instruction in the issue queue is woken up, the most significant bit of its age is set to 1, which guarantees that the age of a woken instruction is greater than that of an instruction that has not been woken.
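The age word described in S3 can be sketched as an integer whose top bit is the wake-up status bit. The field width `AGE_BITS`, the saturating increment, and the function names are assumptions for illustration; the text specifies only the bit layout and the three update events.

```python
AGE_BITS = 4                      # assumed width of the intrinsic-age field
WAKE_BIT = 1 << AGE_BITS          # most significant bit: wake-up status
AGE_MASK = WAKE_BIT - 1           # remaining bits: intrinsic age


def on_allocate():
    """A newly written entry starts with age zero and is not yet woken."""
    return 0


def on_other_issue(age):
    """Another instruction issued: increment the intrinsic age, keep the wake bit.

    Saturating at the field maximum is an assumption; the text does not say
    what happens on overflow.
    """
    intrinsic = min((age & AGE_MASK) + 1, AGE_MASK)
    return (age & WAKE_BIT) | intrinsic


def on_wakeup(age):
    """Set the wake-up status bit so the instruction outranks unwoken ones."""
    return age | WAKE_BIT
```

Because the wake bit sits above the intrinsic-age field, a plain unsigned comparison of two age words ranks every woken instruction above every unwoken one, and ranks by intrinsic age within each group.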

When an instruction's age exceeds the withering threshold, the instruction age array triggers a withering signal and the instruction withers. A withered instruction moves from the issue queue into the settling pool, and its issue-queue entry is marked idle.

The settling pool is an instruction queue with far fewer entries than the issue queue. It holds withered instructions, which can be selected at random for issue.
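The withering step and the random pick from the settling pool can be sketched as follows. The list-based queue and pool, the fixed `pool_capacity`, and the choice to leave an instruction in the queue when the pool is full are assumptions; the comparison uses whatever age values the caller passes in.

```python
import random


def wither(queue, ages, threshold, pool, pool_capacity):
    """Move every instruction whose age exceeds the threshold into the pool.

    queue -- list of instructions, None for an idle entry (updated in place)
    ages  -- instruction ages, parallel to queue (updated in place)
    """
    for i, instr in enumerate(queue):
        if instr is None:
            continue
        if ages[i] > threshold and len(pool) < pool_capacity:
            pool.append(instr)
            queue[i] = None        # the issue-queue entry becomes idle
            ages[i] = 0


def issue_from_pool(pool, rng):
    """Pick a withered instruction at random, with no age arbitration."""
    if not pool:
        return None
    return pool.pop(rng.randrange(len(pool)))
```

A caller would run `wither` once per cycle and then draw from the pool with a `random.Random` instance; passing a seeded generator keeps a simulation reproducible.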

The issue queue in the withering circuit uses a non-compressed structure: when the physical register numbers of an instruction in some entry have been issued and the entry becomes idle, the other entries are not shifted. In addition to holding the physical register numbers of the current instruction, each entry records the wake-up status of that instruction and whether the entry is idle.
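A non-compressed queue is easy to picture in code: issuing from an entry only flips that entry's idle flag, and nothing else moves. The class below is a hypothetical Python model with the three per-entry fields named in the text (physical register numbers, wake-up status, idle flag).

```python
class IssueQueue:
    """Fixed-position issue queue: entries are never shifted after an issue."""

    def __init__(self, n):
        self.regs = [None] * n      # physical register numbers per entry
        self.awake = [False] * n    # wake-up status per entry
        self.idle = [True] * n      # idle flag per entry

    def write(self, idx, phys_regs):
        """Store an instruction's physical register numbers in entry idx."""
        self.regs[idx] = phys_regs
        self.awake[idx] = False
        self.idle[idx] = False

    def issue(self, idx):
        """Issue entry idx and mark it idle; all other entries stay in place."""
        phys_regs = self.regs[idx]
        self.regs[idx] = None
        self.awake[idx] = False
        self.idle[idx] = True
        return phys_regs
```

The trade-off is that free entries end up scattered through the queue, which is why the allocation circuit needs the first-idle selection logic instead of a simple tail pointer.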

S4: the idle signals of the issue-queue entries are also sent to the instruction request circuit, which counts the idle entries in the issue queue. If the number of idle entries is greater than the instruction issue width, the request circuit sends an instruction request signal to the physical register file; the physical register file accepts the request and sends instructions to the instruction allocation circuit.
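Step S4 can be sketched behaviourally. The sketch assumes that the special encoding generalizes the two-bit codes to wider words in the obvious way: a count k is a single 1 shifted right k places from the most significant bit, and a count equal to the width is all zeros. Under that assumption a larger count gives a numerically smaller code, so "encoded idle total smaller than encoded issue width" is the same test as "more idle entries than the issue width". The function names are hypothetical.

```python
def encode_count(count, bits):
    """Encode `count` as a one-hot word: the 1 sits `count` places below the MSB."""
    if bits < 1 or not 0 <= count <= bits:
        raise ValueError("count must lie in 0..bits and bits must be positive")
    return (1 << (bits - 1)) >> count     # count == bits yields 0


def needs_request(idle_flags, issue_width):
    """Return True if an instruction request should go to the physical register file."""
    bits = len(idle_flags)
    idle_code = encode_count(sum(idle_flags), bits)
    width_code = encode_count(issue_width, bits)
    # Smaller code means larger count, so this asks: idle entries > issue width?
    return idle_code < width_code
```

With eight entries and an issue width of two, three idle entries trigger a request while two or fewer do not.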

S5: during instruction issue, the wake-up circuit compares the destination register number of the instruction being issued with the source register numbers of the instructions in the issue queue. If the numbers are equal it generates a wake-up signal, and it decides from the execution cycles of the instruction whether that wake-up signal should be delayed: instructions with long execution cycles are woken earlier and instructions with short execution cycles are issued later. The wake-up signal sets the wake-up status bit in the age of the corresponding instruction to 1, which guarantees that the age of a woken instruction is greater than that of an instruction that has not been woken.

Some of the steps in the embodiments of the present invention may be implemented in software, and the corresponding software program may be stored in a readable storage medium such as an optical disc or a hard disk.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A multi-instruction out-of-order issue method, characterized in that an instruction withering circuit is added to the out-of-order instruction issue architecture of a processor, the instruction withering circuit being used to store newly allocated instructions in an issue queue and to perform withering operations on the instructions in the issue queue; the method comprises:
setting the most significant bit of the instruction age of each instruction in the instruction withering circuit as the wake-up status bit of that instruction, the remaining bits of the instruction age representing the intrinsic age of the instruction; the wake-up status bit indicates whether the corresponding instruction has been woken up, and in the issue queue the age of a woken instruction is greater than the age of an instruction that has not been woken;
setting a withering threshold, the instruction age array triggering a withering signal when the instruction age of an instruction exceeds the withering threshold, so that the instruction withers; a withered instruction can be selected at random for issue without arbitration, thereby achieving out-of-order issue of multiple instructions;
each instruction in the issue queue determines its issue order according to its instruction age and wake-up status;
the input of the withering threshold regulator is the age of each instruction in the instruction age array, and its output is the withering threshold x, namely:
wherein σ is the variance of the instruction age, μ is the expectation of the instruction age, α is the adjustment coefficient, and α satisfies
2. The method of claim 1, wherein, when waking up instructions, the method wakes instructions with short execution cycles later and instructions with long execution cycles earlier, so as to ensure that instructions can execute back to back.
3. The method of claim 2, wherein, among instructions that have an ordering dependency, once the preceding instruction has been issued, the processor waits for the preceding instruction to finish executing before waking up the subsequent instruction.
4. The method of claim 3, wherein the out-of-order instruction issue architecture further comprises an instruction allocation circuit, an instruction request circuit based on adder-like units, and a dynamic delay wake-up circuit;
the instruction allocation circuit is used to allocate the instructions sent from the physical register file to idle entries in the issue queue;
the instruction request circuit based on adder-like units is used to count the total number of idle signals of the entries in the issue queue, to encode that number with a special encoding, and to send an instruction request signal to the physical register file if the encoded idle total is smaller than the instruction issue width under the same encoding;
the dynamic delay wake-up circuit is used to send a wake-up signal when a source register number of an instruction to be issued equals the destination register number of an issued instruction; at the same time, the wake-up circuit identifies the execution cycles of the instruction to be issued through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to those execution cycles, so as to ensure that instructions can execute back to back.
5. The method of claim 4, wherein the instruction withering circuit comprises an instruction age array, an issue queue, a withering threshold regulator, a settling pool, and a global age feature extraction circuit;
the instruction age array is used to indicate the instruction age of each instruction in the issue queue and whether the instruction has been woken up;
the issue queue is used to store instructions sent from the physical register file; the issue queue uses a non-compressed structure, that is, when the physical register numbers of an instruction in some entry have been issued and the entry becomes idle, the other entries are not shifted, and each entry, in addition to holding the physical register numbers of the current instruction, records the wake-up status of that instruction and whether the entry is idle;
the withering threshold regulator is used to dynamically adjust and output the withering threshold according to the number of idle entries in the settling pool and the ages of the instructions remaining in the issue queue;
the settling pool is used to store withered instructions that satisfy the withering condition;
the global age feature extraction circuit is used to compute global age statistics.
6. The method of claim 4, wherein the instruction request circuit based on adder-like units comprises an adder-like layer and the subsequent log2(n/2) shift logic layers, n being the number of entries in the issue queue.
7. The method of claim 4, wherein the dynamic delay wake-up circuit consists of a comparator, an instruction execution discrimination circuit, and a register; the inputs of the wake-up circuit are the source register numbers of the instructions to be issued and the destination register number of an issued instruction; the comparator checks whether a source register number of an instruction to be issued equals the destination register number of the issued instruction, and if so a wake-up signal is sent; at the same time, the wake-up circuit identifies the execution cycles of the instruction to be issued through the instruction execution discrimination circuit and outputs its cycle count, and the register holds the pending wake-up signal according to that cycle count, thereby adjusting the order of the wake-up signals.
8. A processor, wherein the out-of-order instruction issue architecture of the processor comprises an instruction allocation circuit, an instruction withering circuit, an instruction request circuit based on adder-like units, and a dynamic delay wake-up circuit;
the instruction allocation circuit is used to allocate the instructions sent from the physical register file to idle entries in the issue queue;
the instruction withering circuit is used to store newly allocated instructions in the issue queue and to perform withering operations on the instructions in the issue queue according to their instruction ages; a withered instruction can be selected at random for issue without arbitration;
the instruction request circuit based on adder-like units is used to count the total number of idle signals of the entries in the issue queue, to encode that number with a special encoding, and to send an instruction request signal to the physical register file if the encoded idle total is smaller than the instruction issue width under the same encoding;
the dynamic delay wake-up circuit is used to send a wake-up signal when a source register number of an instruction to be issued equals the destination register number of an issued instruction; at the same time, the wake-up circuit identifies the execution cycles of the instruction to be issued through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to those execution cycles, so as to ensure that instructions can execute back to back.
9. The processor of claim 8, wherein the most significant bit of the instruction age of each instruction is set as the wake-up status bit of that instruction, and the remaining bits of the instruction age represent the intrinsic age of the instruction; the wake-up status bit indicates whether the corresponding instruction has been woken up, and in the issue queue the age of a woken instruction is greater than the age of an instruction that has not been woken.
CN202010264562.2A 2020-04-07 2020-04-07 A multi-instruction out-of-order emission method and processor based on instruction withering Active CN111538534B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010264562.2A CN111538534B (en) 2020-04-07 2020-04-07 A multi-instruction out-of-order emission method and processor based on instruction withering
PCT/CN2020/098961 WO2021203560A1 (en) 2020-04-07 2020-06-29 Instruction withering-based multi-instruction out-of-order transmission method and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010264562.2A CN111538534B (en) 2020-04-07 2020-04-07 A multi-instruction out-of-order emission method and processor based on instruction withering

Publications (2)

Publication Number Publication Date
CN111538534A CN111538534A (en) 2020-08-14
CN111538534B true CN111538534B (en) 2023-08-08

Family

ID=71978534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010264562.2A Active CN111538534B (en) 2020-04-07 2020-04-07 A multi-instruction out-of-order emission method and processor based on instruction withering

Country Status (2)

Country Link
CN (1) CN111538534B (en)
WO (1) WO2021203560A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112099854B (en) * 2020-11-10 2021-04-23 北京微核芯科技有限公司 Method and device for scheduling out-of-order queue and judging queue cancellation item
EP4027236A4 (en) 2020-11-10 2023-07-26 Beijing Vcore Technology Co.,Ltd. Method and device for scheduling out-of-order queues and determining queue cancel items
CN113254079B (en) * 2021-06-28 2021-10-01 广东省新一代通信与网络创新研究院 A method and system for implementing self-incrementing instructions
CN114519319B (en) * 2021-12-30 2024-09-10 中国人民解放军国防科技大学 A hybrid transmission queue design implementation method and system based on high-level modeling
CN115509610B (en) * 2022-09-29 2025-07-25 上海壁仞科技股份有限公司 Out-of-order execution computing device
CN117908968A (en) * 2022-10-11 2024-04-19 深圳市中兴微电子技术有限公司 Instruction sending method, device, equipment and medium based on compressed transmission queue
CN117742796B (en) * 2023-12-11 2024-07-23 上海合芯数字科技有限公司 Command wake-up method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101395573A (en) * 2006-02-28 2009-03-25 Mips技术公司 Distributive scoreboard scheduling in an out-of order processor
CN104932945A (en) * 2015-06-18 2015-09-23 合肥工业大学 Task-level out-of-order multi-issue scheduler and scheduling method thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7464253B2 (en) * 2006-10-02 2008-12-09 The Regents Of The University Of California Tracking multiple dependent instructions with instruction queue pointer mapping table linked to a multiple wakeup table by a pointer
CN101706714B (en) * 2009-11-23 2014-03-26 龙芯中科技术有限公司 System and method for issuing instruction, processor and design method thereof
CN101826000A (en) * 2010-01-29 2010-09-08 北京龙芯中科技术服务中心有限公司 Interrupt response determining method, device and microprocessor core for pipeline microprocessor
US10185564B2 (en) * 2016-04-28 2019-01-22 Oracle International Corporation Method for managing software threads dependent on condition variables
CN109885857B (en) * 2018-12-26 2023-09-01 上海合芯数字科技有限公司 Instruction emission control method, instruction execution verification method, system and storage medium
CN110297662B (en) * 2019-07-04 2021-11-30 中昊芯英(杭州)科技有限公司 Method for out-of-order execution of instructions, processor and electronic equipment

Also Published As

Publication number Publication date
CN111538534A (en) 2020-08-14
WO2021203560A1 (en) 2021-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant