CN102855213B

CN102855213B - A kind of instruction storage method of network processing unit instruction storage device and the device

Info

Publication number: CN102855213B
Application number: CN201210233710.XA
Authority: CN
Inventors: 郝宇; 安康; 王志忠; 刘衡祁
Original assignee: ZTE Corp
Current assignee: Sanechips Technology Co Ltd
Priority date: 2012-07-06
Filing date: 2012-07-06
Publication date: 2017-10-27
Anticipated expiration: 2032-07-06
Also published as: CN102855213A; WO2013185660A1

Abstract

The invention discloses a network processor instruction storage device and an instruction storage method of the device, which can save hardware resources. The network processor includes more than two microengine groups, and each microengine group includes N microengines, and the N microengines include more than two microengine groups, and the instruction storage device includes: Qmem, Cache, the first low-speed instruction memory and the second low-speed instruction memory, wherein: each microengine corresponds to a Qmem and a cache, Qmem is set to be connected to the microengine, and the cache is connected to Qmem; each microengine group corresponds to a first low-speed Instruction memory, the cache corresponding to each microengine in the microengine group is connected to the first low-speed instruction memory; each microengine large group corresponds to a second low-speed instruction memory, and the cache corresponding to each microengine in the microengine large group is connected to the first low-speed instruction memory The two low-speed instruction memories are connected. Adopting this solution saves a lot of hardware storage resources.

Description

A network processor instruction storage device and the instruction storage method of the device

技术领域technical field

本发明涉及互联网领域，具体涉及一种网络处理器指令存储装置及该装置的指令存储方法。The invention relates to the Internet field, in particular to a network processor instruction storage device and an instruction storage method of the device.

背景技术Background technique

随着互联网（Internet）的迅猛发展，用于主干网络互联的核心路由器的接口速率已经达到100Gbps，该速率要求核心路由器的线卡能够迅速处理通过线卡上的报文，当前业界大都使用多核网络处理器的结构。而指令的取指效率是影响多核网络处理器性能的一个关键因素。With the rapid development of the Internet (Internet), the interface rate of the core router used for backbone network interconnection has reached 100Gbps. This rate requires the line card of the core router to be able to quickly process the messages passing through the line card. Currently, most industries use multi-core networks. The structure of the processor. The fetching efficiency of instructions is a key factor affecting the performance of multi-core network processors.

在多核结构的网络处理器系统中，同一组微引擎(Micro Engine，简称ME)有着同样的指令需求，由于芯片面积和工艺的限制，无法为每一个微引擎都配备一块独享的存储空间来存储这些指令。因此需要设计一个相应的方案来实现一组微引擎对一片指令存储空间的共享，同时能够有较高的取指效率。In a network processor system with a multi-core structure, the same group of micro engines (Micro Engine, ME for short) has the same instruction requirements. Due to the limitation of chip area and process, it is impossible to equip each micro engine with an exclusive storage space to Store these instructions. Therefore, it is necessary to design a corresponding scheme to realize the sharing of a piece of instruction storage space by a group of microengines, and at the same time, it can have higher instruction fetch efficiency.

一些传统的多核网络处理器使用多级缓存的结构，譬如每个微引擎配备一个单独的一级缓存，一组微引擎共享一个二级缓存的结构来实现存储空间的共享，如图1所示。这些缓存都具有较大的空间以保证命中率，但是由于网络报文的随机性造成指令的局部性不强，因此大容量的缓存并不能保证取指效率，同时还造成资源的大量浪费。Some traditional multi-core network processors use a multi-level cache structure. For example, each micro-engine is equipped with a separate first-level cache, and a group of micro-engines share a second-level cache structure to share storage space, as shown in Figure 1. . These caches have a large space to ensure the hit rate, but due to the randomness of network packets, the locality of instructions is not strong, so large-capacity caches cannot guarantee the efficiency of instruction fetching, and also cause a lot of waste of resources.

另一些网络处理器采用了轮询式的指令存储方案，将一组微引擎所需的指令存储在与微引擎同等数量的RAM内，如图2所示，图中4个微引擎通过一个仲裁模块轮询4个RAM中的指令。每个微引擎依次地访问所有的RAM，它们的访问始终处于不同的“相位”，因此不会发生不同微引擎访问同一个RAM的碰撞，实现了存储空间的共享。但是由于指令中存在大量的跳转指令，假设对流水线结构的微引擎来说，从开始取到跳转指令到跳转完成需要n个时钟的时间，为保证某个跳转指令的目标在该跳转指令所在RAM后面第n+1个RAM中，写入指令的时候必须要插入一些空指令保证跳转目标位置的正确。当跳转指令所占比例很大的时候就需要插入大量的空指令，造成了指令空间的大量浪费，而且也会增加编译器实现的复杂度。该方案要求所有的RAM都能在1个时钟周期返回数据，须采用SRAM实现，但大量SRAM的使用也造成大量的资源开销。Some other network processors adopt a polling instruction storage scheme, and store the instructions required by a group of micro-engines in the same amount of RAM as the micro-engines, as shown in Figure 2, in which 4 micro-engines pass an arbitration The module polls the instructions in the 4 RAMs. Each microengine accesses all RAMs sequentially, and their accesses are always in different "phases", so there will be no collision between different microengines accessing the same RAM, and the sharing of storage space is realized. However, because there are a large number of jump instructions in the instructions, assuming that for a micro-engine with a pipeline structure, it takes n clocks from the start of fetching the jump instruction to the completion of the jump, in order to ensure that the target of a certain jump instruction is within the In the n+1th RAM behind the RAM where the jump instruction is located, some empty instructions must be inserted when writing the instruction to ensure that the jump target position is correct. When jump instructions account for a large proportion, a large number of empty instructions need to be inserted, which causes a lot of waste of instruction space, and also increases the complexity of compiler implementation. This scheme requires that all RAMs can return data within one clock cycle, which must be implemented with SRAM, but the use of a large number of SRAMs also causes a large amount of resource overhead.

发明内容Contents of the invention

本发明要解决的技术问题是提供一种网络处理器指令存储装置及该装置的指令存储方法，能够节约硬件资源。The technical problem to be solved by the present invention is to provide a network processor instruction storage device and an instruction storage method of the device, which can save hardware resources.

为解决上述技术问题，本发明提供了一种网络处理器指令存储装置，网络处理器包括两个以上的微引擎大组，每个微引擎大组包括N个微引擎，该N个微引擎包括两个以上的微引擎小组，所述指令存储装置包括：快速存储器（Qmem）、缓存（cache）、第一低速指令存储器和第二低速指令存储器，其中：In order to solve the above-mentioned technical problems, the present invention provides a network processor instruction storage device, the network processor includes more than two large groups of micro-engines, each large group of micro-engines includes N micro-engines, and the N micro-engines include More than two microengine groups, the instruction storage device includes: fast memory (Qmem), cache (cache), first low-speed instruction memory and second low-speed instruction memory, wherein:

每个微引擎对应一个Qmem和一个缓存，Qmem设置为与微引擎连接，缓存与Qmem相连；Each microengine corresponds to a Qmem and a cache, Qmem is set to be connected to the microengine, and the cache is connected to Qmem;

每个微引擎小组对应一个第一低速指令存储器，微引擎小组中每个微引擎对应的缓存与第一低速指令存储器相连；Each microengine group corresponds to a first low-speed instruction memory, and the cache corresponding to each microengine in the microengine group is connected to the first low-speed instruction memory;

每个微引擎大组对应一个第二低速指令存储器，微引擎大组中每个微引擎对应的缓存与第二低速指令存储器相连。Each large group of microengines corresponds to a second low-speed instruction memory, and the cache corresponding to each microengine in the large group of microengines is connected to the second low-speed instruction memory.

进一步地，所述Qmem用于在接收到微引擎发送的指令数据请求后，判断本Qmem是否有该指令数据，如果有，则将指令数据返回给微引擎，如果没有，则向缓存发送指令数据请求。Further, the Qmem is used to judge whether the Qmem has the instruction data after receiving the instruction data request sent by the microengine, and if so, return the instruction data to the microengine, and if not, send the instruction data to the cache ask.

进一步地，所述Qmem中存储对处理质量要求最高的一个地址段的指令。Further, the Qmem stores an instruction of an address segment that requires the highest processing quality.

进一步地，所述缓存包括两个Cache Line，每个Cache Line存放多条连续的指令，所述Cache Line用于在接收到Qmem发送的指令数据请求后，判断本缓存是否有该指令数据，如果有，则将指令数据通过Qmem返回给微引擎，如果没有，则向第一低速指令存储器或第二低速指令存储器发送指令数据请求。Further, the cache includes two Cache Lines, each Cache Line stores a plurality of consecutive instructions, and the Cache Line is used to determine whether the cache has the instruction data after receiving the instruction data request sent by Qmem, if If yes, return the instruction data to the microengine through Qmem, if not, send the instruction data request to the first low-speed instruction memory or the second low-speed instruction memory.

进一步地，所述两个Cache Line采用乒乓操作形式，且与报文存储器的乒乓操作同步。Further, the two Cache Lines adopt a ping-pong operation and are synchronized with the ping-pong operation of the message memory.

进一步地，所述装置还包括第一仲裁模块、第二仲裁模块和第三仲裁模块，其中：Further, the device further includes a first arbitration module, a second arbitration module and a third arbitration module, wherein:

每个微引擎对应一个第一仲裁模块，该第一仲裁模块与每个微引擎的缓存相连；Each microengine corresponds to a first arbitration module, and the first arbitration module is connected to the cache of each microengine;

每个微引擎小组对应一个第二仲裁模块，该第二仲裁模块的一端与微引擎小组中每个微引擎的第一仲裁模块相连，另一端与第一低速指令存储器相连；Each microengine group corresponds to a second arbitration module, one end of the second arbitration module is connected to the first arbitration module of each microengine in the microengine group, and the other end is connected to the first low-speed instruction memory;

每个微引擎大组对应一个第三仲裁模块，该第三仲裁模块的一端与每个微引擎的第一仲裁模块相连，另一端与第二低速指令存储器相连。Each large microengine group corresponds to a third arbitration module, one end of the third arbitration module is connected to the first arbitration module of each microengine, and the other end is connected to the second low-speed instruction memory.

进一步地，所述第一仲裁模块，用于在缓存请求指令数据时，判断所请求的指令位于第一低速指令存储器还是位于第二低速指令存储器，向第一低速指令存储器或第二低速指令存储器发送指令数据请求；以及用于接收第一低速指令存储器或第二低速指令存储器返回的指令数据，将该指令数据返回给缓存；Further, the first arbitration module is used for judging whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory when cache requesting instruction data, and sending the request to the first low-speed instruction memory or the second low-speed instruction memory Sending a request for instruction data; and receiving instruction data returned by the first low-speed instruction memory or the second low-speed instruction memory, and returning the instruction data to the cache;

所述第二仲裁模块，用于在接收到一个或多个第一仲裁模块发送的指令数据请求时，选择一个指令数据请求发送给第一低速指令存储器处理，将第一低速指令存储器取指后得到指令数据返回给相应的第一仲裁模块；The second arbitration module is configured to select an instruction data request and send it to the first low-speed instruction memory for processing when receiving one or more instruction data requests sent by the first arbitration module, and after fetching the first low-speed instruction memory Get the command data and return it to the corresponding first arbitration module;

所述第三仲裁模块，用于在接收到一个或多个第一仲裁模块发送的指令数据请求时，选择一个指令数据请求发送给第二低速指令存储器处理，将第二低速指令存储器取指后得到指令数据返回给相应的第一仲裁模块。The third arbitration module is configured to select an instruction data request and send it to the second low-speed instruction memory for processing when receiving one or more instruction data requests sent by the first arbitration module, and fetch instructions from the second low-speed instruction memory The obtained command data is returned to the corresponding first arbitration module.

进一步地，所述缓存还用于在接收到第一仲裁模块返回的指令数据后，更新缓存内容和标签。Further, the cache is also used to update cache contents and tags after receiving the instruction data returned by the first arbitration module.

进一步地，每个微引擎大组包括32个微引擎，该32个微引擎包括4个微引擎小组，每个微引擎小组包括8个微引擎。Further, each large microengine group includes 32 microengines, the 32 microengines include 4 microengine groups, and each microengine group includes 8 microengines.

为解决上述技术问题，本发明还提供了一种指令存储装置的指令存储方法，所述指令存储装置为如前所述的指令存储装置，所述方法包括：In order to solve the above technical problems, the present invention also provides an instruction storage method of an instruction storage device, the instruction storage device is the above-mentioned instruction storage device, and the method includes:

快速存储器（Qmem）在接收到微引擎发送的指令数据请求后，判断本Qmem是否有该指令数据，如果有，则将指令数据返回给微引擎，如果没有，则向缓存发送指令数据请求；After receiving the instruction data request sent by the micro-engine, the fast memory (Qmem) judges whether the Qmem has the instruction data, and if so, returns the instruction data to the micro-engine, and if not, sends the instruction data request to the cache;

所述缓存中的一Cache Line在接收到Qmem发送的指令数据请求后，判断本缓存是否有该指令数据，如果有，则将指令数据通过Qmem返回给微引擎，如果没有，则向第一低速指令存储器或第二低速指令存储器发送指令数据请求；A Cache Line in the cache memory, after receiving the command data request sent by Qmem, judges whether this cache has the command data, if there is, then the command data is returned to the microengine by Qmem, if not, then to the first low-speed The instruction memory or the second low-speed instruction memory sends an instruction data request;

所述第一低速指令存储器在接收到缓存发送的指令数据请求后，查找指令数据，向缓存返回查找到的指令数据；After receiving the instruction data request sent by the cache, the first low-speed instruction memory searches for the instruction data, and returns the found instruction data to the cache;

所述第二低速指令存储器在接收到缓存发送的指令数据请求后，查找指令数据，向缓存返回查找到的指令数据。After the second low-speed instruction memory receives the instruction data request sent by the cache, it searches for the instruction data, and returns the found instruction data to the cache.

进一步地，所述方法还包括：Further, the method also includes:

所述缓存中的一Cache Line在判断本缓存没有该指令数据时，将指令数据请求发送给第一仲裁模块，该第一仲裁模块判断所请求的指令如果位于第一低速指令存储器，则向第一低速指令存储器发送指令数据请求，所请求的指令如果位于第二低速指令存储器，向第二低速指令存储器请求指令数据。When a Cache Line in the cache judges that the cache does not have the instruction data, it sends the instruction data request to the first arbitration module, and if the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data to the first low-speed instruction memory. A low-speed instruction memory sends an instruction data request, and if the requested instruction is located in the second low-speed instruction memory, request instruction data from the second low-speed instruction memory.

进一步地，所述方法还包括：Further, the method also includes:

所述第一仲裁模块判断所请求的指令如果位于第一低速指令存储器，则向第二仲裁模块发送指令数据请求，所述第二仲裁模块收到一个或多个第一仲裁模块发送的指令数据请求时，选择一个指令数据请求发送给第一低速指令存储器；If the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends an instruction data request to the second arbitration module, and the second arbitration module receives the instruction data sent by one or more first arbitration modules When requesting, select an instruction data request and send it to the first low-speed instruction memory;

所述第一仲裁模块判断所请求的指令如果位于第二低速指令存储器，则向第三仲裁模块发送指令数据请求，所述第三仲裁模块收到一个或多个第一仲裁模块发送的指令数据请求时，选择一个指令数据请求发送给第二低速指令存储器。If the first arbitration module judges that the requested instruction is located in the second low-speed instruction memory, it sends an instruction data request to the third arbitration module, and the third arbitration module receives the instruction data sent by one or more first arbitration modules When requesting, select an instruction data request and send it to the second low-speed instruction memory.

本发明实施例所提供的适用于多核网络处理器的基于快速存储器和缓存的指令存储方案，将快速存储器、小容量并采用乒乓操作的缓存以及低速DRAM存储器结合在一起，其中存储器采用层次化的分组策略。采用该种指令存储方案有效地保证一部分指令的高取指效率和较高的平均取指效率，而且节省了大量的硬件存储资源，同时编译器的实现也十分简单。The fast memory and cache-based instruction storage solution suitable for multi-core network processors provided by the embodiment of the present invention combines fast memory, small-capacity cache with ping-pong operation, and low-speed DRAM memory, wherein the memory adopts hierarchical grouping strategy. Adopting this kind of instruction storage scheme can effectively guarantee the high fetching efficiency and high average fetching efficiency of some instructions, and save a lot of hardware storage resources, and the implementation of the compiler is also very simple.

附图说明Description of drawings

图1是传统的两级Cache的结构示意图；Fig. 1 is a schematic structural diagram of a traditional two-level Cache;

图2是轮询方式的指令存储方案的结构示意图；Fig. 2 is a schematic structural diagram of an instruction storage scheme in a polling mode;

图3是实施例1一种指令存储装置的结构示意图；3 is a schematic structural diagram of an instruction storage device in Embodiment 1;

图4是一种具体的指令存储装置结构示意图；FIG. 4 is a schematic structural diagram of a specific instruction storage device;

图5是报文存储器和icache乒乓操作的示意图；Fig. 5 is the schematic diagram of message memory and icache ping-pong operation;

图6是指令存储装置处理流程图；Fig. 6 is a flow chart of instruction storage device processing;

图7一种指令存储装置详细处理流程图；Fig. 7 is a detailed processing flowchart of an instruction storage device;

图8是本发明中Cache模块中一个Cache Line工作的过程图。Fig. 8 is a working process diagram of a Cache Line in the Cache module in the present invention.

具体实施方式detailed description

本发明考虑将快速存储器(Quick Memory，简称Qmem)与小容量并采用乒乓操作的缓存(Cache)，以及低速RAM存储器（如低速指令存储器（instruction memory，简称IMEM））结合起来作为微引擎的缓存。The present invention considers combining a fast memory (Quick Memory, Qmem for short) with a small-capacity cache (Cache) using ping-pong operations, and a low-speed RAM memory (such as a low-speed instruction memory (IMEM for short)) as a micro-engine cache .

为使本发明的目的、技术方案和优点更加清楚明白，下文中将结合附图对本发明的实施例进行详细说明。需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互任意组合。In order to make the purpose, technical solution and advantages of the present invention more clear, the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined arbitrarily with each other.

实施例1Example 1

本实施例的指令存储器如图3所示，采用以下结构：The instruction memory of this embodiment is shown in Figure 3, adopts the following structure:

一大组N个微引擎分为两个以上的小组，每个微引擎对应一个Qmem和一个Cache，每小组微引擎对应一个第一低速指令存储器（以下简称IMEM），该一大组N个微引擎对应一个第二低速指令存储器（以下简称IMEM_COM），如图3所示，Qmem设置为与微引擎连接，缓存与Qmem相连；微引擎小组中每个微引擎对应的缓存与第一低速指令存储器相连；微引擎大组中每个微引擎对应的缓存与第二低速指令存储器相连，其中：A large group of N microengines is divided into two or more groups, each microengine corresponds to a Qmem and a Cache, and each group of microengines corresponds to a first low-speed instruction memory (hereinafter referred to as IMEM). The engine corresponds to a second low-speed instruction memory (hereinafter referred to as IMEM_COM), as shown in Figure 3, Qmem is set to be connected to the micro-engine, and the cache is connected to Qmem; the cache corresponding to each micro-engine in the micro-engine group is connected to the first low-speed instruction memory connected; the cache corresponding to each microengine in the large group of microengines is connected to the second low-speed instruction memory, wherein:

该Qmem用于在接收到微引擎发送的指令数据请求后，判断本Qmem是否有该指令数据，如果有，则将指令数据返回给微引擎，如果没有，则向缓存发送指令数据请求。该Qmem优选存储对处理质量要求最高的一个地址段的指令，优选由读写速度快的SRAM实现。Qmem中的内容在报文处理过程中将再不会被更新，当微引擎需求这部分指令的时候，Qmem可以在一个时钟周期内返回微引擎所需指令数据，大大提高了取指效率；The Qmem is used to judge whether the Qmem has the instruction data after receiving the instruction data request sent by the microengine, and if so, return the instruction data to the microengine, and if not, send the instruction data request to the cache. The Qmem preferably stores instructions of an address segment that requires the highest processing quality, and is preferably implemented by an SRAM with a fast read and write speed. The content in Qmem will not be updated during the message processing process. When the micro-engine needs this part of the instruction, Qmem can return the instruction data required by the micro-engine within one clock cycle, which greatly improves the efficiency of instruction fetching;

该Cache具有两个Cache Line（无通用中文技术术语），每个Cache Line可以存放多条连续的指令，Cache Line用于在接收到Qmem发送的指令数据请求后，判断本缓存是否有该指令数据，如果有，则将指令数据通过Qmem返回给微引擎，如果没有，则向IMEM或IMEM_COM发送指令数据请求。两个Cache Line采用乒乓操作形式，且与报文存储器的乒乓操作同步；The Cache has two Cache Lines (no common Chinese technical terms), each Cache Line can store multiple consecutive instructions, and the Cache Line is used to determine whether the cache has the instruction data after receiving the instruction data request sent by Qmem , if there is, return the instruction data to the microengine through Qmem, if not, send the instruction data request to IMEM or IMEM_COM. The two Cache Lines adopt the form of ping-pong operation and are synchronized with the ping-pong operation of the message memory;

上述IMEM和IMEM_COM分别用于存储位于不同地址段的一片指令，基于指令数据请求查找指令数据并返回。The above-mentioned IMEM and IMEM_COM are respectively used to store a piece of instructions located in different address segments, and search for and return instruction data based on instruction data requests.

上述四个存储位置：Qmem、Cache、IMEM、IMEM_COM，访问速度也依次降低。采用层次化的存储器可以有效地利用指令执行的概率的不同，从而优化微引擎取到指令的效率。由于较多的采用了低速存储器，节约了硬件资源。The above four storage locations: Qmem, Cache, IMEM, IMEM_COM, the access speed also decreases in turn. The use of hierarchical memory can effectively utilize the difference in the probability of instruction execution, thereby optimizing the efficiency of the microengine to fetch instructions. Since more low-speed memories are used, hardware resources are saved.

优选地，该装置还包括第一仲裁模块（arbiter1）、第二仲裁模块（arbiter2）和第三仲裁模块（arbiter3）。每个微引擎对应一个arbiter1，该arbiter1与每个微引擎的缓存相连；每个微引擎小组对应一个arbiter2，该arbiter2的一端与微引擎小组中每个微引擎的arbiter1相连，另一端与IMEM相连；每个微引擎大组对应一个arbiter3，该arbiter3的一端与每个微引擎的arbiter1相连，另一端与IMEM_COM相连。Preferably, the device further includes a first arbitration module (arbiter1), a second arbitration module (arbiter2) and a third arbitration module (arbiter3). Each microengine corresponds to an arbiter1, and the arbiter1 is connected to the cache of each microengine; each microengine group corresponds to an arbiter2, and one end of the arbiter2 is connected to the arbiter1 of each microengine in the microengine group, and the other end is connected to IMEM ; Each microengine group corresponds to an arbiter3, one end of the arbiter3 is connected to the arbiter1 of each microengine, and the other end is connected to IMEM_COM.

该arbiter1用于在缓存请求指令数据时，判断所请求的指令位于IMEM还是位于IMEM_COM，向IMEM或IMEM_COM发送指令数据请求；以及用于接收IMEM或IMEM_COM返回的指令数据，将该指令数据返回给缓存；The arbiter1 is used to determine whether the requested instruction is located in IMEM or IMEM_COM when the cache requests instruction data, and sends an instruction data request to IMEM or IMEM_COM; and is used to receive the instruction data returned by IMEM or IMEM_COM, and return the instruction data to the cache ;

该arbiter2用于在接收到一个或多个arbiter1发送的指令数据请求时，选择一个指令数据请求发送给IMEM处理，将IMEM取指后得到指令数据返回给相应的arbiter1；The arbiter2 is used to select an instruction data request and send it to the IMEM for processing when receiving one or more instruction data requests sent by the arbiter1, and return the instruction data obtained after the IMEM fetches the instruction to the corresponding arbiter1;

该arbiter3用于在接收到一个或多个arbiter1发送的指令数据请求时，选择一个指令数据请求发送给IMEM_COM处理，将IMEM_COM取指后得到指令数据返回给相应的arbiter1。The arbiter3 is used to select an instruction data request and send it to IMEM_COM for processing when receiving one or more instruction data requests sent by arbiter1, and return the instruction data obtained after fetching the instruction from IMEM_COM to the corresponding arbiter1.

以N=32为例，可将每组32个微引擎分成4个小组，每小组8个微引擎。如图4所示，每个微引擎对应一个Qmem和一个Cache（包括两个指令缓存（icache）），每小组8个微引擎共享一个IMEM，每组32个微引擎共享一个IMEM_COM。图4中A1表示arbiter1，A2表示arbiter2，A3表示arbiter3。如图5所示，两个icache与ME中的两个报文存储器一一对应，它们轮流工作以掩盖报文存储和取指的延时。Taking N=32 as an example, each group of 32 microengines can be divided into 4 groups, and each group has 8 microengines. As shown in Figure 4, each microengine corresponds to a Qmem and a Cache (including two instruction caches (icache)), each group of 8 microengines shares an IMEM, and each group of 32 microengines shares an IMEM_COM. In Figure 4, A1 represents arbiter1, A2 represents arbiter2, and A3 represents arbiter3. As shown in Figure 5, the two icaches correspond to the two message memories in the ME one by one, and they work in turn to cover the delay of message storage and instruction fetching.

实施例2Example 2

对应图3所示的指令存储装置，相应的指令存储方法如图6所示，包括：Corresponding to the instruction storage device shown in Figure 3, the corresponding instruction storage method is shown in Figure 6, including:

步骤1，Qmem在接收到微引擎发送的指令数据请求后，判断本Qmem是否有该指令数据，如果有，则将指令数据返回给微引擎，如果没有，则向缓存发送指令数据请求；Step 1, after receiving the instruction data request sent by the microengine, Qmem judges whether the Qmem has the instruction data, if so, returns the instruction data to the microengine, and if not, sends the instruction data request to the cache;

步骤2，缓存中的一Cache Line在接收到Qmem发送的指令数据请求后，判断本缓存是否有该指令数据，如果有，则将指令数据通过Qmem返回给微引擎，如果没有，则向IMEM或IMEM_COM发送指令数据请求；Step 2, after a Cache Line in the cache receives the command data request sent by Qmem, it judges whether the cache has the command data, if so, returns the command data to the microengine through Qmem, and if not, sends the command data to IMEM or IMEM_COM sends instruction data request;

步骤3，IMEM在接收到缓存发送的指令数据请求后，查找指令数据，向缓存返回查找到的指令数据；IMEM_COM在接收到缓存发送的指令数据请求后，查找指令数据，向缓存返回查找到的指令数据。Step 3: After receiving the instruction data request sent by the cache, IMEM searches for the instruction data and returns the found instruction data to the cache; after receiving the instruction data request sent by the cache, IMEM_COM searches for the instruction data and returns the found instruction data to the cache command data.

具体地，对任一个微引擎，取指令过程如图7所示，包括以下步骤：Specifically, for any microengine, the process of fetching instructions is shown in Figure 7, including the following steps:

步骤110，微引擎将需求的指令地址和地址使能发送至该微引擎的Qmem；Step 110, the microengine sends the required instruction address and address enable to the Qmem of the microengine;

具体地，微引擎中的报文存储器收到报文时将报文中的指令首地址和地址使能发送给指令存储装置，即该微引擎对应的Qmem。Specifically, when the message memory in the microengine receives the message, it sends the instruction header address and address enable in the message to the instruction storage device, that is, the Qmem corresponding to the microengine.

步骤120，Qmem判断该指令地址是否在其所存指令的地址范围内，如果在，则执行步骤130，否则执行步骤140；Step 120, Qmem judges whether the instruction address is within the address range of its stored instruction, if so, then execute step 130, otherwise execute step 140;

步骤130，用该指令地址和该地址使能取指令数据返回给微引擎，本次取指过程结束；Step 130, use the instruction address and the address to enable fetching instruction data to return to the microengine, and this instruction fetching process ends;

步骤140，将该指令地址和地址使能传送给该微引擎的Cache；Step 140, transmit the instruction address and address enable to the Cache of the microengine;

步骤150，Cache判断该指令地址是否在其所存指令的地址范围内，如果是，执行步骤160，否则执行步骤170；Step 150, Cache judges whether the instruction address is within the address range of its stored instruction, if yes, execute step 160, otherwise execute step 170;

由于Cache的每部分只有一个Cache Line，因此Cache的标签（Tag）也只有一个标签的信息，当地址请求被送到Cache时，根据tag马上就可以判断出所需数据是否在Cache中，即将该指令地址相应位与当前工作的CacheLine对应的tag进行对比，如果相同，说明该指令在Cache中，否则说明该指令不在Cache中。Since each part of the Cache has only one Cache Line, the tag (Tag) of the Cache has only one tag information. When the address request is sent to the Cache, it can be judged immediately according to the tag whether the required data is in the Cache, that is, the The corresponding bit of the instruction address is compared with the tag corresponding to the currently working CacheLine. If they are the same, it means that the instruction is in the Cache; otherwise, it means that the instruction is not in the Cache.

步骤160，基于地址使能将Cache Line中对应位置的指令数据取出通过Qmem送给微引擎，本次取指过程结束；Step 160, based on the address enabling, the instruction data corresponding to the position in the Cache Line is taken out and sent to the microengine through Qmem, and the fetching process ends;

步骤170，Cache将该指令地址和地址使能送至第一仲裁模块（arbiter1）；Step 170, Cache sends the instruction address and address enable to the first arbitration module (arbiter1);

步骤180，arbiter1判断该指令地址是在该微引擎所在小组对应的IMEM中，还是在该微引擎所在微引擎组对应的IMEM_COM中，如果在IMEM中，则执行步骤190，如果在IMEM_COM中，则执行步骤210；Step 180, arbiter1 judges whether the instruction address is in the IMEM corresponding to the group where the microengine is located, or in the IMEM_COM corresponding to the microengine group where the microengine is located, if it is in IMEM, then execute step 190, if it is in IMEM_COM, then Execute step 210;

具体地arbiter1根据指令地址判断该指令是在IMEM中还是在IMEM_COM中；Specifically, arbiter1 judges whether the instruction is in IMEM or in IMEM_COM according to the instruction address;

步骤190，arbiter1将指令地址和地址使能发送至第二仲裁模块（arbiter2）；Step 190, arbiter1 sends the instruction address and address enable to the second arbitration module (arbiter2);

步骤200，arbiter2选择一个指令请求发送至IMEM，IMEM根据请求中的指令地址和地址使能取指令数据，通过arbiter1返回给Cache，执行步骤230；Step 200, arbiter2 selects an instruction request and sends it to IMEM, and IMEM enables fetching instruction data according to the instruction address and address in the request, returns it to Cache through arbiter1, and executes step 230;

当有多个微引擎对应的arbiter1均向arbiter2发起取指请求时，arbiter2通过轮询的方式处理各cache的请求，选择一取指请求送IMEM处理，由于数据返回需要多个时钟周期，已经发出请求的支路将不会再被轮询到；When arbiter1 corresponding to multiple micro-engines sends an instruction fetch request to arbiter2, arbiter2 processes the requests of each cache through polling, selects an instruction fetch request and sends it to IMEM for processing. Since the data return requires multiple clock cycles, it has already sent The requested branch will no longer be polled;

步骤210，arbiter1将指令地址和地址使能发送至第三仲裁模块（arbiter3）；Step 210, arbiter1 sends the instruction address and address enable to the third arbitration module (arbiter3);

步骤220，arbiter3选择一个指令请求发送至IMEM_COM，IMEM_COM根据请求中的指令地址和地址使能取指令数据，通过arbiter1返回给Cache，执行步骤230；Step 220, arbiter3 selects an instruction request and sends it to IMEM_COM, and IMEM_COM enables fetching instruction data according to the instruction address and address in the request, returns it to Cache through arbiter1, and executes step 230;

每个微引擎对应的arbiter的功能同arbiter1，arbiter3的功能同arbiter2。The function of the arbiter corresponding to each microengine is the same as that of arbiter1, and the function of arbiter3 is the same as that of arbiter2.

步骤230，更新Cache Line和tag的内容，并将该指令数据通过Qmem返回给微引擎，本次取指过程结束。Step 230, updating the contents of Cache Line and tag, and returning the instruction data to the microengine through Qmem, and this fetching process ends.

图8为图5中icache的结构示意图，icache收到Qmem送来的指令地址后，与tag进行比较，判断是否命中，如果命中，则在译码后，根据地址使能从icache的物理存储位置取指令内容，通过多路选择器输出，如果未命中，则继续去低速指令存储器取指令数据，返回的指令数据经多路选择器输出。Figure 8 is a schematic diagram of the structure of the icache in Figure 5. After the icache receives the instruction address sent by Qmem, it compares it with the tag to determine whether it is a hit. If it is a hit, after decoding, the physical storage location of the slave icache is enabled according to the address. Fetch the instruction content and output it through the multiplexer. If it misses, it will continue to fetch the instruction data from the low-speed instruction memory, and the returned instruction data will be output through the multiplexer.

处理同一个报文时只使用其中一个Cache Line来工作。在当前报文所使用的Cache Line1在Cache中找到相应的指令数据，未向下级低速存储器（IMEM或IMEM_COM）发出读请求时，此时，如果Cache Line2检测到有下个报文中首地址的请求，则用下个报文中所包含的指令首地址向下级低速存储器发出读请求，以获得下一个报文的所需的指令数据。当前Cache Line1报文处理完后，Cache切换到另一半的Cache Line 2以准备处理下一个报文。这样采用乒乓操作来处理报文可以有效地掩盖报文存储的时间和去低速的指令存储器中取指的延时，在微引擎切换到下个报文时马上就可以取到所需要的指令，提高了取指效率，从而使得微引擎的处理效率提高。Only one of the Cache Lines is used to work when processing the same packet. When the Cache Line1 used by the current message finds the corresponding instruction data in the Cache and does not issue a read request to the lower-level low-speed memory (IMEM or IMEM_COM), at this time, if Cache Line2 detects that there is an address of the first address in the next message request, use the instruction header address contained in the next message to send a read request to the lower-level low-speed memory, so as to obtain the required instruction data of the next message. After the current Cache Line 1 packet is processed, the Cache switches to the other half of the Cache Line 2 to prepare for processing the next packet. In this way, the use of ping-pong operations to process messages can effectively cover up the time of message storage and the delay in fetching instructions from the low-speed instruction memory, and the required instructions can be fetched immediately when the microengine switches to the next message. The instruction fetching efficiency is improved, so that the processing efficiency of the micro-engine is improved.

本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。可选地，上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地，上述实施例中的各模块/单元可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本发明不限制于任何特定形式的硬件和软件的结合。Those skilled in the art can understand that all or part of the steps in the above method can be completed by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disk, and the like. Optionally, all or part of the steps in the foregoing embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, or may be implemented in the form of software function modules. The present invention is not limited to any specific combination of hardware and software.

当然，本发明还可有其他多种实施例，在不背离本发明精神及其实质的情况下，熟悉本领域的技术人员当可根据本发明作出各种相应的改变和变形，但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。Of course, the present invention can also have other various embodiments, and those skilled in the art can make various corresponding changes and deformations according to the present invention without departing from the spirit and essence of the present invention, but these corresponding Changes and deformations should belong to the scope of protection of the appended claims of the present invention.

Claims

1. a kind of network processing unit instruction storage device, network processing unit includes big group of more than two micro engines, each micro- to draw Holding up big group includes N number of micro engine, and N number of micro engine is divided into more than two micro engine groups, the instruction storage device bag Include：Fast storage Qmem, caching cache, the first low speed instruction memory and the second low speed instruction memory, wherein：

Each one Qmem and caching of micro engine correspondence, Qmem is set to be connected with micro engine, and caching is connected with Qmem；

The corresponding caching of each micro engine in one the first low speed instruction memory of each micro engine group correspondence, micro engine group It is connected with the first low speed instruction memory；

The corresponding caching of each micro engine in one the second low speed instruction memory of each big group of correspondence of micro engine, big group of micro engine It is connected with the second low speed instruction memory.

2. device as claimed in claim 1, it is characterised in that：

The Qmem is used for after the director data request of micro engine transmission is received, and judges whether this Qmem has the instruction number According to if so, director data then is returned into micro engine, if it is not, sending director data request to caching.

3. device as claimed in claim 1 or 2, it is characterised in that：

The instruction to handling one address field of quality requirement highest is stored in the Qmem.

4. device as claimed in claim 1, it is characterised in that：

The caching includes two Cache Line, each Cache Line and deposits a plurality of continuous instruction, the Cache Line is used for after the director data request of Qmem transmissions is received, and judges whether this caching has the director data, if so, then Director data is returned into micro engine by Qmem, if it is not, to the first low speed instruction memory or the second low speed instruction Memory sends director data request.

5. device as claimed in claim 4, it is characterised in that：

Described two Cache Line use ping-pong operation form, and synchronous with the ping-pong operation of packet storage device.

6. the device as described in claim 1 or 2 or 4 or 5, it is characterised in that：

Described device also includes the first arbitration modules, the second arbitration modules and the 3rd arbitration modules, wherein：

One the first arbitration modules of each micro engine correspondence, first arbitration modules are connected with the caching of each micro engine；

It is each in one the second arbitration modules of each micro engine group correspondence, one end of second arbitration modules and micro engine group First arbitration modules of micro engine are connected, and the other end is connected with the first low speed instruction memory；

One the 3rd arbitration modules of each big group of correspondence of micro engine, one end of the 3rd arbitration modules and the first of each micro engine Arbitration modules are connected, and the other end is connected with the second low speed instruction memory.

7. device as claimed in claim 6, it is characterised in that：

First arbitration modules, in cache request director data, judging that asked instruction refers to positioned at the first low speed Make memory still be located at the second low speed instruction memory, sent out to the first low speed instruction memory or the second low speed instruction memory Director data is sent to ask；And for receiving the instruction number that the first low speed instruction memory or the second low speed instruction memory are returned According to the director data is returned into caching；

Second arbitration modules, during for being asked in the director data for receiving one or more first arbitration modules transmissions, One director data request of selection is sent to the processing of the first low speed instruction memory, will be obtained after the first low speed instruction memory fetching Corresponding first arbitration modules are returned to director data；

3rd arbitration modules, during for being asked in the director data for receiving one or more first arbitration modules transmissions, One director data request of selection is sent to the processing of the second low speed instruction memory, will be obtained after the second low speed instruction memory fetching Corresponding first arbitration modules are returned to director data.

8. device as claimed in claim 7, it is characterised in that：

The caching is additionally operable to after the director data of the first arbitration modules return is received, and updates cache contents and label.

9. the device as described in claim 1,2,4,5,7 or 8, it is characterised in that：

Each big group of micro engine includes 32 micro engines, and 32 micro engines are divided into 4 micro engine groups, and each micro engine is small Group includes 8 micro engines.

10. a kind of instruction storage method of instruction storage device, the instruction storage device is instruction as claimed in claim 1 Storage device, methods described includes：

Fast storage Qmem judges whether this Qmem has the instruction number after the director data request of micro engine transmission is received According to if so, director data then is returned into micro engine, if it is not, sending director data request to caching；

A Cache Line in the caching judge whether this caching has after the director data request of Qmem transmissions is received The director data, if so, director data then is returned into micro engine by Qmem, if it is not, to the first low speed instruction Memory or the second low speed instruction memory send director data request；

The first low speed instruction memory is after the director data request that caching is sent is received, look-up command data, Xiang Huan Deposit the director data for returning and finding；

The second low speed instruction memory is after the director data request that caching is sent is received, look-up command data, Xiang Huan Deposit the director data for returning and finding.

11. method as claimed in claim 10, it is characterised in that：

Methods described also includes：

Director data request is sent to by the Cache Line in the caching when judging this caching without the director data First arbitration modules, first arbitration modules judge asked instruction if located in the first low speed instruction memory, then to the One low speed instruction memory sends director data request, and the instruction asked is if located in the second low speed instruction memory, to the Two low speed instruction memory requests director datas.

12. method as claimed in claim 11, it is characterised in that：

Methods described also includes：

First arbitration modules judge asked instruction if located in the first low speed instruction memory, then to the second arbitration mould Block sends director data request, and the director data that second arbitration modules receive one or more first arbitration modules transmissions please When asking, one director data request of selection is sent to the first low speed instruction memory；

First arbitration modules judge asked instruction if located in the second low speed instruction memory, then to the 3rd arbitration mould Block sends director data request, and the director data that the 3rd arbitration modules receive one or more first arbitration modules transmissions please When asking, one director data request of selection is sent to the second low speed instruction memory.