CN115878521B - Command processing system, electronic device and electronic equipment - Google Patents
- Publication number: CN115878521B (application CN202310060930.5A)
- Authority
- CN
- China
- Prior art keywords
- command
- offset field
- gpu
- read
- intermediate processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a command processing system, an electronic device, and electronic equipment.
Background
In a graphics processing unit (GPU) implemented over the PCIe (Peripheral Component Interconnect Express) protocol, internal command delivery is typically built on an architecture that combines a doorbell mechanism with command buffers. In a typical scheme, the host first writes a DMA (Direct Memory Access) command into the buffer corresponding to the application CPU (ACPU) and notifies the ACPU through the doorbell mechanism; upon receiving the doorbell interrupt, the ACPU reads the DMA command from its buffer and executes it. After finishing the DMA command, the ACPU notifies the host. The host then writes the commands to be processed by the GPU into the GPU's buffer and notifies the GPU through the doorbell mechanism; upon receiving its doorbell interrupt, the GPU reads the commands from its buffer and executes them. It can be seen that the host must wait for the ACPU to finish executing the DMA command before it can send commands to the GPU, which lowers command execution efficiency.
Summary of the Invention
The purpose of the present disclosure is to provide a command processing system, an electronic device, and electronic equipment that improve command processing efficiency.
According to one aspect of the present disclosure, a command processing system is provided that includes N intermediate processors and a GPU; when N is greater than 1, the N intermediate processors have a defined order.
Each intermediate processor is configured to: after receiving a read indication, read commands from the buffer, send a read indication to its first target module, and execute a read command if that command is to be executed by the intermediate processor itself. If N is 1, the first target module is the GPU; if N is greater than 1, the first target module of the last intermediate processor is the GPU, and the first target module of every other intermediate processor is the next intermediate processor.
The GPU is configured to: after receiving a read indication, read commands from the buffer; if a read command must be executed first by an intermediate processor and then by the GPU, preprocess the command, wait until the intermediate processor has finished executing it, and then execute the command based on the preprocessing result.
In one possible implementation of the present disclosure, the buffer is a circular buffer configured with a first offset field corresponding to the host, a second offset field for each intermediate processor, and a third offset field corresponding to the GPU. The first offset field indicates the position in the circular buffer of the command most recently written by the host; each second offset field indicates the position of the command most recently read from the circular buffer by the corresponding intermediate processor; and the third offset field indicates the position of the command most recently read by the GPU.
When reading commands from the buffer, an intermediate processor is configured to: read commands from the circular buffer according to the offset field of its second target module and update its own second offset field. If N is 1, the second target module is the host; if N is greater than 1, the second target module of the first intermediate processor is the host, and the second target module of every other intermediate processor is the previous intermediate processor.
When reading commands from the buffer, the GPU is configured to: read commands from the circular buffer according to the second offset field of the intermediate processor that sent it the read indication, and update the third offset field.
In one possible implementation, when reading commands according to the offset field of its second target module, an intermediate processor is configured to: while its own second offset field differs from the offset field of the second target module, read commands from the circular buffer one at a time, updating its own second offset field after each read, and stop reading once its second offset field equals the offset field of the second target module.
In one possible implementation, when sending a read indication to the first target module, an intermediate processor is configured to: send the read indication once its own second offset field equals the offset field of the second target module.
In one possible implementation, when reading commands according to the second offset field of the intermediate processor that sent it the read indication, the GPU is configured to: while the third offset field differs from that second offset field, read commands from the circular buffer one at a time, updating the third offset field after each read, and stop reading once the third offset field equals that second offset field.
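The offset-field scheme in the implementations above can be sketched as follows. The class and field names are illustrative assumptions, not part of the patent: each consumer (intermediate processor or GPU) reads commands one at a time until its own read offset catches up with the offset field of the module upstream of it.

```python
class RingBuffer:
    """Shared circular buffer with one write offset (the "first offset
    field") and one read offset per consumer (the "second"/"third"
    offset fields)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.write_off = 0          # host's first offset field
        self.read_offs = {}         # consumer name -> read offset

    def register(self, name):
        self.read_offs[name] = 0

    def host_write(self, cmd):
        # Host writes one command and advances its offset field.
        self.slots[self.write_off] = cmd
        self.write_off = (self.write_off + 1) % self.size

    def drain(self, consumer, upstream_off, handle):
        # Read one command at a time, updating the consumer's offset after
        # each read, until it equals the upstream module's offset field.
        off = self.read_offs[consumer]
        while off != upstream_off:
            handle(self.slots[off])
            off = (off + 1) % self.size
            self.read_offs[consumer] = off
```

An intermediate processor would call `drain` against the host's write offset; the GPU would call it against that intermediate processor's read offset, so every stage consumes exactly the commands its upstream has already passed over.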
In one possible implementation, each intermediate processor is further configured to: after reading a command from the buffer, determine whether the command must be executed by its first target module, and whether the command must be executed first by the intermediate processor itself and then by the first target module; based on these determinations, fill a first identifier and a second identifier at the command's position in the buffer. The first identifier indicates whether the command must be executed by the first target module; the second identifier indicates whether the command must be executed first by the intermediate processor itself and then by the first target module.
When N is greater than 1, each intermediate processor is further configured to: after reading a command from the circular buffer, determine from the first identifier at the corresponding position whether the command must be executed by itself, and determine from the second identifier whether the command must be executed first by the previous intermediate processor and then by itself.
The GPU is further configured to: after reading a command from the circular buffer, determine from the first identifier at the corresponding position whether the command must be executed by the GPU, and determine from the second identifier whether the command must be executed first by the intermediate processor that sent the GPU the read indication and then by the GPU.
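The two identifiers can be sketched as follows; the command layout (a dict carrying an ordered executor list) and the field names are assumptions for illustration only, not the patent's wire format.

```python
def annotate(slot, me, target):
    """Fill in the first and second identifiers at a command's buffer slot.

    slot["executors"] is the ordered list of modules that must run the
    command, e.g. ["acpu", "gpu"] (assumed layout).
    """
    ex = slot["executors"]
    # First identifier: must the first target module execute this command?
    slot["first_id"] = target in ex
    # Second identifier: must this processor run first and the target after?
    slot["second_id"] = (me in ex and target in ex
                         and ex.index(me) < ex.index(target))
    return slot
```

Downstream readers then inspect the two flags at the slot instead of re-parsing the command body.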
In one possible implementation, the last intermediate processor is further configured to: for a command that it must execute before the GPU, update the command's fence status after finishing its part and notify the GPU.
In one possible implementation, the system further includes a host configured to: write one or more commands into the buffer and send a read indication to the first of the N intermediate processors.
In one possible implementation, when writing one or more commands into the buffer, the host is configured to: write the commands into the circular buffer according to the third offset field and update the first offset field.
In one possible implementation, when writing according to the third offset field, the host is configured to: write a batch of commands into the circular buffer until the write position is one less than the position indicated by the third offset field, or until the entire batch has been written.
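The write rule above amounts to the classic keep-one-slot-free ring-buffer convention, stopping one position behind the slowest reader (the GPU). A minimal sketch, with assumed names:

```python
def host_write_batch(slots, write_off, third_off, batch):
    """Write commands until the write position is one less than the
    position given by the GPU's third offset field (buffer full), or
    until the whole batch is written.

    Returns the new first offset field and the number written.
    """
    size = len(slots)
    written = 0
    for cmd in batch:
        nxt = (write_off + 1) % size
        if nxt == third_off:        # next slot would collide with the reader
            break
        slots[write_off] = cmd
        write_off = nxt
        written += 1
    return write_off, written
```

Leaving one slot unused lets `write_off == third_off` unambiguously mean "empty" rather than "full".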
In one possible implementation, the host is further configured to: for a command that must be executed first by an intermediate processor and then by the GPU, pack the intermediate processor's execution part inside the GPU's execution part, and write the two parts into the circular buffer as a single command.
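The encapsulation rule, in which each later executor's part wraps the earlier ones so the part executed last forms the outermost layer, can be sketched as follows; the nested-dict layout is an illustrative assumption.

```python
def pack(parts):
    """parts: (executor, payload) pairs in execution order.  Returns one
    command in which each later executor's part wraps the earlier ones,
    so the innermost payload is executed first."""
    cmd = None
    for executor, payload in parts:
        cmd = {"executor": executor, "payload": payload, "inner": cmd}
    return cmd

def execution_order(cmd):
    """Unwrap a packed command, outermost layer first, and return the
    original execution order (innermost part runs first)."""
    outer_to_inner = []
    while cmd is not None:
        outer_to_inner.append(cmd["executor"])
        cmd = cmd["inner"]
    return outer_to_inner[::-1]
```

For the A-then-C-then-GPU example later in the description, this yields a GPU layer wrapping C's part, which in turn wraps A's part.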
According to another aspect of the present disclosure, an electronic device is provided that includes the command processing system of any of the above embodiments. In some usage scenarios, the electronic device takes the product form of a graphics card; in others, it takes the form of a CPU motherboard.
According to another aspect of the present disclosure, electronic equipment is provided that includes the above electronic device. In some usage scenarios, the electronic equipment is a portable device such as a smartphone, tablet computer, or VR device; in others, it is a personal computer, a game console, or the like.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of a command processing system according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a circular buffer according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a command processing system according to another embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a circular buffer according to another embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a command processing system according to another embodiment of the present disclosure.
Detailed Description
Before introducing the embodiments of the present disclosure, it should be noted that some embodiments are described as processing flows; although the operations of a flow may be given sequential step numbers, the operations may be performed in parallel, concurrently, or simultaneously.
The terms "first", "second", and so on may be used in the embodiments to describe various features, but these features should not be limited by the terms, which serve only to distinguish one feature from another.
The term "and/or" may be used in the embodiments and covers any and all combinations of one or more of the associated listed features.
It should be understood that when a connection or communication relationship between two components is described, unless a direct connection or direct communication is explicitly specified, the relationship may be either direct or indirect through an intermediate component.
To make the technical solutions and advantages of the embodiments of the present disclosure clearer, exemplary embodiments are described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not an exhaustive list, of the embodiments of the present disclosure. It should be noted that, where no conflict arises, the embodiments and the features within them may be combined with one another.
In a GPU implemented over the PCIe protocol, internal command delivery is typically built on an architecture that combines a doorbell mechanism with command buffers. Before sending commands to the GPU, the host must wait for the ACPU to finish executing the DMA command, which lowers command execution efficiency.
To improve command execution efficiency, the present disclosure proposes, through the following embodiments, at least one command processing system, electronic device, and electronic equipment.
Referring to Fig. 1, a schematic structural diagram of a command processing system according to an embodiment of the present disclosure: the system includes a host, N intermediate processors (Fig. 1 shows only the case of multiple intermediate processors), and a GPU. When there are multiple intermediate processors, they have a defined order; in the example of Fig. 1 there are three, ordered as intermediate processor A, intermediate processor B, intermediate processor C.
The host is configured to: write one or more commands into the buffer and send a read indication to the first of the N intermediate processors.
Each intermediate processor is configured to: after receiving a read indication, read commands from the buffer, send a read indication to its first target module, and execute a read command if that command is to be executed by the intermediate processor itself. If N is 1, the first target module is the GPU; if N is greater than 1, the first target module of the last intermediate processor is the GPU, and the first target module of every other intermediate processor is the next intermediate processor.
The GPU is configured to: after receiving a read indication, read commands from the buffer; if a read command must be executed first by an intermediate processor and then by the GPU, preprocess the command, wait until the intermediate processor has finished executing it, and then execute the command based on the preprocessing result.
In the present disclosure, both the GPU and the intermediate processors (e.g., an ACPU) read commands from the buffer. The GPU need not wait for an intermediate processor to finish executing a command before reading from the buffer, and it can preprocess a read command while the ACPU executes it, so command execution efficiency is effectively improved.
In some embodiments, there is a single intermediate processor. After the host writes a batch of commands (one or more) into the buffer, it sends a read indication to the intermediate processor via a doorbell interrupt. A command written by the host may be one that the intermediate processor must process first and the GPU execute afterwards, one that only the intermediate processor executes, or one that only the GPU executes. For commands of the first kind, the host must encapsulate the command before writing it into the buffer: it packs the part to be executed by the intermediate processor inside the part to be executed by the GPU, so that the two execution parts are encapsulated as a single command.
Upon receiving the doorbell interrupt, the intermediate processor reads commands from the buffer and then sends a read indication to the GPU, also via a doorbell interrupt. In addition, if a read command is to be executed by the intermediate processor itself, the intermediate processor executes it. Upon receiving its doorbell interrupt, the GPU reads commands from the buffer. If a read command requires no intermediate-processor execution, the GPU can execute it directly after reading it. If the command must be executed first by the intermediate processor and then by the GPU, the GPU preprocesses it. After the intermediate processor finishes its part of such a command, it updates the command's fence status and notifies the GPU; on receiving the notification, the GPU executes the command based on the preprocessing result.
The preprocessing may consist of parsing the command header in advance and distributing the information in the header to processing units inside the GPU, such as fragment processing units, geometry processing units, and general-purpose computing units, so that these units can obtain the data needed to process the command ahead of time.
After reading a command, the intermediate processor can parse it. If the parsing shows that the intermediate processor's execution part is contained inside the GPU's execution part, the command must be executed first by the intermediate processor and then by the GPU. If the command has only a GPU execution part, it requires only GPU execution; if it has only an intermediate-processor execution part, it requires only intermediate-processor execution.
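The parsing rule above can be sketched as a small classifier. The nested layout, with each outer layer holding an `"executor"` name and an `"inner"` part, is an illustrative assumption rather than the patent's actual command format.

```python
def classify(cmd, me="acpu"):
    """Decide who must execute a packed command.

    A command is assumed to be a nest of {"executor": ..., "inner": ...}
    layers, with the outermost layer executed last.
    """
    layers = []                      # executor names, outermost first
    while cmd is not None:
        layers.append(cmd["executor"])
        cmd = cmd["inner"]
    if layers == ["gpu"]:
        return "gpu_only"            # only a GPU execution part
    if layers == [me]:
        return "me_only"             # only this processor's part
    if me in layers and "gpu" in layers:
        return "me_then_gpu"         # my part nested inside the GPU's part
    return "not_mine"
```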
In other embodiments, there are multiple intermediate processors with a defined order. After the host writes a batch of commands into the buffer, it sends a read indication to the first intermediate processor via a doorbell interrupt. A command written by the host may be one that some or all of the intermediate processors must process first and the GPU execute afterwards, one that only some or all of the intermediate processors execute, or one that only the GPU executes. For commands of the first kind, the host must encapsulate the command before writing it into the buffer, packing each processor's execution part from the outside in according to the command's execution order. For ease of understanding, take the command processing system of Fig. 1 as an example and assume a command must be executed first by intermediate processor A, then by intermediate processor C, and finally by the GPU. When encapsulating the command, the host packs intermediate processor C's execution part inside the GPU's execution part, and then packs intermediate processor A's execution part inside intermediate processor C's execution part, so that the execution parts of the several intermediate processors and of the GPU are encapsulated as a single command.
Upon receiving the doorbell interrupt, the first intermediate processor reads commands from the buffer and then sends a read indication, also via a doorbell interrupt, to the second intermediate processor, which likewise reads commands from the buffer upon receiving its interrupt. If the system includes a third intermediate processor, the second sends it a read indication via a doorbell interrupt; otherwise, the second sends the read indication to the GPU. In short, each intermediate processor sends a read indication to the next one in order. In addition, if a command read by an intermediate processor is to be executed by that processor itself, the processor executes it.
Upon receiving its doorbell interrupt, the GPU reads commands from the buffer. If a read command requires no intermediate-processor execution, the GPU can execute it directly after reading it. If the command must be executed first by some or all of the intermediate processors and then by the GPU, the GPU preprocesses it; after the last intermediate processor that must execute the command finishes processing it, that processor updates the command's fence status and notifies the GPU, which then executes the command based on the preprocessing result.
If two or more intermediate processors must execute a command, they must do so in order.
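The fence handshake described above can be sketched with a thread event standing in for the command's fence status; the names, and the use of `threading.Event` as the notification mechanism, are illustrative assumptions.

```python
import threading

class Command:
    def __init__(self, name):
        self.name = name
        self.fence = threading.Event()   # stands in for the fence status
        self.log = []

def intermediate_run(cmd):
    cmd.log.append("acpu:execute")       # the intermediate's part, e.g. DMA
    cmd.fence.set()                      # update fence status, notify GPU

def gpu_run(cmd):
    cmd.log.append("gpu:preprocess")     # parse header, stage data early
    cmd.fence.wait()                     # block until the fence is signalled
    cmd.log.append("gpu:execute")        # run using the preprocessing result

cmd = Command("draw")
t = threading.Thread(target=gpu_run, args=(cmd,))
t.start()
intermediate_run(cmd)
t.join()
```

The GPU's preprocessing overlaps the intermediate processor's execution; only the final GPU execution is serialized behind the fence.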
为便于理解,以上述示例为例,假设一个命令需要先由中间处理器A执行,再由中间处理器C执行,最后由GPU执行。中间处理器A在收到宿主机的读取指示后,从缓冲区读取出该命令。中间处理器A通过解析该命令,确定该命令需要自身执行,于是中间处理器A执行该命令。此外,无论该命令是否需要中间处理器A执行,中间处理器A都将向中间处理器B发送读取指示。中间处理器B在收到中间处理器A的读取指示后,从缓冲区读取出该命令。中间处理器B通过解析该命令,确定该命令不需要自身执行。此外,无论该命令是否需要中间处理器B执行,中间处理器B都将向中间处理器C发送读取指示。中间处理器C在收到中间处理器B的读取指示后,从缓冲区读取出该命令。中间处理器C通过解析该命令,确定该命令需要中间处理器A先执行然后自身再执行,于是中间处理器C对命令进行预处理,并等待中间处理器A执行完该命令后自身再根据预处理结果执行该命令。此外,无论该命令是否需要中间处理器C执行,中间处理器C都将向GPU发送读取指示。其中,中间处理器A可以通过向中间处理器C发送fence更新状态来触发中间处理器C执行命令。GPU在收到中间处理器C的读取指示后,从缓冲区读取出该命令。GPU通过解析该命令,确定该命令需要中间处理器C先执行然后自身再执行,于是GPU对命令进行预处理。并等待中间处理器C执行完该命令后自身再根据预处理结果执行该命令。For ease of understanding, taking the above example as an example, assume that a command needs to be executed by the intermediate processor A first, then by the intermediate processor C, and finally by the GPU. The intermediate processor A reads the command from the buffer after receiving the read instruction from the host computer. The intermediate processor A determines that the command needs to be executed by itself by analyzing the command, so the intermediate processor A executes the command. In addition, regardless of whether the command needs to be executed by the middle processor A, the middle processor A will send a read instruction to the middle processor B. After the intermediate processor B receives the read instruction from the intermediate processor A, it reads the command from the buffer. The intermediate processor B determines that the command does not need to be executed by itself by analyzing the command. In addition, regardless of whether the command needs to be executed by the intermediate processor B, the intermediate processor B will send a read instruction to the intermediate processor C. After the intermediate processor C receives the read instruction from the intermediate processor B, it reads the command from the buffer. 
By parsing the command, intermediate processor C determines that the command needs to be executed first by intermediate processor A and then by itself, so intermediate processor C preprocesses the command, waits for intermediate processor A to finish executing the command, and then executes the command according to the preprocessing result. In addition, regardless of whether the command needs to be executed by intermediate processor C, intermediate processor C sends a read instruction to the GPU. Here, intermediate processor A may trigger intermediate processor C to execute the command by sending a fence update status to intermediate processor C. After receiving the read instruction from intermediate processor C, the GPU reads the command from the buffer. By parsing the command, the GPU determines that the command needs to be executed first by intermediate processor C and then by itself, so the GPU preprocesses the command, waits for intermediate processor C to finish executing the command, and then executes the command according to the preprocessing result.
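The fence-update handshake described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the class and method names are assumptions. The upstream processor (A) bumps a shared fence value when it finishes a command, and the downstream processor (C) executes its preprocessed command only once the fence has reached that command's sequence number.

```python
# Hypothetical sketch of the fence-update mechanism: A signals completion,
# C polls the fence before executing its preprocessed command.

class Fence:
    def __init__(self):
        self.value = 0  # sequence number of the last completed command

    def signal(self, seq):
        # Called by the upstream processor (A) after executing command `seq`.
        self.value = max(self.value, seq)

    def is_reached(self, seq):
        # Polled by the downstream processor (C) before executing `seq`.
        return self.value >= seq

fence = Fence()
assert not fence.is_reached(1)  # C must wait: A has not executed command 1 yet
fence.signal(1)                 # A finishes command 1 and updates the fence
assert fence.is_reached(1)      # C may now execute based on its preprocessing
```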
示例性地,本公开中的中间处理器可以是应用中央处理器ACPU,一个命令中需要由应用中央处理器ACPU执行的部分可以是DMA命令。需要解释的是,本公开中执行命令至少包括以下两种含义:第一种,处理器内部执行该命令;第二种,处理器将命令提交给其他装置执行,例如将命令提交给DMA控制器执行。Exemplarily, the intermediate processor in the present disclosure may be an application central processing unit ACPU, and a part of a command that needs to be executed by the application central processing unit ACPU may be a DMA command. It should be explained that executing a command in this disclosure includes at least the following two meanings: first, the processor executes the command internally; second, the processor submits the command to other devices for execution, such as submitting the command to a DMA controller for execution.
本公开中,一个或多个中间处理器和GPU共用同一个缓冲区。对于需要先由中间处理器执行再由GPU执行的命令,宿主机将该命令写入该缓冲区,宿主机无需等待中间处理器处理完命令后再向GPU发送命令,可以提升命令的处理效率。此外,GPU在中间处理器执行命令的同时即可开始预处理的动作,减少整个命令处理的延迟,也能有效提升命令的处理效率。In the present disclosure, one or more intermediate processors and the GPU share the same buffer. For commands that need to be executed by the intermediate processor first and then executed by the GPU, the host computer writes the command into the buffer, and the host computer does not need to wait for the intermediate processor to process the command before sending the command to the GPU, which can improve the processing efficiency of the command. In addition, the GPU can start the preprocessing action at the same time as the intermediate processor executes the command, which reduces the delay of the entire command processing and can effectively improve the command processing efficiency.
在一些具体实施方式中,缓冲区为循环缓冲区。例如循环缓冲区可以是环形缓冲区。循环缓冲区配置有宿主机对应的第一偏移字段、每个中间处理器对应的第二偏移字段及GPU对应的第三偏移字段。其中,第一偏移字段用于表征宿主机最后写入循环缓冲区的命令在循环缓冲区中的位置,每个中间处理器对应的第二偏移字段用于表征相应中间处理器最后从循环缓冲区读取的命令在循环缓冲区中的位置,第三偏移字段用于表征GPU最后从循环缓冲区读取的命令在循环缓冲区中的位置。In some embodiments, the buffer is a circular buffer; for example, the circular buffer may be a ring buffer. The circular buffer is configured with a first offset field corresponding to the host, a second offset field corresponding to each intermediate processor, and a third offset field corresponding to the GPU. The first offset field indicates the position in the circular buffer of the command last written to the circular buffer by the host; the second offset field of each intermediate processor indicates the position of the command last read from the circular buffer by that intermediate processor; and the third offset field indicates the position of the command last read from the circular buffer by the GPU.
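The offset-field layout above can be summarized in a minimal data-structure sketch. The field and class names here are assumptions for illustration, not from the patent; the point is that a single buffer carries one host write offset, one read offset per intermediate processor, and one GPU read offset.

```python
# Hypothetical sketch of the shared circular buffer state described above.

class CommandRingBuffer:
    def __init__(self, size, num_intermediate):
        self.slots = [None] * size                    # each slot holds one command
        self.first_offset = 0                         # host: last written position
        self.second_offsets = [0] * num_intermediate  # per processor: last read position
        self.third_offset = 0                         # GPU: last read position

rb = CommandRingBuffer(size=8, num_intermediate=3)
assert len(rb.slots) == 8
assert len(rb.second_offsets) == 3  # one second offset field per intermediate processor
```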
本公开中,宿主机在向缓冲区中写入一个或多个命令时,被配置为:根据第三偏移字段,向循环缓冲区中写入一个或多个命令,并更新第一偏移字段。In the present disclosure, when the host computer writes one or more commands into the buffer, it is configured to: write one or more commands into the circular buffer according to the third offset field, and update the first offset field.
中间处理器在从缓冲区读取命令时,被配置为:根据第二目标模块的偏移字段,从循环缓冲区读取命令,并更新自身的第二偏移字段;其中,若N的取值为1,第二目标模块为宿主机;若N的取值大于1,第一个中间处理器对应的第二目标模块为宿主机,其余中间处理器对应的第二目标模块为前一个中间处理器。When the intermediate processor reads commands from the buffer, it is configured to: read commands from the circular buffer according to the offset field of the second target module, and update its own second offset field; where, if N is 1, the second target module is the host; if N is greater than 1, the second target module of the first intermediate processor is the host, and the second target module of each remaining intermediate processor is the previous intermediate processor.
例如,第一个中间处理器在从缓冲区读取命令时,被配置为:根据第一偏移字段,从循环缓冲区读取命令,并更新自身的第二偏移字段。除第一个中间处理器以外的其余中间处理器在从缓冲区读取命令时,被配置为:根据前一个中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新自身的第二偏移字段。For example, when the first intermediate processor reads the command from the buffer, it is configured to: read the command from the circular buffer according to the first offset field, and update its own second offset field. When the other intermediate processors except the first intermediate processor read the command from the buffer, they are configured to: read the command from the circular buffer according to the second offset field of the previous intermediate processor, and update their own second offset field.
GPU在从缓冲区读取命令时,被配置为:根据向其发送读取指示的中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新第三偏移字段。换言之,根据最后一个中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新第三偏移字段。When the GPU reads commands from the buffer, it is configured to: read commands from the circular buffer according to the second offset field of the intermediate processor that sent it the read instruction, and update the third offset field. In other words, the GPU reads commands from the circular buffer according to the second offset field of the last intermediate processor and updates the third offset field.
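The chaining rule in the preceding paragraphs can be expressed as a small lookup: each reader consumes commands only up to the offset of the module ahead of it in the chain host → first intermediate processor → … → last intermediate processor → GPU. This is a minimal sketch with assumed names, not the patent's code.

```python
# Hypothetical sketch: which offset field bounds what each reader may consume.

def upstream_offset(reader, first_offset, second_offsets):
    """Return the offset field that limits how far `reader` may read.

    `reader` is an int index of an intermediate processor, or the string "gpu".
    """
    if reader == "gpu":
        return second_offsets[-1]       # GPU follows the last intermediate processor
    if reader == 0:
        return first_offset             # first intermediate processor follows the host
    return second_offsets[reader - 1]   # others follow the previous intermediate processor

first = 5
seconds = [4, 3, 2]                     # second offset fields of processors A, B, C
assert upstream_offset(0, first, seconds) == 5      # A reads up to the host's offset
assert upstream_offset(1, first, seconds) == 4      # B reads up to A's offset
assert upstream_offset(2, first, seconds) == 3      # C reads up to B's offset
assert upstream_offset("gpu", first, seconds) == 2  # GPU reads up to C's offset
```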
为便于理解,以中间处理器的数量是一个为例,如图2所示,图2是本公开一实施例提出的循环缓冲区的示意图。图2中,循环缓冲区中存储了多个命令,通过第一偏移字段指向的位置可知,宿主机最后写入循环缓冲区的命令是命令n。通过第二偏移字段指向的位置可知,中间处理器最后从循环缓冲区读取的命令是命令k。通过第三偏移字段指向的位置可知,GPU最后从循环缓冲区读取的命令是命令h。For ease of understanding, taking one intermediate processor as an example, as shown in FIG. 2 , FIG. 2 is a schematic diagram of a circular buffer proposed by an embodiment of the present disclosure. In Fig. 2, multiple commands are stored in the circular buffer, and the position pointed to by the first offset field shows that the last command written by the host to the circular buffer is command n. It can be seen from the position pointed to by the second offset field that the last command read by the intermediate processor from the circular buffer is command k. It can be seen from the position pointed to by the third offset field that the last command read by the GPU from the circular buffer is the command h.
以下,以中间处理器的数量是一个为例,对本公开的实施例进行详细说明。Hereinafter, taking one intermediate processor as an example, the embodiments of the present disclosure will be described in detail.
在一些具体实施方式中,宿主机在根据第三偏移字段,向循环缓冲区中写入一个或多个命令时,被配置为:根据第三偏移字段,向循环缓冲区中写入一批命令,直至写入位置比所述第三偏移字段对应的位置小1,或者直至一批命令被全部写完。In some specific implementation manners, when the host computer writes one or more commands into the circular buffer according to the third offset field, it is configured to: write a batch of commands into the circular buffer according to the third offset field until the write position is 1 less than the position corresponding to the third offset field, or until a batch of commands are all written.
本公开中,当宿主机的写入位置仅比第三偏移字段对应的位置小1时,宿主机判断出缓冲区此时的状态为满,因此不会继续向环形缓冲区写入命令,直至GPU从缓冲区读取了新的命令并向前更新了其第三偏移字段,宿主机才会继续向缓冲区写入命令。本公开中,循环缓冲区包括多个存储单元,每个存储单元可用于存储至少一个命令,宿主机、中间处理器及GPU各自的偏移字段用于指向一个存储单元。上述写入位置比第三偏移字段对应的位置小1,可以理解成,宿主机当前写入命令的存储单元是第三偏移字段指向的存储单元的前一个存储单元。为便于理解,图2中存储了命令g的存储单元(以下称其为存储单元G)是存储了命令h的存储单元(以下称其为存储单元H)的前一个存储单元。如果当宿主机向存储单元G写入命令时,GPU的第三偏移字段还指向存储单元H,则宿主机判断出缓冲区此时的状态为满,因此不会继续向环形缓冲区写入命令。In the present disclosure, when the host's write position is only 1 smaller than the position corresponding to the third offset field, the host determines that the buffer is full and stops writing commands to the ring buffer; only after the GPU has read new commands from the buffer and advanced its third offset field does the host resume writing. In the present disclosure, the circular buffer includes a plurality of storage units, each storage unit can store at least one command, and the respective offset fields of the host, the intermediate processor, and the GPU each point to a storage unit. The write position being 1 less than the position corresponding to the third offset field can be understood as: the storage unit into which the host is currently writing immediately precedes the storage unit pointed to by the third offset field. For ease of understanding, in FIG. 2 the storage unit storing command g (hereinafter, storage unit G) immediately precedes the storage unit storing command h (hereinafter, storage unit H). If the GPU's third offset field still points to storage unit H when the host writes a command into storage unit G, the host determines that the buffer is full and does not continue writing commands to the ring buffer.
本公开中,宿主机每次向循环缓冲区写入一批命令时,宿主机每写入一个命令,第一偏移字段的数值加1,直至第一偏移字段的数值等于第三偏移字段,宿主机暂停向缓冲区写入命令,并等待第三偏移字段的数值增加后,宿主机才继续向缓冲区写入命令。或者,直至一批命令被全部写入缓冲区,第一偏移字段仍然不等于第三偏移字段,宿主机停止向环形缓冲区中写入命令。需要说明的是,当第一偏移字段已经指向循环缓冲区的最后一个地址时,在第一偏移字段的数值加1后,第一偏移字段将重新指向环形缓冲区的第一个地址。In the present disclosure, each time the host writes a batch of commands to the circular buffer, the value of the first offset field is increased by 1 for every command written. If the first offset field becomes equal to the third offset field, the host suspends writing and waits for the value of the third offset field to increase before continuing to write commands to the buffer. Alternatively, if the entire batch has been written and the first offset field still does not equal the third offset field, the host simply stops writing to the ring buffer. It should be noted that when the first offset field already points to the last address of the circular buffer, increasing its value by 1 makes it point to the first address of the ring buffer again.
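The host-side write loop above can be sketched with the classic one-slot-reserved ring-buffer full test, which matches the "write position is 1 less than the third offset field" rule. This is an illustrative sketch under that interpretation; the function name and the choice of tracking the next write position (rather than the last written position) are assumptions.

```python
# Hypothetical sketch of the host write loop: write a batch, wrapping at the
# end of the buffer, and stop early when the ring is full (the next write
# position is one slot before the GPU's third offset).

def host_write_batch(slots, write_pos, third_offset, batch):
    """Write commands until the batch is done or the ring is full.

    Returns (new_write_pos, commands_left_unwritten).
    """
    size = len(slots)
    pending = list(batch)
    while pending and (write_pos + 1) % size != third_offset:
        slots[write_pos] = pending.pop(0)
        write_pos = (write_pos + 1) % size  # wrap back to the first address
    return write_pos, pending

slots = [None] * 4
new_pos, left = host_write_batch(slots, 0, 0, ["c1", "c2", "c3", "c4", "c5"])
assert slots == ["c1", "c2", "c3", None]   # ring full: one slot stays reserved
assert new_pos == 3 and left == ["c4", "c5"]  # host waits for the GPU to advance
```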
在一些具体实施方式中,宿主机向循环缓冲区写入命令期间,当第一偏移字段的数值等于第三偏移字段时,宿主机才向中间处理器发送读取指示。或者在待写入循环缓冲区的命令被全部写入循环缓冲区后,宿主机才向中间处理器发送读取指示。本公开中,宿主机并不是每向循环缓冲区写入一个命令,就通过门铃机制向中间处理器发送读取指示,而是在宿主机已经不能继续向循环缓冲区写入命令时(比如当第一偏移字段等于第三偏移字段时,或者当一批命令被全部写入缓冲区时),才通过门铃机制向中间处理器发送读取指示,这样可以减少对中间处理器的打扰。In some embodiments, while the host is writing commands to the circular buffer, it sends a read instruction to the intermediate processor only when the value of the first offset field equals the third offset field, or only after all commands to be written have been written into the circular buffer. In other words, the host does not send a read instruction to the intermediate processor through the doorbell mechanism every time it writes one command; it does so only when it can no longer continue writing to the circular buffer (for example, when the first offset field equals the third offset field, or when a whole batch of commands has been written into the buffer), which reduces interruptions to the intermediate processor.
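The doorbell policy above reduces to a simple predicate. This sketch is an assumption-level illustration (the function name is hypothetical): the host rings the doorbell only when it can no longer write, either because the ring is full or because the whole batch has been written.

```python
# Hypothetical sketch of the batched doorbell policy described above.

def should_ring_doorbell(first_offset, third_offset, batch_done):
    ring_full = first_offset == third_offset  # host has caught up with the GPU
    return ring_full or batch_done            # otherwise keep writing silently

assert should_ring_doorbell(first_offset=5, third_offset=5, batch_done=False)
assert should_ring_doorbell(first_offset=2, third_offset=5, batch_done=True)
assert not should_ring_doorbell(first_offset=2, third_offset=5, batch_done=False)
```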
或者,在另一些具体实施方式中,宿主机也可以每向循环缓冲区写入一个命令,宿主机就通过门铃机制向中间处理器发送一次读取指示。Alternatively, in other embodiments, the host may send a read instruction to the intermediate processor through the doorbell mechanism every time it writes a command to the circular buffer.
在一些具体实施方式中,中间处理器在根据第一偏移字段,从循环缓冲区读取命令,并更新自身的第二偏移字段时,被配置为:在第二偏移字段不等于第一偏移字段的情况下,从循环缓冲区逐个读取命令,每读取一个命令,更新一次第二偏移字段,直至第二偏移字段等于第一偏移字段,停止读取命令。In some embodiments, when the intermediate processor reads commands from the circular buffer according to the first offset field and updates its own second offset field, it is configured to: when the second offset field is not equal to the first offset field, read commands from the circular buffer one by one, updating the second offset field after each read, until the second offset field equals the first offset field, at which point it stops reading.
本公开中,中间处理器每从循环缓冲区读取出一个命令时,第二偏移字段的数值加1,直至第二偏移字段的数值等于第一偏移字段,中间处理器暂停从循环缓冲区读取命令。需要说明的是,当第二偏移字段已经指向循环缓冲区的最后一个地址时,在第二偏移字段的数值加1后,第二偏移字段将重新指向环形缓冲区的第一个地址。In the present disclosure, each time the intermediate processor reads a command from the circular buffer, the value of the second offset field is increased by 1 until the value of the second offset field is equal to the first offset field, and the intermediate processor suspends reading commands from the circular buffer. It should be noted that when the second offset field has already pointed to the last address of the circular buffer, after the value of the second offset field is increased by 1, the second offset field will point to the first address of the circular buffer again.
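The read loop and the wrap-around rule above can be sketched together. The function name is a hypothetical illustration: the reader advances its offset modulo the buffer size until it equals the upstream offset field.

```python
# Hypothetical sketch of the intermediate processor's read loop: consume
# commands one by one, wrapping at the last address, until the second offset
# field equals the host's first offset field.

def drain(slots, second_offset, first_offset):
    size = len(slots)
    read = []
    while second_offset != first_offset:
        read.append(slots[second_offset])
        second_offset = (second_offset + 1) % size  # wrap to the first address
    return read, second_offset

slots = ["c0", "c1", "c2", "c3"]
read, pos = drain(slots, second_offset=2, first_offset=1)
assert read == ["c2", "c3", "c0"]  # reading wrapped past the last address
assert pos == 1                    # stopped once equal to the first offset field
```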
在一些具体实施方式中,中间处理器在向GPU发送读取指示时,被配置为:在中间处理器的第二偏移字段等于第一偏移字段时,向GPU发送读取指示。换言之,当中间处理器读取到第一偏移字段所指向的命令时,也就是当中间处理器读取到宿主机最后写入循环缓冲区的命令(即宿主机最近一次写入循环缓冲区中的一个命令)时,中间处理器才通过门铃机制向GPU发送读取指示,这样可以减少中间处理器对GPU的打扰。In some embodiments, when the intermediate processor sends the read instruction to the GPU, it is configured to: send the read instruction to the GPU when its second offset field equals the first offset field. In other words, only when the intermediate processor has read the command pointed to by the first offset field, that is, the command most recently written to the circular buffer by the host, does it send a read instruction to the GPU through the doorbell mechanism, which reduces interruptions from the intermediate processor to the GPU.
或者,在另一些具体实施方式中,中间处理器也可以每从循环缓冲区中读取出一个命令,中间处理器就通过门铃机制向GPU发送一次读取指示。Or, in other specific implementation manners, every time the intermediate processor reads a command from the circular buffer, the intermediate processor sends a read instruction to the GPU through the doorbell mechanism.
在一些具体实施方式中,GPU在根据中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新第三偏移字段时,被配置为:在第三偏移字段不等于最后一个中间处理器的第二偏移字段的情况下,从循环缓冲区逐个读取命令,每读取一个命令,更新一次第三偏移字段,直至第三偏移字段等于最后一个中间处理器的第二偏移字段,停止读取命令。In some embodiments, when the GPU reads commands from the circular buffer according to the second offset field of the intermediate processor and updates the third offset field, it is configured to: when the third offset field is not equal to the second offset field of the last intermediate processor, read commands from the circular buffer one by one, updating the third offset field after each read, until the third offset field equals the second offset field of the last intermediate processor, at which point it stops reading.
本公开中,GPU每从循环缓冲区读取出一个命令时,第三偏移字段的数值加1,直至第三偏移字段的数值等于第二偏移字段,GPU暂停从循环缓冲区读取命令。需要说明的是,当第三偏移字段已经指向循环缓冲区的最后一个地址时,在第三偏移字段的数值加1后,第三偏移字段将重新指向环形缓冲区的第一个地址。In the present disclosure, every time the GPU reads a command from the circular buffer, the value of the third offset field is increased by 1 until the value of the third offset field is equal to the second offset field, and the GPU suspends reading commands from the circular buffer. It should be noted that when the third offset field has already pointed to the last address of the circular buffer, after the value of the third offset field is increased by 1, the third offset field will point to the first address of the circular buffer again.
在一些具体实施方式中,中间处理器还被配置为:在从缓冲区读取命令后,判断读取的命令是否需要GPU执行,并判断读取的命令是否需要自身先执行且GPU后执行,根据判断结果,在缓冲区该命令的对应位置填充第一标识和第二标识,第一标识用于表征读取的命令是否需要GPU执行,第二标识用于表征读取的命令是否需要自身先执行且GPU后执行。In some specific implementations, the intermediate processor is further configured to: after reading the command from the buffer, judge whether the read command needs to be executed by the GPU, and judge whether the read command needs to be executed first by itself and then executed by the GPU. According to the judgment result, the corresponding position of the command in the buffer is filled with a first identifier and a second identifier. The first identifier is used to indicate whether the read command needs to be executed by the GPU, and the second identifier is used to indicate whether the read command needs to be executed first by itself and then executed by the GPU.
GPU还被配置为:在从循环缓冲区读取命令后,根据第一标识,判断读取的命令是否需要GPU执行,并根据第二标识,判断读取的命令是否需要最后一个中间处理器先执行且GPU后执行。The GPU is further configured to: after reading the command from the circular buffer, judge whether the read command needs to be executed by the GPU according to the first flag, and judge whether the read command needs to be executed first by the last intermediate processor and executed by the GPU according to the second flag.
为便于理解,如图3所示,图3是本公开另一实施例提出的命令处理系统的结构示意图。如图3所示,该命令处理系统包括宿主机、中间处理器以及GPU,其中中间处理器和GPU共用一个循环缓冲区,该循环缓冲区包括多个存储区,每个存储区用于存储一个命令和该命令对应的第一标识和第二标识。For ease of understanding, as shown in FIG. 3 , FIG. 3 is a schematic structural diagram of a command processing system according to another embodiment of the present disclosure. As shown in FIG. 3 , the command processing system includes a host machine, an intermediate processor, and a GPU, wherein the intermediate processor and the GPU share a circular buffer, and the circular buffer includes a plurality of storage areas, and each storage area is used to store a command and a first identifier and a second identifier corresponding to the command.
图3中,第二偏移字段所指向的存储区是中间处理器当前刚好读取过命令的存储区。中间处理器从存储区中读取出命令后,通过解析该命令,以判断该命令是否需要GPU处理,并根据判断结果,在该存储区中填充第一标识。如图3所示,一些存储区中的第一标识为GE,表示这些存储区中的命令需要GPU执行。还有一些存储区中的第一标识为GNE,表示这些存储区中的命令不需要GPU执行。In FIG. 3, the storage area pointed to by the second offset field is the storage area from which the intermediate processor has most recently read a command. After reading a command from a storage area, the intermediate processor parses it to determine whether the command needs to be processed by the GPU, and fills the first identifier into that storage area according to the result. As shown in FIG. 3, the first identifier in some storage areas is GE, indicating that the commands in those storage areas need to be executed by the GPU; in other storage areas the first identifier is GNE, indicating that the commands there do not need to be executed by the GPU.
此外,中间处理器还判断命令是否需要中间处理器先执行且GPU后执行,并根据判断结果,在该存储区中填充第二标识。如图3所示,一些存储区中的第二标识为W,表示这些存储区中的命令需要中间处理器先执行且GPU后执行,即GPU需要等待中间处理器执行该命令后才能执行该命令。还有一些存储区中的第二标识为NW,表示这些存储区中的命令仅需GPU执行,即GPU不需要等待中间处理器先执行该命令,而可以直接执行该命令。In addition, the intermediate processor also determines whether the command needs to be executed first by the intermediate processor and then by the GPU, and fills the second identifier into the storage area according to the result. As shown in FIG. 3, the second identifier in some storage areas is W, indicating that the commands in those storage areas need to be executed first by the intermediate processor and then by the GPU, that is, the GPU must wait for the intermediate processor to execute the command before executing it itself. In other storage areas the second identifier is NW, indicating that the commands there only need to be executed by the GPU, that is, the GPU can execute the command directly without waiting for the intermediate processor.
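The tagging step above can be sketched as follows. The flag values GE/GNE and W/NW come from the text; the dict layout and function name are assumptions for illustration only.

```python
# Hypothetical sketch of how the intermediate processor fills the first and
# second identifiers of a storage area after parsing a command.

def tag_storage_area(area, needs_gpu, must_wait_for_intermediate):
    # First identifier: does the GPU need to execute this command?
    area["first_id"] = "GE" if needs_gpu else "GNE"
    # Second identifier: must the GPU wait for the intermediate processor first?
    area["second_id"] = "W" if must_wait_for_intermediate else "NW"
    return area

area = {"command": "draw"}
tag_storage_area(area, needs_gpu=True, must_wait_for_intermediate=True)
assert area["first_id"] == "GE" and area["second_id"] == "W"
```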
以上,本公开通过多个具体实施方式,详细说明了中间处理器的数量是一个的情况。以下,本公开将通过多个具体实施方式,详细说明中间处理器的数量是多个的情况。需要说明的是,多个中间处理器的一些具体实施方式与一个中间处理器的一些具体实施方式是相同的,为避免重复,对于相同的具体实施方式,本公开将不展开说明。Above, the present disclosure has described in detail the case that the number of intermediate processors is one through multiple specific implementation manners. Hereinafter, the present disclosure will describe in detail the situation that there are multiple intermediate processors through multiple specific implementation manners. It should be noted that some specific implementations of multiple intermediate processors are the same as some specific implementations of one intermediate processor, and to avoid repetition, the present disclosure will not describe the same specific implementations.
如图4所示,图4是本公开另一实施例提出的循环缓冲区的示意图。图4中,循环缓冲区中存储了多个命令,通过第一偏移字段指向的位置可知,宿主机最后写入循环缓冲区的命令是命令r。通过第一个第二偏移字段指向的位置可知,中间处理器A最后从循环缓冲区读取的命令是命令m。通过第二个第二偏移字段指向的位置可知,中间处理器B最后从循环缓冲区读取的命令是命令k。通过第三个第二偏移字段指向的位置可知,中间处理器C最后从循环缓冲区读取的命令是命令i。通过第三偏移字段指向的位置可知,GPU最后从循环缓冲区读取的命令是命令f。As shown in FIG. 4, FIG. 4 is a schematic diagram of a circular buffer according to another embodiment of the present disclosure. In FIG. 4, multiple commands are stored in the circular buffer. The position pointed to by the first offset field shows that the last command written to the circular buffer by the host is command r. The position pointed to by the first of the second offset fields shows that the last command read from the circular buffer by intermediate processor A is command m. The position pointed to by the second of the second offset fields shows that the last command read by intermediate processor B is command k. The position pointed to by the third of the second offset fields shows that the last command read by intermediate processor C is command i. The position pointed to by the third offset field shows that the last command read by the GPU is command f.
在一些具体实施方式中,宿主机在根据第三偏移字段,向循环缓冲区中写入一个或多个命令时,被配置为:根据第三偏移字段,向循环缓冲区中写入一批命令,直至写入位置比所述第三偏移字段对应的位置小1,或者直至一批命令被全部写完。In some specific implementation manners, when the host computer writes one or more commands into the circular buffer according to the third offset field, it is configured to: write a batch of commands into the circular buffer according to the third offset field until the write position is 1 less than the position corresponding to the third offset field, or until a batch of commands are all written.
在一些具体实施方式中,第一个中间处理器在其自身的第二偏移字段不等于第一偏移字段的情况下,从循环缓冲区逐个读取命令,每读取一个命令,更新一次其自身的第二偏移字段,直至其自身的第二偏移字段等于第一偏移字段,停止读取命令。In some specific implementations, the first intermediate processor reads commands one by one from the circular buffer when its own second offset field is not equal to the first offset field, and updates its own second offset field every time it reads a command, until its own second offset field is equal to the first offset field, and stops reading commands.
在一些具体实施方式中,除第一个中间处理器以外的其余中间处理器在从缓冲区读取命令时,被配置为:根据前一个中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新自身的第二偏移字段。In some specific implementation manners, when reading commands from the buffer, the rest of the intermediate processors except the first intermediate processor are configured to: read the command from the circular buffer according to the second offset field of the previous intermediate processor, and update the second offset field of itself.
本公开中,除第一中间处理器以外的其余中间处理器每从循环缓冲区读取出一个命令时,该中间处理器的第二偏移字段的数值加1,直至第二偏移字段的数值等于该中间处理器的前一中间处理器的第二偏移字段,该中间处理器暂停从循环缓冲区读取命令。为便于理解,沿用上述示例,中间处理器B每从循环缓冲区读取出一个命令时,中间处理器B的第二偏移字段的数值加1,直至第二偏移字段的数值等于中间处理器A的第二偏移字段,中间处理器B暂停从循环缓冲区读取命令。中间处理器C每从循环缓冲区读取出一个命令时,中间处理器C的第二偏移字段的数值加1,直至第二偏移字段的数值等于中间处理器B的第二偏移字段,中间处理器C暂停从循环缓冲区读取命令。In the present disclosure, each time an intermediate processor other than the first reads a command from the circular buffer, the value of that processor's second offset field is increased by 1; when its second offset field equals the second offset field of the previous intermediate processor, it suspends reading commands from the circular buffer. For ease of understanding, following the above example: each time intermediate processor B reads a command from the circular buffer, the value of its second offset field is increased by 1, and when it equals the second offset field of intermediate processor A, intermediate processor B suspends reading. Likewise, each time intermediate processor C reads a command from the circular buffer, the value of its second offset field is increased by 1, and when it equals the second offset field of intermediate processor B, intermediate processor C suspends reading.
需要说明的是,当第二偏移字段已经指向循环缓冲区的最后一个地址时,在第二偏移字段的数值加1后,第二偏移字段将重新指向环形缓冲区的第一个地址。It should be noted that when the second offset field has already pointed to the last address of the circular buffer, after the value of the second offset field is increased by 1, the second offset field will point to the first address of the circular buffer again.
在一些具体实施方式中,最后一个中间处理器在向GPU发送读取指示时,被配置为:在最后一个中间处理器的第二偏移字段等于前一个中间处理器的第二偏移字段时,向GPU发送读取指示。换言之,当最后一个中间处理器读取到前一个中间处理器的第二偏移字段所指向的命令时,最后一个中间处理器才通过门铃机制向GPU发送读取指示,这样可以减少中间处理器对GPU的打扰。In some embodiments, when the last intermediate processor sends the read instruction to the GPU, it is configured to: send the read instruction to the GPU when its second offset field equals the second offset field of the previous intermediate processor. In other words, only when the last intermediate processor has read the command pointed to by the previous intermediate processor's second offset field does it send a read instruction to the GPU through the doorbell mechanism, which reduces interruptions from the intermediate processors to the GPU.
或者,在另一些具体实施方式中,最后一个中间处理器也可以每从循环缓冲区中读取出一个命令,中间处理器就通过门铃机制向GPU发送一次读取指示。Or, in other specific implementation manners, every time the last intermediate processor reads a command from the circular buffer, the intermediate processor sends a read instruction to the GPU through the doorbell mechanism.
在一些具体实施方式中,GPU在根据最后一个中间处理器的第二偏移字段,从循环缓冲区读取命令,并更新GPU偏移字段时,被配置为:在第三偏移字段不等于最后一个中间处理器的第二偏移字段的情况下,从循环缓冲区逐个读取命令,每读取一个命令,更新一次第三偏移字段,直至第三偏移字段等于最后一个中间处理器的第二偏移字段,停止读取命令。In some specific implementations, when the GPU reads commands from the circular buffer according to the second offset field of the last intermediate processor, and updates the GPU offset field, it is configured to: when the third offset field is not equal to the second offset field of the last intermediate processor, read the commands one by one from the circular buffer, and update the third offset field once every time a command is read, until the third offset field is equal to the second offset field of the last intermediate processor, and stop reading the command.
在一些具体实施方式中,最后一个中间处理器还被配置为:在从缓冲区读取命令后,判断读取的命令是否需要GPU执行,并判断读取的命令是否需要自身先执行且GPU后执行,根据判断结果,在缓冲区该命令的对应位置填充第一标识和第二标识,第一标识用于表征读取的命令是否需要GPU执行,第二标识用于表征读取的命令是否需要自身先执行且GPU后执行。In some specific implementations, the last intermediate processor is further configured to: after reading the command from the buffer, judge whether the read command needs to be executed by the GPU, and judge whether the read command needs to be executed first by itself and then executed by the GPU, and according to the judgment result, fill the first identifier and the second identifier in the corresponding position of the command in the buffer, the first identifier is used to indicate whether the read command needs to be executed by the GPU, and the second identifier is used to indicate whether the read command needs to be executed first by itself and executed later by the GPU.
GPU还被配置为:在从循环缓冲区读取命令后,根据第一标识,判断读取的命令是否需要GPU执行,并根据第二标识,判断读取的命令是否需要最后一个中间处理器先执行且GPU后执行。The GPU is further configured to: after reading the command from the circular buffer, judge whether the read command needs to be executed by the GPU according to the first flag, and judge whether the read command needs to be executed first by the last intermediate processor and executed by the GPU according to the second flag.
参考图5,图5是本公开另一实施例提出的命令处理系统的结构示意图。如图5所示,该命令处理系统包括宿主机、中间处理器以及GPU,其中中间处理器和GPU共用一个循环缓冲区,该循环缓冲区包括多个存储区,每个存储区用于存储一个命令和该命令对应的N个第一标识和N个第二标识。如图5中的虚线箭头所示,除第一个中间处理器以外的其余N-1个中间处理器中,每个中间处理器分别对应一个第一标识和一个第二标识,此外GPU还对应一个第一标识和一个第二标识。每个中间处理器对应的第一标识用于表示:该中间处理器是否需要执行相应命令。每个中间处理器对应的第二标识用于表示:该中间处理器是否需要等待其上一个中间处理器执行该命令后,才执行该命令。GPU对应的第一标识用于表示:GPU是否需要执行相应命令。GPU对应的第二标识用于表示:GPU是否需要等待最后一个中间处理器执行该命令后,才执行该命令。Referring to FIG. 5 , FIG. 5 is a schematic structural diagram of a command processing system according to another embodiment of the present disclosure. As shown in FIG. 5, the command processing system includes a host machine, an intermediate processor, and a GPU, wherein the intermediate processor and the GPU share a circular buffer, and the circular buffer includes a plurality of storage areas, and each storage area is used to store a command and N first identifiers and N second identifiers corresponding to the command. As shown by the dotted arrows in FIG. 5 , among the remaining N-1 intermediate processors except the first intermediate processor, each intermediate processor corresponds to a first identifier and a second identifier, and the GPU also corresponds to a first identifier and a second identifier. The first identifier corresponding to each intermediate processor is used to indicate whether the intermediate processor needs to execute the corresponding command. The second identifier corresponding to each intermediate processor is used to indicate whether the intermediate processor needs to wait for the previous intermediate processor to execute the command before executing the command. The first identifier corresponding to the GPU is used to indicate whether the GPU needs to execute a corresponding command. The second identifier corresponding to the GPU is used to indicate whether the GPU needs to wait for the last intermediate processor to execute the command before executing the command.
图5中,中间处理器A从循环缓冲器中读取出一个命令后,通过解析该命令,判断该命令是否需要中间处理器B执行,并判断该命令是否需要中间处理器A先执行而中间处理器B后执行。中间处理器A根据判断结果,填充第一个第一标识和第一个第二标识。类似地,中间处理器B从循环缓冲器中读取出一个命令后,通过解析该命令,判断该命令是否需要中间处理器C执行,并判断该命令是否需要中间处理器B先执行而中间处理器C后执行。中间处理器B根据判断结果,填充第二个第一标识和第二个第二标识。中间处理器C从循环缓冲器中读取出一个命令后,通过解析该命令,判断该命令是否需要GPU执行,并判断该命令是否需要中间处理器C先执行而GPU后执行。中间处理器C根据判断结果,填充第三个第一标识和第三个第二标识。In Fig. 5, after the intermediate processor A reads a command from the circular buffer, by analyzing the command, it is judged whether the command needs to be executed by the intermediate processor B, and whether the command needs to be executed first by the intermediate processor A and then executed by the intermediate processor B. The intermediate processor A fills in the first first identifier and the first second identifier according to the judgment result. Similarly, after the intermediate processor B reads a command from the circular buffer, it parses the command to determine whether the command needs to be executed by the intermediate processor C, and judges whether the command needs to be executed first by the intermediate processor B and executed later by the intermediate processor C. The intermediate processor B fills in the second first identifier and the second second identifier according to the judgment result. After the intermediate processor C reads a command from the circular buffer, it parses the command to judge whether the command needs to be executed by the GPU, and judges whether the command needs to be executed by the intermediate processor C first and then by the GPU. The intermediate processor C fills in the third first identifier and the third second identifier according to the judgment result.
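The per-stage tagging in FIG. 5 can be sketched as a list of identifier pairs. This illustration assumes a list layout and three intermediate processors A, B, C, as in the example above; processor i fills the i-th (first, second) identifier pair, describing whether the next module in the chain must execute the command and whether it must wait for processor i to finish first. None of these names come from the patent itself.

```python
# Hypothetical sketch of the N first identifiers and N second identifiers
# stored with each command, filled stage by stage along the processor chain.

def fill_stage_flags(area, stage, next_must_execute, next_must_wait):
    area["first_ids"][stage] = next_must_execute   # must the next module execute it?
    area["second_ids"][stage] = next_must_wait     # must it wait for this stage first?
    return area

area = {"command": "dma_copy", "first_ids": [None] * 3, "second_ids": [None] * 3}
fill_stage_flags(area, stage=0, next_must_execute=False, next_must_wait=False)  # A about B
fill_stage_flags(area, stage=1, next_must_execute=False, next_must_wait=False)  # B about C
fill_stage_flags(area, stage=2, next_must_execute=True, next_must_wait=True)    # C about the GPU
assert area["first_ids"][2] is True and area["second_ids"][2] is True
```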
以上,本公开通过多个具体实施方式,详细说明了中间处理器的数量是多个的情况。Above, the present disclosure has described in detail the case where there are multiple intermediate processors through multiple specific implementations.
Based on the same inventive concept, the present disclosure further provides another command processing system, including N intermediate processors and a GPU; when N is greater than 1, the N intermediate processors have a sequential order.
In some embodiments, each intermediate processor is configured to: after receiving a read indication, read a command from the buffer, send a read indication to its first target module, and, when the read command needs to be executed by the intermediate processor itself, execute the read command. If N is 1, the first target module is the GPU; if N is greater than 1, the first target module of the last intermediate processor is the GPU, and the first target module of every other intermediate processor is the next intermediate processor.
The GPU is configured to: after receiving a read indication, read a command from the buffer; when the read command must be executed by an intermediate processor first and by the GPU afterwards, preprocess the read command, wait for the intermediate processor to finish executing the command, and then execute the read command based on the preprocessing result.
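The overlap the GPU gains here — preprocessing while the intermediate processor is still executing, and blocking only for the final execution step — can be sketched as below. The function names are invented, and the real fence-wait mechanism is not specified at this level of the disclosure.

```python
def gpu_handle(command, needs_intermediate_first, wait_for_fence):
    """GPU path: preprocess immediately, defer final execution until the fence signals."""
    pre = f"preprocessed({command})"  # done in parallel with the intermediate processor
    if needs_intermediate_first:
        wait_for_fence()             # block only for the final execution step
    return f"executed({pre})"

log = []
result = gpu_handle("draw", True, lambda: log.append("waited"))
```

In a serial design the wait would come before any GPU work at all; moving the preprocessing ahead of the wait is what shortens the critical path.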
In some embodiments, the buffer is a circular buffer configured with a first offset field corresponding to the host, a second offset field corresponding to each intermediate processor, and a third offset field corresponding to the GPU. The first offset field indicates the position, in the circular buffer, of the command most recently written to the circular buffer by the host; the second offset field of each intermediate processor indicates the position of the command most recently read from the circular buffer by that intermediate processor; and the third offset field indicates the position of the command most recently read from the circular buffer by the GPU.
When reading a command from the buffer, an intermediate processor is configured to: read the command from the circular buffer according to the offset field of its second target module, and update its own second offset field. If N is 1, the second target module is the host; if N is greater than 1, the second target module of the first intermediate processor is the host, and the second target module of every other intermediate processor is the previous intermediate processor.
When reading a command from the buffer, the GPU is configured to: read the command from the circular buffer according to the second offset field of the intermediate processor that sent it the read indication, and update the third offset field.
In some embodiments, when reading commands from the circular buffer according to the offset field of its second target module and updating its own second offset field, an intermediate processor is configured to: while its own second offset field is not equal to the offset field of the second target module, read commands from the circular buffer one by one, updating its own second offset field after each command is read, and stop reading once its own second offset field equals the offset field of the second target module.
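This catch-up loop is the standard ring-buffer consumer pattern: keep reading until your offset matches the upstream producer's offset. A minimal sketch, with offsets modeled as slot indices and all names invented for illustration:

```python
def drain(buffer, my_offset, producer_offset, handle):
    """Read commands one by one until this consumer's offset catches up with
    the offset of the module upstream of it (the host or the previous processor)."""
    size = len(buffer)
    while my_offset != producer_offset:
        my_offset = (my_offset + 1) % size  # advance to the next unread slot
        handle(buffer[my_offset])           # read exactly one command per step
    return my_offset

buf = ["cmd%d" % i for i in range(8)]
seen = []
# consumer is at slot 6, producer has written up to slot 1 (offsets wrap at 8)
new_offset = drain(buf, my_offset=6, producer_offset=1, handle=seen.append)
```

The same loop shape applies to the GPU in the embodiment below, with the third offset field playing the role of `my_offset` and the notifying processor's second offset field playing the role of `producer_offset`.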
In some embodiments, when sending the read indication to its first target module, an intermediate processor is configured to: send the read indication to the first target module once its own second offset field equals the offset field of the second target module.
In some embodiments, when reading commands from the circular buffer according to the second offset field of the intermediate processor that sent it the read indication and updating the third offset field, the GPU is configured to: while the third offset field is not equal to that second offset field, read commands from the circular buffer one by one, updating the third offset field after each command is read, and stop reading once the third offset field equals the second offset field of the intermediate processor that sent the read indication.
In some embodiments, each intermediate processor is further configured to: after reading a command from the buffer, determine whether the read command needs to be executed by its first target module, and whether the read command must be executed by the intermediate processor itself first and by the first target module afterwards; based on this determination, fill in a first identifier and a second identifier at the position corresponding to the command in the buffer. The first identifier indicates whether the read command needs to be executed by the corresponding first target module; the second identifier indicates whether the read command must be executed by the intermediate processor itself first and by the corresponding first target module afterwards.
When N is greater than 1, each intermediate processor is further configured to: after reading a command from the circular buffer, determine from the first identifier at the corresponding position whether the read command needs to be executed by the intermediate processor itself, and determine from the second identifier at the corresponding position whether the read command must be executed by the previous intermediate processor first and by the intermediate processor itself afterwards.
The GPU is further configured to: after reading a command from the circular buffer, determine from the first identifier at the corresponding position whether the read command needs to be executed by the GPU, and determine from the second identifier at the corresponding position whether the read command must be executed first by the intermediate processor that sent the GPU the read indication and by the GPU afterwards.
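The decision each consumer (a downstream intermediate processor or the GPU) derives from its flag pair reduces to three outcomes. A sketch, with the action names chosen here for illustration:

```python
def decide(first_id, second_id):
    """Map the flag pair stored at a command's slot to an action for this consumer."""
    if not first_id:
        return "skip"               # command is not addressed to this consumer
    if second_id:
        return "wait_then_execute"  # predecessor must finish the command first
    return "execute"                # consumer may execute immediately

decisions = [decide(f, s) for f, s in [(False, False), (True, False), (True, True)]]
```

The (first_id=False, second_id=True) combination never arises under the scheme above, since an ordering constraint is only recorded for commands the successor actually executes.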
In some embodiments, the last intermediate processor is further configured to: for a command that must be executed by the last intermediate processor first and by the GPU afterwards, update the fence status of the command after processing it, and notify the GPU.
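The fence handshake at the end of the chain can be sketched as below. This is a simplification under assumed names: in hardware, the fence status is typically a memory word the GPU polls or an interrupt/doorbell the processor rings, neither of which the disclosure pins down here.

```python
class Fence:
    """Minimal fence: the last intermediate processor signals completion,
    and the GPU is notified so it can finish executing the command."""
    def __init__(self):
        self.signaled = False
        self.notified = []

    def signal(self, command_id, notify):
        self.signaled = True   # update the command's fence status...
        notify(command_id)     # ...then notify the GPU (e.g. ring a doorbell)

fence = Fence()
fence.signal("cmd42", fence.notified.append)
```

Pairing the status update with an explicit notification lets the GPU sleep instead of spinning on the fence word.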
In the present disclosure, both the GPU and the intermediate processors (for example, ACPUs) read commands from the buffer. The GPU does not need to wait for an intermediate processor to finish executing a command before reading that command from the buffer, and the GPU can preprocess a read command while the ACPU is still executing it; the present disclosure therefore effectively improves command execution efficiency.
An embodiment of the present disclosure further provides an electronic apparatus that includes the command processing system described in any of the above embodiments. In some usage scenarios, the electronic apparatus takes the product form of a graphics card; in other usage scenarios, it takes the product form of a CPU motherboard.
An embodiment of the present disclosure further provides an electronic device that includes the above electronic apparatus. In some usage scenarios, the electronic device takes the product form of a portable electronic device, such as a smartphone, a tablet computer, or a VR device; in other usage scenarios, it takes the product form of a personal computer, a game console, a workstation, a server, or the like.
Although preferred embodiments of the present disclosure have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments as well as all changes and modifications that fall within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and variations to the present disclosure without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is intended to encompass them as well.
Claims (13)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310060930.5A CN115878521B (en) | 2023-01-17 | 2023-01-17 | Command processing system, electronic device and electronic equipment |
| PCT/CN2023/112999 WO2024152560A1 (en) | 2023-01-17 | 2023-08-15 | Command processing system, electronic apparatus and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115878521A (en) | 2023-03-31 |
| CN115878521B (en) | 2023-07-21 |
Family
ID=85758684
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310060930.5A (CN115878521B, active) | Command processing system, electronic device and electronic equipment | 2023-01-17 | 2023-01-17 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN115878521B (en) |
| WO (1) | WO2024152560A1 (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115878521B (en) * | 2023-01-17 | 2023-07-21 | 北京象帝先计算技术有限公司 | Command processing system, electronic device and electronic equipment |
| CN118365506B (en) * | 2024-06-18 | 2024-11-19 | 北京象帝先计算技术有限公司 | MMU configuration method, graphics processing system, electronic component and device |
| CN120492035B (en) * | 2025-07-15 | 2025-09-30 | 摩尔线程智能科技(上海)有限责任公司 | Graphics processor, buffer blocking state processing method and electronic device |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2907275B2 (en) * | 1996-06-11 | 1999-06-21 | 日本電気株式会社 | Command transfer method |
| WO2007110914A1 (en) * | 2006-03-27 | 2007-10-04 | Fujitsu Limited | Multiprocessor system and multiprocessor system operating method |
| US7802062B2 (en) * | 2007-09-28 | 2010-09-21 | Microsoft Corporation | Non-blocking variable size recyclable buffer management |
| US9727521B2 (en) * | 2012-09-14 | 2017-08-08 | Nvidia Corporation | Efficient CPU mailbox read access to GPU memory |
| JP2016015066A (en) * | 2014-07-03 | 2016-01-28 | 日本電気株式会社 | Spinlock detector |
| US11614889B2 (en) * | 2018-11-29 | 2023-03-28 | Advanced Micro Devices, Inc. | Aggregating commands in a stream based on cache line addresses |
| CN111880916B (en) * | 2020-07-27 | 2024-08-16 | 长沙景嘉微电子股份有限公司 | Method, device, terminal, medium and host for processing multiple drawing tasks in GPU |
| CN114816777A (en) * | 2021-01-29 | 2022-07-29 | 上海阵量智能科技有限公司 | Command processing device, method, electronic device and computer readable storage medium |
| CN113051071A (en) * | 2021-03-02 | 2021-06-29 | 长沙景嘉微电子股份有限公司 | Command submitting method and device, command reading method and device, and electronic equipment |
| CN114880259B (en) * | 2022-07-12 | 2022-09-16 | 北京象帝先计算技术有限公司 | Data processing method, device, system, electronic equipment and storage medium |
| CN115878521B (en) * | 2023-01-17 | 2023-07-21 | 北京象帝先计算技术有限公司 | Command processing system, electronic device and electronic equipment |
- 2023-01-17: CN application CN202310060930.5A filed (CN115878521B, active)
- 2023-08-15: PCT application PCT/CN2023/112999 filed (WO2024152560A1, pending)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024152560A1 (en) | 2024-07-25 |
| CN115878521A (en) | 2023-03-31 |
Similar Documents
| Publication | Title |
|---|---|
| CN115878521B (en) | Command processing system, electronic device and electronic equipment |
| CN109388595B (en) | High-bandwidth memory systems and logic dies |
| US11868299B2 (en) | Network-on-chip data processing method and device |
| CN100592273C (en) | Apparatus and method for performing DMA data transfer |
| US20190361708A1 (en) | Embedded scheduling of hardware resources for hardware acceleration |
| US9703603B1 (en) | System and method for executing accelerator call |
| US20240419358A1 (en) | Hardware management of direct memory access commands |
| US20150261535A1 (en) | Method and apparatus for low latency exchange of data between a processor and coprocessor |
| US8745291B2 (en) | Inter-processor communication apparatus and method |
| US9727521B2 (en) | Efficient CPU mailbox read access to GPU memory |
| CN114662136A (en) | A high-speed encryption and decryption system and method of multi-algorithm IP core based on PCIE channel |
| CN114817965B (en) | High-speed encryption and decryption system and method for implementing MSI interrupt processing based on multi-algorithm IP core |
| CN113849238B (en) | Data communication method, device, electronic equipment and readable storage medium |
| US20220342835A1 (en) | Method and apparatus for disaggregation of computing resources |
| CN115113977A (en) | Descriptor reading apparatus and device, method and integrated circuit |
| CN118426967A (en) | Data processing method, device, electronic equipment and storage medium |
| WO2022032990A1 (en) | Command information transmission method, system, and apparatus, and readable storage medium |
| CN115174673B (en) | Data processing device, data processing method and apparatus having low-latency processor |
| CN116897581A (en) | Computing task scheduling device, computing task scheduling method, and computing method |
| WO2025010097A1 (en) | Interrupting memory access during background operations on a memory device |
| US20140331021A1 (en) | Memory control apparatus and method |
| US12124737B2 (en) | Storage system capable of operating at a high speed including storage device and control device |
| US8996772B1 (en) | Host communication device and method with data transfer scheduler |
| WO2023231330A1 (en) | Data processing method and apparatus for pooling platform, device, and medium |
| CN120579349B (en) | Simulation scheduling method and simulation scheduling device for software simulation platform |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Command processing systems, electronic devices, and electronic equipment. Granted publication date: 20230721. Pledgee: Ji Aiqin. Pledgor: Xiangdixian Computing Technology (Chongqing) Co., Ltd.; Beijing Xiangdixian Computing Technology Co., Ltd. Registration number: Y2024980043989 |