CN104615557A - Multi-core fine-grained synchronous DMA transmission method for GPDSP - Google Patents

Info

Publication number
CN104615557A
Authority
CN
China
Prior art keywords
dma
transmission
data
frame
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510033310.8A
Other languages
Chinese (zh)
Other versions
CN104615557B (en)
Inventor
万江华
马胜
杨柳
陈书明
郭阳
刘胜
雷元武
陈胜刚
彭元喜
胡封林
田玉恒
李晨
王占立
胡月安
丁一博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201510033310.8A
Publication of CN104615557A
Application granted
Publication of CN104615557B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F13/282: Cycle stealing DMA
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00: Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/28: DMA
    • G06F2213/2806: Space or buffer allocation for DMA transfers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

A multi-core fine-grained synchronous DMA transmission method for a GPDSP. Each direct memory access (DMA) unit participating in the multi-core fine-grained synchronous transmission sends a local frame transmission end signal to a global synchronization register after it finishes transmitting a frame of data. The end signals from the multiple cores are combined into one global frame transmission end signal. Each DMA unit checks whether its configured list of cores participating in the synchronous transmission matches the received global frame transmission end signal. If it matches, all participating DMA units have finished transmitting the previous frame and may begin moving the next frame. If it does not match, at least one participating DMA unit has not finished moving the previous frame, and all participating DMA units keep waiting until the match succeeds. The invention can effectively raise the SDRAM row hit rate and significantly improve memory bandwidth utilization and DMA transmission efficiency.

Description

A multi-core fine-grained synchronous DMA transmission method for GPDSP

Technical field

The present invention relates mainly to the field of general-purpose digital signal processors (General Purpose Digital Signal Processor, GPDSP), and specifically to a multi-core fine-grained synchronous transmission method for the direct memory access (Direct Memory Access, DMA) unit in a GPDSP, intended to improve DDR write access efficiency.

Background

A general-purpose digital signal processor (GPDSP) is a new architecture that retains the basic characteristics of an embedded DSP and its advantages of high performance and low power consumption while also efficiently supporting general scientific computing. It can efficiently support both 64-bit high-performance computing and embedded high-precision signal processing. GPDSP application areas include scientific computing, communications and voice, graphics and imaging, and wearable devices, all of which place higher demands on the system performance of GPDSP chips. However, the "memory wall" caused by the severe imbalance between GPDSP compute performance and memory access performance has become the main obstacle to further improving GPDSP processing performance.

Direct memory access (DMA) is an important technique for alleviating the memory wall in the GPDSP architecture. Without intervention from the compute core, DMA works in the background to move data quickly between in-core and off-core storage. DMA overlaps accesses to off-core storage with the program's computation, which hides part of the impact of memory access on compute performance. DMA also supports several flexible, configurable transfer modes that meet the differing data transfer needs of algorithms and programs such as FFT, FIR and HPL (High Performance Linpack).

To hide memory access latency effectively, DMA must provide ever higher data movement speed to keep up with the steadily growing compute performance of DSPs. DMA data movement speed is limited largely by the efficiency of accessing off-core DDR storage, so an important DMA design goal is to maximize DDR access efficiency during data movement. Off-core DSP storage widely uses DDR3 SDRAM devices. A DDR3 SDRAM device is organized in three dimensions: it consists of multiple banks, each bank consists of multiple rows, and each row contains multiple columns of storage cells. DDR3 SDRAM is addressed by bank number, row address and column address. The bank number first selects the bank to access, the row specified by the row address in that bank is then read into the row buffer, and finally the column address selects the storage cell columns to read from the row buffer.

A DDR3 SDRAM row access falls into one of three cases: row empty, row miss and row hit. "Row empty" means the bank to be accessed currently has no active row; the controller must issue a row activate command to read the target row into the row buffer before the access. "Row miss" means the currently active row in the bank differs from the target row; the controller must issue two commands to bring the target row into the row buffer: a precharge command to close the active row, and a row activate command to open the target row. "Row hit" means the target row is already active, that is, already in the row buffer; the controller needs no additional row command and can perform the column access directly. Of the three cases, a row hit has the lowest access latency and a row miss the highest. If DMA transfers can effectively raise the DDR SDRAM row hit rate, memory access latency drops substantially and data transfer efficiency improves accordingly.
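The three row-access cases can be illustrated with a small open-row tracker. This is only a sketch: the class name `RowBufferModel`, the bank count and the result labels are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, Optional


class RowBufferModel:
    """Tracks the open (active) row of each bank and classifies accesses."""

    def __init__(self, num_banks: int) -> None:
        # None means the bank currently has no active row.
        self.open_row: Dict[int, Optional[int]] = {b: None for b in range(num_banks)}

    def access(self, bank: int, row: int) -> str:
        current = self.open_row[bank]
        if current is None:
            kind = "row_empty"  # controller must activate the row first
        elif current == row:
            kind = "row_hit"    # row already in the row buffer
        else:
            kind = "row_miss"   # precharge the old row, then activate the new one
        self.open_row[bank] = row
        return kind


model = RowBufferModel(num_banks=8)  # 8 banks is an assumed value
results = [model.access(0, 5), model.access(0, 5), model.access(0, 9)]
```

The three calls exercise each case once on the same bank: first access to an idle bank, a repeat access to the same row, and an access to a different row.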

In a multi-core processor, when a computing task is divided into subtasks that run on multiple processing cores, the data accessed by those cores is contiguous in the DDR address space. If the cores can transfer their DDR data in an effectively synchronized way, a large share of the data they access is likely to fall consecutively into the same DDR SDRAM row, which yields a high DDR SDRAM row hit rate. A conventional DMA structure, however, provides no multi-core synchronized data movement. Because each processing core is in a different execution state, the data moved to DDR SDRAM by the cores' individual DMA units is likely to arrive out of order, which lowers the DDR SDRAM row hit rate and limits DMA data movement efficiency.

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the problems in the prior art, the invention provides a multi-core fine-grained synchronous DMA transmission method for GPDSP that can effectively raise the SDRAM row hit rate and improve memory bandwidth utilization and DMA transmission efficiency.

To solve the above technical problem, the present invention adopts the following technical solution:

A multi-core fine-grained synchronous DMA transmission method for GPDSP, in which each DMA unit participating in the multi-core fine-grained synchronous transmission sends a local frame transmission end signal to a global synchronization register after it finishes transmitting a frame of data; the global synchronization register combines the end signals from the multiple cores into one multi-bit global frame transmission end signal; each DMA unit checks whether its configured list of cores participating in the multi-core fine-grained synchronous transmission matches the received global frame transmission end signal; if it matches, all participating DMA units have finished transmitting the previous frame and may begin moving the next frame; if it does not match, at least one participating DMA unit has not finished moving the previous frame, and all participating DMA units keep waiting until the match succeeds.

As a further improvement of the present invention: the global synchronization register sends the global frame transmission end signal directly over wires to the DMA controller of each core.

As a further improvement of the present invention: before the multi-core fine-grained synchronous transmission is started, the basic transmission parameters are configured and the cores that participate in the multi-core fine-grained synchronous transmission are determined.

As a further improvement of the present invention: the DMA unit is connected through dedicated buses to the scalar memory SM, the vector memory AM, the ET debug unit and the node access controller NAC; data transfer requests initiated by the DMA unit are issued through two general-purpose channels, and four dedicated channels handle the data transfer requests that the DMA unit receives passively.

As a further improvement of the present invention: the DMA unit describes the data to be transferred as a two-dimensional data block; the two-dimensional data block consists of multiple data frames, each data frame consists of multiple data units with contiguous addresses, and the data unit size equals the chip's bit width.

As a further improvement of the present invention: the global synchronization register is located outside the GPDSP cores; each GPDSP core drives a 1-bit signal line LOver to the global synchronization register, and the LOver line carries the local frame transmission end signal of that core's DMA unit; the global synchronization register combines the LOver signals from all GPDSP cores into one global frame transmission end signal GOver, and the GOver signal is fed to every GPDSP core.

As a further improvement of the present invention: the specific flow of the transmission method is:

S1: Start the multi-core fine-grained synchronous transmission;

S2: Send a read request; determine whether it is for the last data unit of the frame; if not, return and send another read request; if so, go to step S3;

S3: Determine whether this is the last frame; if not, go to step S4; if so, end the multi-core fine-grained synchronous transmission;

S4: Set the local frame transmission end signal, which sets the corresponding bit of the global synchronization register; then match the global frame transmission end signal, that is, compare the incoming global frame transmission end signal with the configured list of cores participating in the fine-grained synchronous transmission; if the match succeeds, go to step S5; if it fails, repeat the match, since a failure means that some participating DMA unit has not finished transmitting the previous frame, and every DMA unit must keep waiting until the match succeeds;

S5: Clear the local frame transmission end signal and return to step S2; that is, each DMA unit pulls its own local frame transmission end signal low and may then begin transmitting the next frame;

S6: Repeat the above until all data frames have been transmitted.

Compared with the prior art, the advantage of the present invention is as follows: the multi-core fine-grained synchronous DMA transmission method for GPDSP overcomes the shortcomings of conventional multi-core DMA transmission methods when writing to DDR SDRAM. By synchronizing the multiple cores so that they write the same frame to external memory at the same time, it can effectively raise the SDRAM row hit rate and significantly improve memory bandwidth utilization and DMA transmission efficiency.

Description of drawings

Figure 1 is a schematic diagram of the overall framework of a GPDSP node in a specific application example of the present invention.

Figure 2 is a schematic diagram of the overall structure of the DMA unit in a specific application example of the present invention.

Figure 3 is a schematic diagram of the two-dimensional data block transferred by the DMA unit in a specific application example of the present invention.

Figure 4 is a flow chart of the method of the present invention in a specific application example.

Figure 5 is a schematic diagram of the connections between the DMA controllers and the global synchronization register in a specific application example of the present invention.

Figure 6 is a timing diagram of the interface between the DMA controller and the global synchronization register in a specific application example of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

A specific application example of the present invention comprises DMA controllers and a global synchronization register. The DMA controller resides inside each general-purpose digital signal processor (GPDSP) core, the global synchronization register resides outside the GPDSP cores, and all DMA controllers are connected directly to the global synchronization register. When the DMA controller moves data, one end of the transfer is an in-core GPDSP storage resource and the other end is the off-core storage resource, DDR SDRAM. Each time before the DMA unit is started, the peripheral configuration bus (PBUS) writes the transfer parameters into the DMA parameter RAM; the parameters describe the data to be transferred by the DMA unit as a two-dimensional data block. The two-dimensional data block consists of multiple data frames, and each data frame consists of multiple data units. In other words, each core contains one DMA controller and all cores share one global synchronization register. The global synchronization register GSynReg sits at the top level of the chip and is connected to all DMA units. GSynReg allocates one bit to each DMA unit; each DMA unit can set and clear its own bit, and every DMA unit can read the value of all GSynReg bits.

The global synchronization register is connected to all DMA units and receives the frame transmission end signals of the cores. It combines these signals into one multi-bit signal and sends it to every GPDSP core. Each GPDSP core uses this multi-bit signal to determine whether all GPDSP cores participating in the fine-grained synchronous data transfer have finished transmitting the previous frame. If so, all participating cores may read the next frame of data. Otherwise, each GPDSP core keeps waiting until every core participating in the multi-core fine-grained synchronous transfer has finished transmitting the previous frame.

The basic principle of the method is as follows. Before the multi-core fine-grained synchronous transmission is started, a parameter SynCoreList that specifies which cores participate must be configured in addition to the basic transfer parameters. The bit width of SynCoreList equals the number of cores on the chip; if bit j is set high, core j participates in the multi-core fine-grained synchronous transmission. After it finishes transmitting a frame, each participating DMA unit sends a 1-bit local frame transmission end signal to the global synchronization register GSynReg, which sets the corresponding bit of the register. The global synchronization register combines the end signals from the cores into one multi-bit global frame transmission end signal, which is sent directly over wires to the DMA controller of each core. Each DMA unit checks whether its configured list of participating cores matches the received global frame transmission end signal. If it matches, all participating DMA units have finished transmitting (moving) the previous frame and may begin moving the next frame. If it does not match, at least one participating DMA unit has not finished moving the previous frame, so all participating DMA units keep waiting until the match succeeds, that is, until every participating DMA unit has finished the previous frame. This repeats until all data frames have been transmitted.
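The SynCoreList comparison can be sketched as a bit-mask check. The helper names below are hypothetical, and the masked comparison is an assumption: the text only says the two values must be "consistent", so strict equality (`gover == syn_core_list`) is an equally plausible reading when every core participates.

```python
from typing import Iterable


def make_syn_core_list(cores: Iterable[int], num_cores: int) -> int:
    """Build a SynCoreList mask: bit j is 1 if core j participates."""
    mask = 0
    for j in cores:
        if not 0 <= j < num_cores:
            raise ValueError(f"core index {j} out of range")
        mask |= 1 << j
    return mask


def frame_sync_matched(gover: int, syn_core_list: int) -> bool:
    """True when every participating core has raised its end-of-frame bit.

    Assumption: bits of non-participating cores are ignored.
    """
    return (gover & syn_core_list) == syn_core_list


all_twelve = make_syn_core_list(range(12), num_cores=12)
still_waiting = frame_sync_matched(0xFFE, all_twelve)  # core 0 not finished yet
may_proceed = frame_sync_matched(0xFFF, all_twelve)    # all 12 cores finished
```

The values 0xFFE and 0xFFF mirror the 12-core example given for Figure 6.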

For example, suppose the set of DMA units participating in inter-core synchronization is [DMA0, DMA1, …, DMAk]. Before any DMA unit starts moving the next frame, it must wait until all participating DMA units have finished moving the previous frame. For instance, DMAi (0 <= i <= k) may start moving the second frame only after DMA0, DMA1, …, DMAk have all finished moving the first frame. When one large task is divided across multiple processing cores for parallel execution, suitably configured DMA parameters can generally ensure that each frame transferred synchronously by the cores is written to the same DDR SDRAM row. Compared with the conventional DMA transmission method, the technical solution of the present invention then reduces the number of DDR SDRAM row switches and significantly raises the DDR SDRAM row hit rate, which improves DMA data transfer efficiency.
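The effect on row hits can be illustrated with a toy model. Everything here is an assumption made for illustration: a single bank, frame f of every core mapping to DRAM row f, one data unit issued per started core per time step, and a fixed start skew between cores. It is not a model of any real memory controller and the numbers it produces are not results from the patent.

```python
def row_hit_rate(num_cores: int, num_frames: int, units_per_frame: int,
                 start_delay: int, synchronized: bool) -> float:
    """Fraction of accesses that hit the open row in a one-bank toy model."""
    total = num_frames * units_per_frame
    progress = [0] * num_cores  # data units already issued by each core
    open_row = None
    hits = accesses = 0
    t = 0
    while any(p < total for p in progress):
        # Frame index of the slowest core; used as the per-frame barrier.
        slowest_frame = min(p // units_per_frame for p in progress)
        for c in range(num_cores):
            if t < c * start_delay or progress[c] >= total:
                continue  # core c has not started yet, or has finished
            frame = progress[c] // units_per_frame
            if synchronized and frame > slowest_frame:
                continue  # wait until all cores finish the previous frame
            row = frame  # assumption: frame f of every core lies in row f
            accesses += 1
            hits += int(row == open_row)
            open_row = row
            progress[c] += 1
        t += 1
    return hits / accesses


unsync_rate = row_hit_rate(2, 2, 2, start_delay=2, synchronized=False)
sync_rate = row_hit_rate(2, 2, 2, start_delay=2, synchronized=True)
```

In the unsynchronized run the skewed cores alternate between different rows, so most accesses switch rows; with the per-frame barrier, consecutive accesses stay in one row until the whole frame is done.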

Figure 4 is a flow chart of the multi-core fine-grained inter-core synchronous transmission in a specific application example of the present invention. As shown in the figure, the specific flow is:

S1: Start the multi-core fine-grained synchronous transmission;

S2: Send a read request; determine whether it is for the last data unit of the frame; if not, return and send another read request; if so, go to step S3;

S3: Determine whether this is the last frame; if not, go to step S4; if so, end the multi-core fine-grained synchronous transmission;

S4: Set the local frame transmission end signal, which sets the corresponding bit of the global synchronization register; then match the global frame transmission end signal, that is, compare the incoming global frame transmission end signal with the configured list of cores participating in the fine-grained synchronous transmission; if the match succeeds, go to step S5; if it fails, repeat the match, since a failure means that some participating DMA unit has not finished transmitting the previous frame, and every DMA unit must keep waiting until the match succeeds.

S5: Clear the local frame transmission end signal and return to step S2; that is, each DMA unit pulls its own local frame transmission end signal low and may then begin transmitting the next frame.

S6: Repeat the above until all data frames have been transmitted.
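Steps S1 to S6 can be sketched as a cycle-stepped simulation. This is a simplified software model under stated assumptions: the function name and the `cycles_per_unit` skew parameter are hypothetical, signal propagation latency is ignored, all cores evaluate the same snapshot of the register in a cycle, and every participating core transfers the same number of frames.

```python
def run_sync_transfer(syn_core_list, num_frames, units_per_frame, cycles_per_unit):
    """Return (frame, core, cycle) for each frame start after the first frame."""
    cores = [j for j in range(syn_core_list.bit_length()) if (syn_core_list >> j) & 1]
    gsynreg = 0                              # one end-of-frame bit per core
    frame = {c: 0 for c in cores}
    unit = {c: 0 for c in cores}
    waiting = {c: False for c in cores}
    done = {c: False for c in cores}
    starts = []
    cycle = 0                                # S1: start
    while not all(done.values()):
        gover = gsynreg                      # same snapshot seen by every core
        for c in cores:
            if done[c]:
                continue
            if waiting[c]:                   # S4: match GOver against the core list
                if (gover & syn_core_list) == syn_core_list:
                    gsynreg &= ~(1 << c)     # S5: clear the local end signal
                    waiting[c] = False
                    frame[c] += 1
                    unit[c] = 0
                    starts.append((frame[c], c, cycle))
                continue
            if cycle % cycles_per_unit[c] == 0:   # S2: issue one read request
                unit[c] += 1
                if unit[c] == units_per_frame:    # last data unit of the frame
                    if frame[c] == num_frames - 1:
                        done[c] = True            # S3: last frame, finish
                    else:
                        gsynreg |= 1 << c         # S4: set the local end signal
                        waiting[c] = True
        cycle += 1                           # S6: repeat until all frames are done
    return starts


starts = run_sync_transfer(0b11, num_frames=2, units_per_frame=2,
                           cycles_per_unit={0: 1, 1: 2})
```

Core 0 is twice as fast as core 1 in this run, yet both entries in `starts` for the second frame carry the same cycle number, which is the frame-level lockstep the flow is meant to enforce.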

Figure 1 is a schematic diagram of the overall framework of a GPDSP node in a specific application example of the present invention. Each GPDSP core contains a scalar processing unit SPU and a vector processing unit VPU. The SPU contains a scalar memory SM for storing scalar operands, and the VPU contains a vector memory AM for storing vector operands. The DMA unit is also located inside the GPDSP core; its main job is to transfer data between the in-core storage resources AM and SM and the off-core storage resource DDR3 SDRAM. Depending on the system configuration, an access to DDR3 SDRAM either passes through the global cache before reaching DDR3 SDRAM or bypasses the global cache and reaches DDR3 SDRAM directly.

Figure 2 is a schematic diagram of the overall structure of the DMA unit in a specific application example of the present invention. The DMA unit is connected through dedicated buses to the scalar memory SM, the vector memory AM, the ET debug unit and the node access controller NAC. Data transfer requests initiated by the DMA unit are issued through two general-purpose channels, and four dedicated channels handle the data transfer requests that the DMA unit receives passively. Before a DMA unit initiates a transfer, it must receive transfer configuration parameters from the scalar processing unit SPU; these parameters specify the format, source address, destination address and so on of the data block to be moved. To support multi-core fine-grained synchronous data transfer, a parameter specifying which cores participate in the fine-grained synchronous transfer must also be configured.

The DMA unit describes the data to be transferred as a two-dimensional data block. Figure 3 is a schematic diagram of the two-dimensional data block transferred by the DMA unit in a specific application example of the present invention. A two-dimensional data block consists of multiple data frames; the addresses of adjacent data frames may or may not be contiguous. Each data frame consists of multiple data units with contiguous addresses, and the data unit size equals the chip's bit width, that is, 64 bits. The data block in Figure 3 contains K data frames, and each data frame contains m data units.
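The two-dimensional block layout can be sketched as an address generator. The function name and the `frame_stride` parameter are hypothetical; the patent states only that adjacent frames may or may not be contiguous and that a data unit is 64 bits (8 bytes), so a fixed stride between frame bases is an assumption made for illustration.

```python
from typing import Iterator


def block_addresses(base: int, num_frames: int, units_per_frame: int,
                    frame_stride: int, unit_bytes: int = 8) -> Iterator[int]:
    """Yield the byte address of every data unit in a 2D data block.

    Units inside a frame are contiguous; frame bases are frame_stride apart.
    """
    for f in range(num_frames):
        frame_base = base + f * frame_stride
        for u in range(units_per_frame):
            yield frame_base + u * unit_bytes


# K = 2 frames of m = 3 units, with frames 0x100 bytes apart (assumed values).
addrs = list(block_addresses(0x1000, num_frames=2, units_per_frame=3,
                             frame_stride=0x100))
```

Setting `frame_stride` to `units_per_frame * unit_bytes` gives the case where adjacent frames are contiguous.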

FIG. 5 shows the connection between the DMA controller and the global synchronization register in a specific application example of the invention. The global synchronization register GSynReg is located outside the GPDSP cores. Each GPDSP core drives a 1-bit signal line, LOver, into GSynReg; LOver carries the local frame-transfer-end signal of that core's DMA unit. GSynReg merges the LOver signals from the K+1 GPDSP cores into a (K+1)-bit global frame-transfer-end signal, GOver, which is fed to every GPDSP core.
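
The merging performed by GSynReg amounts to packing the per-core LOver bits into one word. The sketch below assumes that core i maps to bit i of GOver; the text does not give the bit order, and the function name is made up for the example.

```python
def combine_lover(lover_bits):
    """Pack per-core LOver values (list index = core number) into a GOver word."""
    gover = 0
    for core, bit in enumerate(lover_bits):
        if bit:
            gover |= 1 << core
    return gover


lover = [0] + [1] * 11            # 12 cores, core 0 is still transferring
print(hex(combine_lover(lover)))  # 0xffe
lover[0] = 1                      # core 0 finishes its frame
print(hex(combine_lover(lover)))  # 0xfff
```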

FIG. 6 shows the interface timing between the DMA controller and the global synchronization register in a specific application example of the invention. In this embodiment, 12 cores are assumed to participate in the multi-core fine-grained synchronous transfer. Suppose that at Cycle 0 the value of the global synchronization register GSynReg is 12'hFFE and that, in the same cycle, one DMA unit finishes transferring a frame of data; that DMA unit sets its local frame-transfer-end signal LOver. LOver first propagates to GSynReg, which combines the LOver signals of the cores into a multi-bit global frame-transfer-end signal GOver and sends it to every DMA unit. Assume this signal path takes N cycles in total. After N cycles, the DMA unit receives a GOver value of 12'hFFF, which matches its configured list of cores participating in the fine-grained synchronous transfer. The DMA unit then pulls LOver low and starts transferring the next frame. To prevent repeated matches on the same stale GOver value, the DMA unit performs no matching during the N cycles after it pulls LOver low. This process repeats until all data frames have been transferred.
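
The role of the N-cycle window can be checked with a small cycle-level model. This is a sketch under stated assumptions, not the patented circuit: every core participates, each core needs a fixed number of cycles per frame, GOver is the per-core LOver word delayed by exactly `latency` cycles (at least 1), and LOver is not raised after the last frame. All names are illustrative.

```python
from collections import deque


def simulate(durations, frames, latency, guard=True, max_cycles=1000):
    """Return (start, finish): the cycles at which each core began/ended each frame."""
    n = len(durations)
    mask = (1 << n) - 1                 # core list: all cores take part
    lover = [0] * n                     # per-core LOver line
    remaining = list(durations)         # cycles left in the current frame
    holdoff = [0] * n                   # cycles during which matching is disabled
    start = [[0] for _ in range(n)]
    finish = [[] for _ in range(n)]
    pipe = deque([0] * latency)         # LOver -> GSynReg -> GOver delay line
    cycle = 0
    while cycle < max_cycles and any(len(f) < frames for f in finish):
        cycle += 1
        gover = pipe.popleft()          # GOver value visible in this cycle
        for i in range(n):
            blocked = holdoff[i] > 0
            if blocked:
                holdoff[i] -= 1
            if len(finish[i]) == frames:
                continue                # this core has moved all its frames
            if not lover[i]:            # still transferring the current frame
                remaining[i] -= 1
                if remaining[i] == 0:
                    finish[i].append(cycle)
                    if len(finish[i]) < frames:
                        lover[i] = 1    # raise LOver, wait for the others
            elif not blocked and gover == mask:
                lover[i] = 0            # pull LOver low, start the next frame
                holdoff[i] = latency if guard else 0
                remaining[i] = durations[i]
                start[i].append(cycle)
        pipe.append(sum(bit << i for i, bit in enumerate(lover)))
    return start, finish


def in_lockstep(start, finish):
    """True when no core began frame k before every core had finished frame k-1."""
    for k in range(1, max(len(s) for s in start)):
        began = min(s[k] for s in start if len(s) > k)
        if any(len(f) < k for f in finish) or began < max(f[k - 1] for f in finish):
            return False
    return True


print(in_lockstep(*simulate([1, 5], frames=3, latency=4)))               # True
print(in_lockstep(*simulate([1, 5], frames=3, latency=4, guard=False)))  # False
```

In the unguarded run the fast core finishes its next frame while the delayed GOver still shows the old all-ones value, matches again, and runs one frame ahead of the slow core; the slow core then waits until `max_cycles` expires.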

The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to the embodiment described above; all technical solutions that fall within the idea of the invention belong to its scope of protection. It should be noted that, for a person of ordinary skill in the art, improvements and refinements that do not depart from the principle of the invention are also to be regarded as within its scope of protection.

Claims (7)

1. A multi-core fine-grained synchronous DMA transfer method for a GPDSP, characterized in that: each direct memory access (DMA) unit participating in the multi-core fine-grained synchronous transfer sends a local frame-transfer-end signal to a global synchronization register after it finishes transferring a frame of data; the global synchronization register merges the end signals from the multiple cores into one multi-bit global frame-transfer-end signal; each DMA unit checks whether its configured core-list parameter, which names the cores participating in the multi-core fine-grained synchronous transfer, is consistent with the received global frame-transfer-end signal; if they are consistent, all participating DMA units have completed the transfer of the previous frame and the transfer of the next frame may begin; if they are not consistent, some participating DMA unit has not yet finished moving the previous frame, and all participating DMA units continue to wait until the match succeeds.

2. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to claim 1, characterized in that the global synchronization register sends the global frame-transfer-end signal directly over wires to the DMA controller of each core.

3. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to claim 1, characterized in that, before the multi-core fine-grained synchronous transfer is started, the basic transfer parameters are configured and the cores that participate in the multi-core fine-grained synchronous transfer are determined.

4. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to claim 1, characterized in that the DMA unit is connected through a dedicated bus to the scalar memory SM, the vector memory AM, the ET debug unit, and the node access controller NAC; data transfer requests initiated by the DMA unit are issued through two general-purpose channels, and four dedicated channels handle the data transfer requests that the DMA unit receives passively.

5. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to claim 1, characterized in that the DMA unit describes the data to be transferred as a two-dimensional data block; the two-dimensional data block consists of multiple data frames, each data frame consists of multiple data units at contiguous addresses, and the size of a data unit equals the bit width of the chip.

6. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to claim 1, characterized in that the global synchronization register is located outside the GPDSP cores; each GPDSP core has a 1-bit signal line LOver output to the global synchronization register, the signal line LOver carrying the local frame-transfer-end signal of the DMA unit of that GPDSP core; the global synchronization register merges the LOver signals input from all GPDSP cores into one global frame-transfer-end signal GOver, and the GOver signal is input to each GPDSP core.

7. The multi-core fine-grained synchronous DMA transfer method for a GPDSP according to any one of claims 1 to 6, characterized in that the transfer method proceeds as follows:
S1: Start the multi-core fine-grained synchronous transfer.
S2: Send a read request and determine whether it is for the last data unit of the frame. If not, return and send another read request; if so, go to step S3.
S3: Determine whether the frame is the last frame. If not, go to step S4; if so, end the multi-core fine-grained synchronous transfer.
S4: Set the local frame-transfer-end signal, which sets the corresponding bit of the global synchronization register. Match the global frame-transfer-end signal, that is, compare the incoming global frame-transfer-end signal with the configured core-list parameter of the cores participating in the fine-grained synchronous transfer. If the match succeeds, go to step S5. If the match fails, repeat the match; a failed match means that some participating DMA unit has not yet finished transferring the previous frame, so every DMA unit must continue to wait until the match succeeds.
S5: Clear the local frame-transfer-end signal and return to step S2; that is, each DMA unit pulls its own local frame-transfer-end signal low and may then start transferring the next frame.
S6: Repeat the above until all data frames have been transferred.
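
The control flow of steps S2 to S5 in claim 7 can be written as a small transition function. This is an illustrative sketch: the step labels come from the claim, while the function name, its keyword arguments, and the masked comparison (which ignores GOver bits of cores outside the core list) are assumptions, since the claim only requires that the core list and GOver be consistent.

```python
def next_step(step, *, last_unit=False, last_frame=False, gover=0, core_list=0):
    """Return the step that follows `step` in the S2-S5 loop of claim 7."""
    if step == "S2":   # one read request has been sent
        return "S3" if last_unit else "S2"
    if step == "S3":   # the current frame is complete
        return "END" if last_frame else "S4"
    if step == "S4":   # LOver is set; compare GOver with the core list
        matched = (gover & core_list) == core_list
        return "S5" if matched else "S4"
    if step == "S5":   # LOver cleared; move on to the next frame
        return "S2"
    raise ValueError(f"unknown step {step!r}")


print(next_step("S4", gover=0xFFE, core_list=0xFFF))  # S4
print(next_step("S4", gover=0xFFF, core_list=0xFFF))  # S5
print(next_step("S3", last_frame=True))               # END
```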
CN201510033310.8A 2015-01-22 2015-01-22 A kind of DMA transfer method that multinuclear fine granularity for GPDSP synchronizes Active CN104615557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510033310.8A CN104615557B (en) 2015-01-22 2015-01-22 A kind of DMA transfer method that multinuclear fine granularity for GPDSP synchronizes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510033310.8A CN104615557B (en) 2015-01-22 2015-01-22 A kind of DMA transfer method that multinuclear fine granularity for GPDSP synchronizes

Publications (2)

Publication Number Publication Date
CN104615557A true CN104615557A (en) 2015-05-13
CN104615557B CN104615557B (en) 2018-08-21

Family

ID=53150011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510033310.8A Active CN104615557B (en) 2015-01-22 2015-01-22 A kind of DMA transfer method that multinuclear fine granularity for GPDSP synchronizes

Country Status (1)

Country Link
CN (1) CN104615557B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5758188A (en) * 1995-11-21 1998-05-26 Quantum Corporation Synchronous DMA burst transfer protocol having the peripheral device toggle the strobe signal such that data is latched using both edges of the strobe signal
CN103714039A (en) * 2013-12-25 2014-04-09 中国人民解放军国防科学技术大学 Universal computing digital signal processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王占立等: "一种支持阻塞分段传输的DMA部件的设计与实现", 《计算机研究与发展》 *
郑挺等: "高性能DSP软核中DMA控制器的设计与验证", 《计算机工程与设计》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062282A (en) * 2017-12-29 2018-05-22 中国人民解放军国防科技大学 DMA data merging transmission method in GPDSP
CN108062282B (en) * 2017-12-29 2020-01-14 中国人民解放军国防科技大学 DMA data merging transmission method in GPDSP

Also Published As

Publication number Publication date
CN104615557B (en) 2018-08-21

Similar Documents

Publication Publication Date Title
US10216419B2 (en) Direct interface between graphics processing unit and data storage unit
US11276459B2 (en) Memory die including local processor and global processor, memory device, and electronic device
CN104699631B (en) It is multi-level in GPDSP to cooperate with and shared storage device and access method
CN112463719A (en) In-memory computing method realized based on coarse-grained reconfigurable array
CN107391400B (en) A memory expansion method and system supporting complex memory access instructions
KR102219545B1 (en) Mid-thread pre-emption with software assisted context switch
CN111538679B (en) Processor data prefetching method based on embedded DMA
CN110647480A (en) Data processing method, remote direct memory access network card and equipment
CN104317770B (en) Data store organisation for many-core processing system and data access method
CN103744644B (en) The four core processor systems built using four nuclear structures and method for interchanging data
CN105573959A (en) Computation and storage integrated distributed computer architecture
TW201423600A (en) Technique for improving performance in multi-threaded processing units
EP4060505A1 (en) Techniques for near data acceleration for a multi-core architecture
CN105389277A (en) Scientific computation-oriented high performance DMA (Direct Memory Access) part in GPDSP (General-Purpose Digital Signal Processor)
CN112527729B (en) A tightly coupled heterogeneous multi-core processor architecture and processing method thereof
CN104699641A (en) EDMA (enhanced direct memory access) controller concurrent control method in multinuclear DSP (digital signal processor) system
CN104820659B (en) A kind of multi-mode dynamic towards coarseness reconfigurable system can match somebody with somebody high speed memory access interface
CN109739785A (en) The interconnect structure of multi-core systems
CN108234147B (en) DMA Broadcast Data Transmission Method Based on Host Counting in GPDSP
CN100357932C (en) Method for decreasing data access delay in stream processor
CN104615557B (en) A kind of DMA transfer method that multinuclear fine granularity for GPDSP synchronizes
CN120344964A (en) Fusion data generation and associated communications
CN106569968A (en) Inter-array data transmission structure and scheduling method used for reconfigurable processor
CN106201931A (en) A kind of hypervelocity matrix operations coprocessor system
CN107678781A (en) Processor and the method for execute instruction on a processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant