
CN112597096B - A low-power FPGA partially reconfigurable method and device - Google Patents


Info

Publication number
CN112597096B
Authority
CN
China
Prior art keywords
logic
fpga
area
dynamic
static area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011478343.0A
Other languages
Chinese (zh)
Other versions
CN112597096A (en
Inventor
张科
齐乐
陈明宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202011478343.0A
Publication of CN112597096A
Application granted
Publication of CN112597096B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Logic Circuits (AREA)

Abstract

The invention provides a low-power FPGA partial reconfiguration method and device. By combining trimming and reconfiguration-based switching of the FPGA static area logic with memory control interface sleep modes and real-time clock frequency adjustment by a dynamic clock management unit, the system device effectively reduces the wasted power consumption of the FPGA static logic area, and also avoids the heat dissipation and stability problems caused by running the static logic area at high frequency for long periods.

Description

A low-power FPGA partially reconfigurable method and device

Technical field

The present invention relates to the field of reconfigurable computing, and in particular to a low-power FPGA partial reconfiguration method and device.

Background

A computer's system architecture has a decisive influence on its information processing capability. However, no general-purpose computing architecture is likely to be optimal for every computing task. For example, in machine learning, databases, image processing, communications, and financial computing, tasks with different characteristics call for different computing architectures, which often means designing dedicated custom chip architectures to obtain the best performance. Because dedicated computing chips are expensive to develop and slow to produce, the industry has in recent years been exploring more flexible ways to solve this problem.

The idea of reconfigurable computing dates back to the 1960s and was first proposed by Gerald Estrin of UCLA. The Fixed-Plus-Variable system built by Estrin's team is regarded in the industry as the prototype of reconfigurable computing. In Estrin's original vision, a reconfigurable computer consists of a standard CPU acting as the central control unit plus many reconfigurable computing units. These units are controlled by the central CPU and, when performing a given task (such as image processing, pattern recognition, or scientific computing), are configured into the architecture best suited to it (that is, hardware programming). Limited by the electronics technology of the time, however, this design idea was only truly realized decades later with the development of programmable devices such as FPGAs. By 2001, although the clock frequency of the computing units (FPGAs) used in reconfigurable computing was far lower than that of contemporary CPUs, the overall computing capability of reconfigurable computing could exceed that of CPUs several times over while consuming far less power.

An FPGA (Field Programmable Gate Array) is a semiconductor device built around a matrix of configurable logic blocks connected through programmable interconnects. Such chips can still be reprogrammed after manufacture to meet the application or functional requirements of the user. This property distinguishes FPGAs from application-specific integrated circuits (ASICs), which are custom-built for a specific design task. Today's mainstream FPGAs are mostly based on SRAM technology, and developers can reprogram them as design requirements change.

PR (Partial Reconfiguration) allows developers to dynamically reconfigure specific areas of an FPGA while the rest of the FPGA design continues to run normally. The approach is very effective in systems that have multiple functions whose modules can be time-multiplexed: the modules share the same FPGA device resources over time and are switched dynamically, in real time, as the scenario requires. Many FPGA vendors already offer fairly complete partial reconfiguration solutions, and PR makes it possible to build more complex and more flexible FPGA systems. A partial reconfiguration scheme divides the FPGA's logic resources into two areas, a static logic area and a dynamic logic area. Within this framework, the FPGA system can adjust and partition the dynamic logic area according to the designer's needs and switch the logic modules in the dynamic area in real time while the system is running, dynamically adjusting the system's own hardware architecture and thereby realizing the concept of reconfigurable computing. Mainstream cloud service providers now use partial reconfiguration and related technologies to integrate their in-house static logic area framework with the overall cloud platform framework and open the FPGA's dynamic logic area to external users. In this usage model, the FPGA is deployed on the server side and provides services as an acceleration device; users upload logic modules of their own design over the network, then run and test them directly on the FPGA device in the cloud server.

A partial reconfiguration scheme divides the FPGA's logic resources into a static logic area and a dynamic logic area. The static logic area provides the necessary modules, such as external communication interfaces, controller interfaces for on-board memories (DDR3, DDR4, HBM, and so on), and the CPU that controls dynamic reconfiguration, while the dynamic logic area reserves logic resources for serving users. The two areas are connected through on-chip configuration channel interfaces (for example, primitive modules such as ICAP/PCAP on Xilinx FPGAs). These primitives are the channel through which partial reconfiguration bitstreams are written from the static area into the dynamic area. In existing FPGA partial reconfiguration schemes, the static area logic, and especially its control CPU, usually only needs to work while the dynamic area is being partially reconfigured and is otherwise idle. Because an FPGA system spends the vast majority of its time in the phase where the dynamic area runs stably rather than in the partial reconfiguration phase, the power consumed by the static area CPU can be considered wasted most of the time. In addition, to cover every possible requirement of the dynamic area logic, the static area logic usually has to implement and provide as many external interfaces as possible, including the access control module for the off-chip on-board DDR, the high-speed Ethernet module, and high-speed bus interfaces such as PCIe, SRIO, and Aurora. Taking the DDR4 memory controller interface as an example, the DRAM cells must be refreshed repeatedly at a high rate, so once the interface is enabled it has to remain in a relatively high-power state.

A static logic area that runs continuously at a high clock frequency also increases the device's heat dissipation burden, and a persistently high operating temperature inevitably affects the stability and lifetime of the FPGA and the electronic components around it. The prior art considers only the performance of dynamic area logic configuration and ignores power balance, so it does not adjust the static logic area flexibly. To date, no research institution or company, in China or abroad, has proposed a complete and feasible scheme for reducing the power consumption of FPGA partially reconfigurable systems.

In summary, there is still room for improvement in the power design of current FPGA partially reconfigurable systems, especially the power design of the static area.

Summary of the invention

While researching FPGA partial reconfiguration design, the inventors found that the prior art considers only the performance of dynamic area logic configuration and ignores power balance, and therefore does not adjust the static logic area flexibly.

The purpose of the present invention is to solve the prior-art problem of excessive wasted power consumption in FPGA partial reconfiguration. Based on trimming and reconfiguration-based switching of the static area logic, together with dynamic clock management and control, the invention designs an FPGA partially reconfigurable system device that saves power without affecting functionality or reducing performance.

Specifically, the present invention provides a low-power FPGA partial reconfiguration method, which includes:

Step 1. According to preset interface categories, trim the complete static area framework containing the external interface logic to obtain multiple static area logics, each corresponding to specific interface categories, and store all of these static area logics, in the form of FPGA static area logic configuration files, in the off-chip non-volatile flash memory used for power-on boot configuration of the FPGA;

Step 2. From the characteristics of the dynamic area logic in the user's dynamic reconfiguration instruction, determine the interfaces the user logic needs in order to run; according to the interface categories those interfaces belong to, select the FPGA static area logic configuration file in the non-volatile flash memory that contains the corresponding interfaces; then configure the FPGA on-chip dynamic area according to the user's dynamic reconfiguration instruction.
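The selection in step 2 can be sketched as a lookup over a catalog of trimmed images. The following is a minimal illustration only: the image names, interface sets, flash addresses, and the "fewest interfaces that still cover the request" policy are assumptions made for this example, not values or rules given in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticImage:
    name: str
    flash_addr: int        # hypothetical start address in the off-chip flash
    interfaces: frozenset  # external interfaces kept after trimming


# Hypothetical catalog: the untrimmed image plus a few trimmed variants.
FULL = frozenset({"ddr4", "hbm", "ethernet", "pcie", "srio", "aurora"})
CATALOG = [
    StaticImage("full", 0x0000_0000, FULL),
    StaticImage("eth_only", 0x0400_0000, frozenset({"ethernet"})),
    StaticImage("ddr4_eth", 0x0800_0000, frozenset({"ddr4", "ethernet"})),
    StaticImage("bare", 0x0C00_0000, frozenset()),
]


def select_static_image(required, catalog=CATALOG):
    """Return the image with the fewest interfaces that still covers `required`."""
    required = frozenset(required)
    candidates = [img for img in catalog if required <= img.interfaces]
    if not candidates:
        raise ValueError(f"no static image provides {sorted(required)}")
    # Fewer retained interfaces means less idle interface logic left powered.
    return min(candidates, key=lambda img: len(img.interfaces))


if __name__ == "__main__":
    chosen = select_static_image({"ethernet"})
    print(chosen.name, hex(chosen.flash_addr))
```

Because the untrimmed image contains every interface, any request made up of known interfaces can always be satisfied by falling back to it.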

In the low-power FPGA partial reconfiguration method, step 2 includes:

after the FPGA on-chip dynamic area has been configured, setting up a reconfigurable dynamic clock unit to dynamically adjust the clocks of the CPU soft core and related logic modules in the FPGA on-chip static logic area.

In the low-power FPGA partial reconfiguration method, step 2 includes:

according to how the FPGA on-chip dynamic logic area is using each external memory, sending control commands to the controller interface module of the external memory so that the designated memory system operates in a self-refresh state or a power-down sleep state.

In the low-power FPGA partial reconfiguration method, step 2 includes: selecting the initial default image of the static area logic in the non-volatile flash memory and, under the guidance of that default image, performing a second reconfiguration that puts the FPGA into the multi-boot image switching state and reconfigures the logic of the FPGA on-chip static area; if the reconfiguration fails or is interrupted, the FPGA automatically falls back to the initial state.
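The fallback behaviour of the multi-boot switch can be modelled in a few lines. This is a software model for illustration only: on a real device the fallback is typically handled by the FPGA's configuration logic, and `load_image` here is a hypothetical callback that returns `True` when the image at a given flash address configures successfully.

```python
def reconfigure_static_region(load_image, target_addr, default_addr=0x0000_0000):
    """Boot the default image, then try the trimmed image at `target_addr`.

    Returns the flash address of the image that ends up running. If the
    trimmed image fails to load, the default image is reloaded instead.
    """
    if not load_image(default_addr):
        raise RuntimeError("default static image failed to load")
    if target_addr == default_addr:
        return default_addr
    if load_image(target_addr):
        return target_addr
    # Second reconfiguration failed or was interrupted: fall back.
    if not load_image(default_addr):
        raise RuntimeError("fallback to default static image failed")
    return default_addr


if __name__ == "__main__":
    # Simulated loader in which the image at 0x0400_0000 is corrupt.
    corrupt = {0x0400_0000}
    running = reconfigure_static_region(lambda addr: addr not in corrupt, 0x0400_0000)
    print(hex(running))
```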

The present invention also provides a low-power FPGA partially reconfigurable system, which includes:

Module 1, configured to trim, according to preset interface categories, the complete static area framework containing the external interface logic to obtain multiple static area logics, each corresponding to specific interface categories, and to store all of these static area logics, in the form of FPGA static area logic configuration files, in the off-chip non-volatile flash memory used for power-on boot configuration of the FPGA;

Module 2, configured to determine, from the characteristics of the dynamic area logic in the user's dynamic reconfiguration instruction, the interfaces the user logic needs in order to run; to select, according to the interface categories those interfaces belong to, the FPGA static area logic configuration file in the non-volatile flash memory that contains the corresponding interfaces; and then to configure the FPGA on-chip dynamic area according to the user's dynamic reconfiguration instruction.

In the low-power FPGA partially reconfigurable system, module 2 includes:

after the FPGA on-chip dynamic area has been configured, setting up a reconfigurable dynamic clock unit to dynamically adjust the clocks of the CPU soft core and related logic modules in the FPGA on-chip static logic area.

In the low-power FPGA partially reconfigurable system, module 2 includes:

according to how the FPGA on-chip dynamic logic area is using each external memory, sending control commands to the controller interface module of the external memory so that the designated memory system operates in a self-refresh state or a power-down sleep state.

In the low-power FPGA partially reconfigurable system, module 2 includes: selecting the initial default image of the static area logic in the non-volatile flash memory and, under the guidance of that default image, performing a second reconfiguration that puts the FPGA into the multi-boot image switching state and reconfigures the logic of the FPGA on-chip static area; if the reconfiguration fails or is interrupted, the FPGA automatically falls back to the initial state.

As the above scheme shows, the advantages of the present invention are:

Compared with existing technical solutions in the industry, the system device of the present invention combines trimming and reconfiguration-based switching of the FPGA static area logic with real-time clock frequency adjustment by a dynamic clock management unit. This effectively reduces the wasted power consumption of the FPGA static logic area and also avoids the heat dissipation and stability problems caused by running the static logic area at high frequency for long periods.

Description of the drawings

Figure 1 is a schematic diagram of the overall FPGA partial reconfiguration framework;

Figure 2 is a schematic diagram of the trimming and reconfiguration-based switching of the FPGA static area logic;

Figure 3 is a schematic diagram of the real-time clock frequency adjustment mechanism based on the dynamic clock management unit;

Figure 4 is a flow chart of a specific embodiment of the present invention.

Detailed description

The inventors analyzed the configuration schemes of the static and dynamic logic areas and the composition of static area power consumption, and propose reducing wasted power consumption in two ways:

First, dynamically lower the clock frequency of the static area logic to reduce the dynamic circuit power caused by an unnecessarily high operating frequency. That is, replace the static clock unit used in current industry solutions with a dynamically reconfigurable clock unit that adjusts the clock of the CPU in the static logic area and reduces its power consumption when idle;

Second, flexibly reduce the idle power consumption of the external controller interfaces in the static area logic. Depending on how the cloud service is provided, there are two sub-scenarios:

In the scenario where the whole FPGA serves a single user, multi-boot image switching is used: several independent static area logic configurations are laid out on the off-chip NorFlash used for the FPGA's initial configuration, each with a different set of static area interfaces. For example, when the dynamic logic area does not need to access external DDR3 or DDR4 during some period, the static logic area can first be initialized as a framework without the DDR4 controller interface, and the dynamic area configured afterwards, which effectively avoids the extra power consumed by idle external controller interfaces. In other words, the static area logic is trimmed directly, and the specific trimmed configuration file is loaded selectively as needed.

In the service scenario where the FPGA's dynamic area resources are partitioned and shared by multiple users, the union of all users' requirements must be considered. Usually only the most complete static area framework, with all interface modules, can be chosen, and frequently loading trimmed static area images is impractical in terms of time. Even so, the state of the relevant static area modules (such as the DDR memory controller interface) can be adjusted according to what the dynamic area needs, so that these modules switch to sleep mode automatically when they are not needed. In other words, where the static area logic cannot be trimmed and switched, selected modules are put into sleep mode instead.

Specifically, based on the above concept, the present invention includes the following key technical points:

Key point 1: The scheme first designs a static area framework that contains a soft-core CPU for partial reconfiguration control, together with the logic for a complete set of external communication interfaces (high-speed protocol interfaces such as SRIO, high-speed Ethernet, and Aurora) and external memory access control interfaces (DDR3, DDR4, HBM, and so on). The bitstream file of this framework, which contains all interfaces, serves as the system's default initialization image and is stored in NorFlash, the off-chip non-volatile flash memory used for power-on boot configuration of the FPGA. The complete framework is then trimmed by category to generate static area logics that contain only one or a few specific interfaces. These are stored in other address ranges of the NorFlash, and the start address of each trimmed static area framework bitstream file in NorFlash is recorded.

At run time, in the scenario where the whole FPGA serves a single user, the corresponding static area framework bitstream file in NorFlash is selected according to the characteristics of the dynamic area logic uploaded by the user and the interface environment that logic needs. For example, the management program of the cloud service operations center analyzes the dynamic logic the user wants to run. If running and testing that logic requires access to the external network but not to external memory, multi-boot image switching is used to load into the FPGA a trimmed static area logic that retains only the necessary components, such as the Ethernet interface and the partial reconfiguration control soft-core CPU. Partial reconfiguration of the corresponding part of the dynamic area is then carried out on that basis. In the cloud scenario where FPGA dynamic area resources must be partitioned and shared among multiple users, the partial reconfiguration control soft-core CPU can instead send low-power mode commands to the controller interface modules of external memories such as DDR3, DDR4, or HBM, for example commands that put the memory system into self-refresh mode or even power-down sleep mode. Power consumption can thus still be reduced dynamically without trimming the static area logic.
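For the multi-user case, the decision about which memory controllers to put into a low-power mode can be sketched as follows. This is an illustrative policy only: the tenant names, the `retain` set, and the rule "self-refresh when contents must be kept, power-down otherwise" are assumptions introduced here; the patent states only that unused memories are placed in self-refresh or power-down sleep mode.

```python
from enum import Enum


class MemMode(Enum):
    ACTIVE = "active"
    SELF_REFRESH = "self_refresh"
    POWER_DOWN = "power_down"


def plan_memory_modes(memories, tenant_needs, retain=frozenset()):
    """Decide a mode for every external memory in the full static image.

    memories:     names of the memories the static area exposes
    tenant_needs: mapping of tenant -> set of memories its logic uses
    retain:       memories whose contents must survive while idle (assumed input)
    """
    # A memory stays active if any tenant sharing the FPGA still uses it.
    in_use = set().union(*tenant_needs.values())
    plan = {}
    for mem in memories:
        if mem in in_use:
            plan[mem] = MemMode.ACTIVE
        elif mem in retain:
            plan[mem] = MemMode.SELF_REFRESH
        else:
            plan[mem] = MemMode.POWER_DOWN
    return plan


if __name__ == "__main__":
    plan = plan_memory_modes(
        ["ddr3", "ddr4", "hbm"],
        {"tenant_a": {"ddr4"}, "tenant_b": set()},
        retain={"hbm"},
    )
    for mem, mode in plan.items():
        print(mem, mode.value)
```

In a real system the soft-core CPU would translate each planned mode into the control commands the corresponding memory controller accepts; that step is device-specific and is not modelled here.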

Technical effect: By selectively providing a trimmed static area framework that matches each user's needs, the continuous power drain of redundant interface logic is avoided. This matters most for user logic that does not need any external memory or external communication interface and only needs a standard on-chip system bus (such as ARM's AMBA protocol) for all instruction and data exchange. For such logic, a specifically trimmed static area logic consumes significantly less power than a static area logic that uniformly provides the complete set of external interfaces.

Key point 2: The dynamic clock control unit monitors the state of the partial reconfiguration flow. When the dynamic logic area has been configured and enters the stable operation phase, the dynamic clock control unit reconfigures the clock generation units of the CPU soft core and other modules in the static area so that they run at a lower clock frequency. When the next partial reconfiguration begins, the dynamic clock management unit reconfigures the corresponding clock generation units in the static area again, restoring the high-frequency output so that the soft-core CPU and related modules can complete the partial reconfiguration efficiently.
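The behaviour of the dynamic clock control unit amounts to a two-state controller driven by partial reconfiguration events. The sketch below is a behavioural model only: the event names and the two frequencies are hypothetical, and on real hardware the frequency change would be made by reprogramming a clock generation unit rather than by assigning a variable.

```python
class DynamicClockController:
    """Tracks the static-area CPU clock across partial reconfiguration (PR) phases."""

    def __init__(self, high_hz=200_000_000, low_hz=25_000_000):
        self.high_hz = high_hz      # assumed frequency used while PR is in progress
        self.low_hz = low_hz        # assumed frequency used during stable operation
        self.current_hz = high_hz   # power-on: configuration work is about to happen

    def on_event(self, event):
        """Update the clock for a PR lifecycle event and return the new frequency."""
        if event == "pr_start":
            # The CPU must push the bitstream quickly, so restore the high clock.
            self.current_hz = self.high_hz
        elif event == "pr_done":
            # Dynamic area is running stably; the static CPU is mostly idle.
            self.current_hz = self.low_hz
        else:
            raise ValueError(f"unknown event: {event!r}")
        return self.current_hz


if __name__ == "__main__":
    clk = DynamicClockController()
    for ev in ["pr_start", "pr_done", "pr_start", "pr_done"]:
        print(ev, clk.on_event(ev))
```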

Technical effect: Because an FPGA partially reconfigurable system spends the vast majority of its time in the phase where the dynamic logic area runs stably, lowering the clock frequency dynamically can significantly reduce the wasted power consumption of the static area logic.

FPGA power consumption is divided into static power consumption and dynamic power consumption. Dynamic power consumption comes mainly from the high-frequency charging and discharging of capacitances in the chip's circuits, and can be expressed as:

$$P_{D\_avg} = \sum_{n} C_n \cdot f_n \cdot V^2$$

where:

$P_{D\_avg}$ is the average dynamic power consumption;

$C_n$ is the equivalent load capacitance of the n-th module circuit;

$f_n$ is the average switching frequency of the n-th module circuit;

$V$ is the circuit supply voltage.

It follows that the dynamic power consumption of an FPGA is essentially proportional to the clock frequency at which the circuit runs.
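The proportionality between dynamic power and clock frequency can be checked numerically. The sketch below assumes the usual CMOS form, in which average dynamic power is the sum over modules of capacitance times switching frequency, multiplied by the square of the supply voltage. All capacitance, frequency, and voltage values are made up for illustration and do not describe any real device.

```python
def dynamic_power(modules, voltage):
    """Average dynamic power in watts.

    modules: iterable of (equivalent_load_capacitance_F, switching_frequency_Hz)
    voltage: supply voltage in volts
    """
    return sum(c * f for c, f in modules) * voltage ** 2


if __name__ == "__main__":
    V = 0.85                    # hypothetical core voltage
    cpu_cap = 1.0e-9            # hypothetical equivalent capacitance of the soft CPU
    other = (2.0e-9, 100e6)     # hypothetical remaining static-area logic

    busy = dynamic_power([(cpu_cap, 200e6), other], V)  # CPU clocked high during PR
    idle = dynamic_power([(cpu_cap, 25e6), other], V)   # CPU clocked down afterwards
    print(f"during reconfiguration: {busy:.4f} W")
    print(f"stable operation:       {idle:.4f} W")
    print(f"saving:                 {100 * (busy - idle) / busy:.1f} %")
```

Only the term belonging to the down-clocked module shrinks, so the overall saving depends on how much of the static area's switched capacitance sits behind the adjustable clock.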

To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Figures 1 and 4 illustrate the overall FPGA partial reconfiguration framework of the present invention. The system device controls power consumption in two ways: first, trimming and reconfiguration-based switching of the FPGA static area logic; second, real-time clock frequency adjustment based on the dynamic clock management unit. Combining the two effectively reduces the wasted power consumption of the FPGA static logic area. The embodiment is described below using a Xilinx FPGA chip (model: XCVU37P) as an example.

如图2所示,为FPGA静态区逻辑的裁剪及重构切换技术。As shown in Figure 2, it is the cutting and reconstruction switching technology of FPGA static area logic.

First, we surveyed the requirements of FPGA cloud server users in recent years and found that most user-uploaded logic modules need only one or two kinds of external interface, and that some need nothing more than the on-chip AXI4 interface bus to complete all instruction and data exchange. Therefore, although a static area logic containing a complete set of external communication interfaces and memory control interfaces can satisfy the needs of different users, it often wastes a certain amount of resources and power. For example, consider user logic that does not need to access external memory such as DDR3 or DDR4. Once this logic has been configured into the FPGA dynamic logic area and starts running, its data exchange and caching needs can be met directly by the on-chip bus and on-chip memory. Even so, the static area must keep the corresponding memory control module in its logic running, because the DDR3 or DDR4 refresh rate has to be maintained without interruption. Such meaningless power consumption, accumulated over a long time, causes significant waste and adds to the heat dissipation burden.

For this reason, unlike current industry solutions built on a single, fixed static area framework, the present invention trims the external interface modules in the static area framework one by one, producing static area logic framework images with several different combinations of external interfaces. The bitstream configuration files corresponding to these framework images are stored in different address ranges of the NorFlash that sits off the FPGA chip and is used for power-on boot configuration. As shown in the figure, the bitstream configuration file of the complete, untrimmed static area logic framework image, which has all external interfaces, is stored at the NorFlash start address 0x00000000. When the FPGA system is powered on again, it automatically loads the bitstream file of this complete framework image and completes the initial configuration of the static area framework.
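
One way to picture this storage scheme is as a lookup table from interface combinations to flash start addresses. The sketch below is only an illustration: the interface names follow the test case described later (DDR4, Ethernet, serial port), the 0x00000000 address of the full image comes from the text, and every other address and the choice of combinations are hypothetical.

```python
# Hypothetical NorFlash layout: each trimmed static-area image is keyed by
# the set of external interfaces it keeps. Only the 0x00000000 base address of
# the full image comes from the description; the other offsets are made up.
FULL_INTERFACES = frozenset({"ddr4", "ethernet", "uart"})

IMAGE_TABLE = {
    FULL_INTERFACES: 0x00000000,                  # untrimmed image loaded at power-on
    frozenset({"ethernet", "uart"}): 0x02000000,  # DDR4 controller trimmed away
    frozenset({"uart"}): 0x04000000,              # serial port only
    frozenset(): 0x06000000,                      # on-chip bus and memory only
}


def image_address(interfaces):
    """Return the flash start address of the image with exactly these interfaces."""
    return IMAGE_TABLE[frozenset(interfaces)]


print(hex(image_address({"uart"})))
```

Keying the table by a frozen set makes the lookup independent of the order in which interfaces are listed.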

After the first initial configuration of the static area framework is complete, the cloud server management program analyzes the information of the logic module that the user is about to upload. If the user logic needs every type of external interface in order to run and be tested, the management program works directly from the complete static area logic framework: it transfers the partial reconfiguration bitstream file corresponding to the user logic to the FPGA system and then configures the dynamic area. If the analysis shows that the user logic does not need all external interfaces, but only one or a few of them, or needs none at all and can run and be tested using only the on-chip bus and on-chip memory resources, the management program instead provides the FPGA system with the trimmed static area framework configuration image that offers the minimum external interface resources the logic needs, together with the start address at which this configuration image is stored in NorFlash. It then directs the FPGA into the multi-boot image switchable mode so that the trimmed configuration bitstream is reloaded and the static area logic is reconfigured. If the static area logic reconfiguration is interrupted by an error caused by incidental factors, the system automatically falls back to the initial state, ensuring that the FPGA can at least keep running and continue to provide partial reconfiguration service. After the reconfiguration completes, the static area operates in the state that provides the minimum resources the user needs, and subsequent partial reconfiguration of the dynamic area logic proceeds on that basis.
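
The selection and fallback behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not the management program itself: the trimmed-image list and its addresses are hypothetical, and `load_image` is a placeholder callback standing in for the multi-boot reload, returning True on success.

```python
FULL_IMAGE_ADDR = 0x00000000  # full static-area image (address from the description)

# Hypothetical trimmed images: (interfaces kept, flash start address).
TRIMMED_IMAGES = [
    (frozenset(), 0x06000000),
    (frozenset({"uart"}), 0x04000000),
    (frozenset({"ethernet", "uart"}), 0x02000000),
]


def select_image(required):
    """Pick the smallest image whose interfaces cover what the user logic needs."""
    required = frozenset(required)
    candidates = [(len(kept), addr) for kept, addr in TRIMMED_IMAGES if required <= kept]
    # No trimmed image is sufficient: stay on the full image.
    return min(candidates)[1] if candidates else FULL_IMAGE_ADDR


def reconfigure_static_region(required, load_image):
    """Load the selected trimmed image; fall back to the full image on failure."""
    addr = select_image(required)
    if addr != FULL_IMAGE_ADDR and not load_image(addr):
        load_image(FULL_IMAGE_ADDR)  # roll back to the initial state
        return FULL_IMAGE_ADDR
    return addr
```

For example, `reconfigure_static_region({"uart"}, loader)` returns the address of the serial-port-only image when `loader` succeeds, and the full-image address when it reports a failure.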

Optionally, for user logic that will run repeatedly for a long time, in addition to loading the trimmed static area logic described above, the cloud server management program can also manage the power supply of related board-level hardware components, for example by selectively powering down certain on-board communication and storage hardware, to further reduce overall system power consumption. However, because this method may disturb the hardware power-on reset sequence of the server system, it should be applied according to the actual situation.

In cloud service scenarios where the FPGA dynamic area resources are partitioned and shared among multiple users, when none of the users in the FPGA system needs to access external memory for a period of time, the soft-core CPU that controls partial reconfiguration can send low-power-mode commands to the controller interface modules of external memories such as DDR3, DDR4, or HBM (for example, control commands that put the memory system into self-refresh mode or even power-down sleep mode). Power consumption can thus be reduced dynamically without trimming the static area logic, keeping the static area framework intact.
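
A possible decision rule for this multi-user case is sketched below. The text only says that a low-power command is sent when no user needs external memory for a period of time; the idle-time threshold that separates self-refresh from power-down sleep is an assumption introduced here for illustration, as are all names in the code.

```python
from enum import Enum


class MemoryMode(Enum):
    ACTIVE = "active"
    SELF_REFRESH = "self_refresh"
    POWER_DOWN = "power_down"


def choose_memory_mode(users_need_memory, idle_seconds, power_down_after=60.0):
    """Decide which mode to request from an external-memory controller.

    users_need_memory: one boolean per user sharing the dynamic area.
    idle_seconds: how long the external memory has gone unused.
    power_down_after: hypothetical threshold for the deeper sleep state.
    """
    if any(users_need_memory):
        return MemoryMode.ACTIVE
    if idle_seconds >= power_down_after:
        return MemoryMode.POWER_DOWN
    return MemoryMode.SELF_REFRESH
```

A single user that still needs external memory keeps the controller active, which matches the condition that all users must be idle before a low-power command is issued.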

Figure 3 shows the real-time clock frequency adjustment mechanism based on the dynamic clock management unit.

During partial reconfiguration, the bitstream is configured through a channel such as ICAP. The present solution adds a feedback mechanism for the partial reconfiguration status to the ICAP transfer control state machine. When the dynamic clock control unit detects the feedback signal indicating that the dynamic logic area has finished configuration and entered the stable operation phase, it reconfigures the dedicated clock generation units of the CPU soft core and related logic modules in the static area so that they run at a lower clock frequency. The related logic modules include the on-chip local memory module of the CPU soft core, the memory interaction bus, the control unit, and so on. When the next partial reconfiguration starts, the dynamic clock management unit reconfigures the corresponding clock units in the static area again, restoring them to the high clock frequency output state. When switching between clock frequencies, the clock domain crossing problem is solved by inserting an asynchronous clock FIFO between the CPU soft core and the on-chip interconnect switching unit (commonly called the Crossbar). Dynamically adjusting the clock frequency of the related modules ensures that the soft-core CPU and its associated modules can still carry out partial reconfiguration efficiently.
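
The switching behavior can be modeled as a small two-state controller. The sketch below is a software illustration only, assuming the 200 MHz and 25 MHz rates used in the measured test case; `set_clock_hz` is a placeholder callback standing in for whatever reprograms the clock generation unit, and the class and method names are hypothetical.

```python
HIGH_HZ = 200_000_000  # rate used while a partial reconfiguration is running
LOW_HZ = 25_000_000    # rate used once the dynamic area runs stably


class DynamicClockManager:
    """Tracks the clock rate requested for the soft CPU and its related modules."""

    def __init__(self, set_clock_hz):
        # set_clock_hz is a placeholder for reprogramming the clock generator.
        self._set_clock_hz = set_clock_hz
        self.current_hz = HIGH_HZ
        self._set_clock_hz(HIGH_HZ)

    def on_reconfiguration_start(self):
        """A new partial reconfiguration begins: restore the high frequency."""
        self._apply(HIGH_HZ)

    def on_reconfiguration_done(self):
        """Feedback from the ICAP transfer state machine: configuration finished."""
        self._apply(LOW_HZ)

    def _apply(self, hz):
        # Reprogram the clock only when the requested rate actually changes.
        if hz != self.current_hz:
            self.current_hz = hz
            self._set_clock_hz(hz)


log = []
manager = DynamicClockManager(log.append)
manager.on_reconfiguration_done()   # dynamic area stable: drop to 25 MHz
manager.on_reconfiguration_start()  # next reconfiguration: back to 200 MHz
print(log)
```

Skipping redundant requests avoids reprogramming the clock generator when the same event is reported twice in a row.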

Measured test case:

In the experimental tests of the scheme and device of the present invention, we first placed the necessary units in the FPGA static area framework: the CPU soft core used for partial reconfiguration control and its on-chip Local RAM, the ICAP channel unit and its control logic, a NorFlash control interface module following the SPI interface standard, the Crossbar unit, and the necessary FIFO modules. External interfaces such as a DDR4 controller, an Ethernet controller, and a serial port were also added.

A detailed evaluation of the placed-and-routed design using EDA tools such as Vivado showed the following:

For FPGA models without in-package HBM, the DDR4 controller consumes about 1.501 W to 1.609 W, roughly 22% to 28% of the overall static area framework. For FPGA models with in-package HBM, the HBM memory dies consume about 0.240 W to 0.253 W when the HBM components are idle and about 0.833 W to 0.890 W when they are working, while the HBM controller logic inside the FPGA consumes about 1.155 W to 1.167 W, about 16% to 17% of the overall static area framework.

In addition, the XCVU37P is an SSI (Stacked Silicon Interconnect) device with multiple SLRs (Super Logic Regions). To improve overall FPGA system performance, the static area framework and the ICAP channel are usually placed together in the master SLR, that is, SLR0, and this placement may also lead to excessive local on-chip temperature during long-term operation. Using the technique described in the present invention, this test case was trimmed to produce a low-power FPGA partially reconfigurable system. For user logic that does not need to access external DDR4 memory, the possible power reduction is:

Different types of CPU soft core in the static area consume different amounts of power. In the test phase of this work, the Microblaze soft core and its attached on-chip memory, bus, and other modules were used as an example. The power consumption of the CPU soft core and its attached modules in the static area logic framework, when running at the higher clock frequency (200 MHz) and after the dynamic clock management unit lowers the frequency to 25 MHz, is as follows:

| Clock frequency | 200 MHz | 25 MHz |
| --- | --- | --- |
| Soft core power consumption | 0.241 W | 0.030 W |
| Power consumption of the on-chip local memory module required by the soft core | 0.211 W | 0.026 W |
| Soft core AXI4 bus power consumption | 0.092 W | 0.029 W |

In addition, because the frequencies of the main control CPU and the on-chip bus in the static area logic framework have both been lowered, the power consumption of the Crossbar module drops from 0.336 W to 0.148 W, a saving of 0.188 W (55.95%). The power consumption of the static area's standby on-chip Local RAM and its control module drops from 0.752 W to 0.148 W, a saving of 0.604 W (80.32%).

Taking all other logic modules into account, the total power consumption of the static area logic framework falls from 3.601 W to 2.135 W, a saving of 1.466 W, which is 40.71% lower than the scheme without dynamic clock adjustment. The effect is significant.
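
The savings quoted above can be rechecked with a few lines of arithmetic. The helper below is only a convenience for that check; the before and after wattages are the ones stated in the test case, and the function name is hypothetical.

```python
def saving(before_w, after_w):
    """Return (watts saved, percentage saved) for a before/after power pair."""
    saved = before_w - after_w
    return round(saved, 3), round(100.0 * saved / before_w, 2)


# Figures quoted in the measured test case (watts).
print("Crossbar:", saving(0.336, 0.148))
print("Local RAM and control:", saving(0.752, 0.148))
print("Static area total:", saving(3.601, 2.135))
```

Each percentage is computed relative to the power consumption before the clock frequency is lowered, which is how the 55.95%, 80.32%, and 40.71% figures are obtained.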

In summary, the solution of the present invention combines either the trimming and reconfiguration-switching technique for the FPGA static area logic or the low-power mode adjustment technique for the controller interface modules with the real-time clock frequency adjustment mechanism of the dynamic clock management unit, and thereby effectively reduces the ineffective power consumption of the FPGA static logic area. For user logic that does not need to access off-chip memory, combining these techniques can reduce the power consumption of the static area framework by more than 40%. This reduction also lets the FPGA system avoid the heat dissipation and stability problems caused by the static logic area running at a high frequency for long periods.

The following is a system embodiment corresponding to the method embodiment above; the two can be implemented in combination. The technical details mentioned in the embodiment above remain valid in this embodiment and are not repeated here. Likewise, the technical details mentioned in this embodiment also apply to the embodiment above.

The present invention also provides a low-power FPGA partially reconfigurable system, comprising:

Module 1, configured to trim the complete static area framework of the external interface logic according to preset interface categories, obtaining multiple static area logics each corresponding to a specific interface category, and to store all of these static area logics, in the form of FPGA static area logic configuration files, in the non-volatile flash memory that sits off the FPGA chip and is used for power-on boot configuration;

Module 2, configured to determine the interfaces required to run the user logic from the dynamic area logic characteristics in the user's dynamic reconfiguration instruction, to select from the non-volatile flash memory the FPGA static area logic configuration file that contains the corresponding interfaces according to the interface category to which those interfaces belong, and then to configure the FPGA on-chip dynamic area according to the user's dynamic reconfiguration instruction.

In the low-power FPGA partially reconfigurable system, Module 2 includes:

after the FPGA on-chip dynamic area has been configured, setting up a reconfigured dynamic clock unit to dynamically adjust the clocks of the CPU soft core and related logic modules in the FPGA on-chip static logic area.

In the low-power FPGA partially reconfigurable system, Module 2 includes:

according to how the FPGA on-chip dynamic logic area is using each external memory, sending control commands to the controller interface module of the external memory so that the designated memory system operates in the self-refresh state or the power-down sleep state.

In the low-power FPGA partially reconfigurable system, Module 2 includes: selecting the initial default image of the static area logic in the non-volatile flash memory and, under the guidance of that initial default image, performing a secondary reconfiguration to direct the FPGA into the multi-boot image switching state and reconfigure the logic of the FPGA on-chip static area; if the reconfiguration is interrupted by a failure, the system automatically falls back to the initial state.

Claims (6)

1. A low-power FPGA partially reconfigurable method, characterized by comprising:

Step 1: according to preset interface categories, trimming the complete static area framework of the external interface logic to obtain multiple static area logics each corresponding to a specific interface category, and placing all of these static area logics, in the form of FPGA static area logic configuration image files, in separate address ranges of the non-volatile flash memory that sits off the FPGA chip and is used for power-on boot configuration;

Step 2: analyzing the dynamic area logic characteristics in the user's dynamic reconfiguration instruction to determine the interfaces required to run the user logic, and extracting the interface information of those interfaces;

Step 3: according to the interface information of the interfaces required to run the user logic, the management program providing the FPGA system with the trimmed static area framework configuration image file that offers the minimum external interface resources required for operation, providing the start address of the range in which this configuration image file is stored in NorFlash, and directing the FPGA into the multi-boot image switchable mode;

Step 4: a multi-image boot decision module selecting the FPGA static area logic configuration file in the non-volatile flash memory that contains the corresponding interfaces and, starting from the initial default image of the static area logic in the non-volatile flash memory, performing a secondary reconfiguration under the guidance of that initial default image so as to direct the FPGA through multi-boot image switching and reconfigure the logic of the FPGA on-chip static area; if the reconfiguration is interrupted by a failure, automatically falling back to the initial state;

Step 5: after the secondary reconfiguration of the static area logic is complete, carrying out the FPGA on-chip dynamic area configuration flow according to the user's dynamic reconfiguration instruction.

2. The low-power FPGA partially reconfigurable method according to claim 1, characterized in that Step 5 includes, after the secondary reconfiguration of the static area logic is complete, reducing the dynamic power consumption of the static area logic by reconfiguring the dynamic clock unit; reconfiguring the dynamic clock unit comprises:

after the FPGA on-chip dynamic area has been configured, setting up the reconfigured dynamic clock unit to dynamically adjust the clocks of the CPU soft core and related logic modules in the FPGA on-chip static logic area; when the dynamic clock control unit detects the feedback signal indicating that the dynamic logic area has finished configuration and entered the stable operation phase, the dynamic clock control unit reconfigures the dedicated clock generation units of the CPU soft core and related logic modules in the static area so that they run at a lower clock frequency, the related logic modules including the on-chip local memory module of the CPU soft core, the memory interaction bus, and the control unit; when the next partial reconfiguration starts, the dynamic clock management unit again reconfigures the corresponding clock units in the static area, restoring them to the high clock frequency output state.

3. The low-power FPGA partially reconfigurable method according to claim 1, characterized in that Step 5 includes, after the secondary reconfiguration of the static area logic is complete, reducing the dynamic power consumption of the static area logic through a low-power memory access setting; the low-power memory access setting comprises:

according to how the FPGA on-chip dynamic logic area is using each external memory, sending control commands to the controller interface module of the external memory so that the designated memory system operates in the self-refresh state or the power-down sleep state.

4. A low-power FPGA partially reconfigurable system, characterized by comprising:

Module 1, configured to trim the complete static area framework of the external interface logic according to preset interface categories, obtaining multiple static area logics each corresponding to a specific interface category, and to place all of these static area logics, in the form of FPGA static area logic configuration image files, in separate address ranges of the non-volatile flash memory that sits off the FPGA chip and is used for power-on boot configuration;

Module 2, configured to analyze the dynamic area logic characteristics in the user's dynamic reconfiguration instruction to determine the interfaces required to run the user logic, and to extract the interface information of those interfaces;

Module 3, configured so that, according to the interface information of the interfaces required to run the user logic, the management program provides the FPGA system with the trimmed static area framework configuration image file that offers the minimum external interface resources required for operation, provides the start address of the range in which this configuration image file is stored in NorFlash, and directs the FPGA into the multi-boot image switchable mode;

Module 4, configured to have the multi-image boot decision module select the FPGA static area logic configuration file in the non-volatile flash memory that contains the corresponding interfaces and, starting from the initial default image of the static area logic in the non-volatile flash memory, perform a secondary reconfiguration under the guidance of that initial default image so as to direct the FPGA through multi-boot image switching and reconfigure the logic of the FPGA on-chip static area; if the reconfiguration is interrupted by a failure, the system automatically falls back to the initial state;

Module 5, configured to carry out the FPGA on-chip dynamic area configuration flow according to the user's dynamic reconfiguration instruction after the secondary reconfiguration of the static area logic is complete.

5. The low-power FPGA partially reconfigurable system according to claim 4, characterized in that Module 5 is further configured, after the secondary reconfiguration of the static area logic is complete, to reduce the dynamic power consumption of the static area logic by reconfiguring the dynamic clock unit; reconfiguring the dynamic clock unit comprises:

after the FPGA on-chip dynamic area has been configured, setting up the reconfigured dynamic clock unit to dynamically adjust the clocks of the CPU soft core and related logic modules in the FPGA on-chip static logic area; when the dynamic clock control unit detects the feedback signal indicating that the dynamic logic area has finished configuration and entered the stable operation phase, the dynamic clock control unit reconfigures the dedicated clock generation units of the CPU soft core and related logic modules in the static area so that they run at a lower clock frequency, the related logic modules including the on-chip local memory module of the CPU soft core, the memory interaction bus, and the control unit; when the next partial reconfiguration starts, the dynamic clock management unit again reconfigures the corresponding clock units in the static area, restoring them to the high clock frequency output state.

6. The low-power FPGA partially reconfigurable system according to claim 4, characterized in that Module 5 is further configured, after the secondary reconfiguration of the static area logic is complete, to reduce the dynamic power consumption of the static area logic through a low-power memory access setting; the low-power memory access setting comprises:

according to how the FPGA on-chip dynamic logic area is using each external memory, sending control commands to the controller interface module of the external memory so that the designated memory system operates in the self-refresh state or the power-down sleep state.
CN202011478343.0A 2020-12-15 2020-12-15 A low-power FPGA partially reconfigurable method and device Active CN112597096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011478343.0A CN112597096B (en) 2020-12-15 2020-12-15 A low-power FPGA partially reconfigurable method and device

Publications (2)

Publication Number Publication Date
CN112597096A CN112597096A (en) 2021-04-02
CN112597096B CN112597096B (en) 2023-11-21

Family

ID=75195792

Country Status (1)

Country Link
CN (1) CN112597096B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant