
WO2018171715A1 - Automated design method and system applicable to a neural network processor - Google Patents

Automated design method and system applicable to a neural network processor

Info

Publication number
WO2018171715A1
WO2018171715A1 (PCT/CN2018/080200)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
hardware
file
data
instruction
Application number
PCT/CN2018/080200
Other languages
English (en)
Chinese (zh)
Inventor
韩银和
许浩博
王颖
Original Assignee
中国科学院计算技术研究所
Application filed by 中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)
Publication of WO2018171715A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Definitions

  • the present invention relates to the technical field of neural network processor architecture, and in particular to an automatic design method and system for a neural network processor.
  • the existing neural network hardware acceleration technology includes an Application Specific Integrated Circuit (ASIC) chip and a Field Programmable Gate Array (FPGA).
  • ASIC chips run fast and consume little power, but the design process is complex, the tape-out cycle is long, development cost is high, and they cannot adapt to the rapid update of neural network models; FPGAs offer flexible circuit configuration and a short development cycle, but run relatively slowly and incur larger hardware overhead and power consumption.
  • neural network model and algorithm developers need to master hardware development techniques while understanding the network topology and data flow, including processor architecture design, hardware code writing, simulation verification, layout, and other aspects; these skills are difficult to acquire for high-level application developers who focus on researching neural network models and structure design and lack hardware design capabilities.
  • the present invention provides an automatic design method and system for a neural network processor, enabling high-level developers to develop neural network applications efficiently.
  • the present invention provides an automated design method for a neural network processor, the method comprising:
  • Step A: acquire a neural network model topology configuration file and a hardware resource constraint file of the target hardware circuit, for the neural network model to be implemented as a hardware circuit;
  • Step B: construct a neural network processor hardware architecture corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generate a hardware architecture description file;
  • Step C: generate a control description file for controlling the data scheduling, storage, and calculation of the neural network processor according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file;
  • Step D: generate a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.
  • step B may include: acquiring the required neural network components from a pre-built reusable cell library composed of various types of units, each unit including a hardware description file and a configuration script for describing its hardware structure.
  • the hardware resource constraint file may include one or more of the following: an operating frequency of the target hardware circuit, a target circuit area overhead, a target circuit power consumption overhead, a supported data precision, and a target circuit memory size.
  • the neural network model topology configuration file may include the number of neural network layers and the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's stride (step size), and the next-layer connection property.
  • the step C may further include determining a convolution kernel splitting and data sharing manner by the following steps:
  • if the number of data layers is smaller than the computation unit width, the convolution kernel k is split into a plurality of convolution kernels k_s; if the number of data layers is greater than the computation unit width, a data sharing scheme is adopted;
  • the control description file may include an instruction stream for controlling the data scheduling, storage, and calculation of the neural network processor, wherein the instruction types include load/store instructions and operation instructions.
  • the load/store instructions may include instructions for exchanging data between an external memory and the internal memory of the neural network processor, instructions for loading data and weights from the internal memory into the computing unit, and instructions for storing the computing unit's calculation results into memory; the operation instructions include a convolution operation instruction, a pooling operation instruction, a local response normalization instruction, a clear instruction, and an excitation function operation instruction.
  • the format of the convolution instruction may include the following fields: an opcode for marking the instruction type, a computation core count for marking the number of computation cores participating in the operation, a transmission interval for marking the issue interval of each operation of the instruction, an operation mode for marking modes such as intra-layer convolution and cross-layer convolution, and a destination register for marking the storage location of the calculation result.
  • the present invention provides an automated design method for a neural network processor, comprising:
  • according to the hardware architecture description file and the control description file, units meeting the design requirements are found in the pre-built reusable neural network cell library, the corresponding control logic is generated, a corresponding hardware circuit description language is produced, and the hardware circuit description language is converted into a hardware circuit.
  • the neural network model topology configuration file may include the number of neural network layers and the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's stride, and the next-layer connection property.
  • the method further includes generating a stream of control instructions while generating a neural network circuit model, the types of instructions including load/store instructions and arithmetic instructions.
  • step 3) may include: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine.
  • the hardware architecture description file includes the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, offset (bias) memory capacity, offset memory bit width, output data memory capacity, output data memory bit width, data bit width, computation unit width, computation unit depth, data sharing flag bit, and weight sharing flag bit.
  • the present invention also provides an automated design system for a neural network processor, comprising:
  • a data acquisition module configured to acquire a neural network model topology configuration file and a hardware resource constraint file, where the hardware resource constraint file includes a target circuit area overhead, a target circuit power consumption overhead, and a target circuit operating frequency;
  • a hardware architecture description file generation module configured to generate a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and to output the hardware architecture description file;
  • a control description file generation module configured to optimize the data scheduling, storage, and calculation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and to generate the corresponding control description file;
  • a hardware circuit generation module configured to find, according to the hardware architecture description file and the control description file, units that meet the design requirements in the pre-built reusable neural network cell library, to generate a corresponding hardware circuit description language, and to convert the hardware circuit description language into a hardware circuit.
  • the present invention also provides an optimization method based on the automated design method for a neural network processor described above, comprising:
  • Step (1): for a given neural network layer, if the convolution kernel size k equals the stride s, the weight sharing mode is adopted, and the convolution kernel performs the convolution operation within a single-layer feature map;
  • Step (2): if the number of data layers is smaller than the computation unit width, the convolution kernel is split into multiple convolution kernels k_s by the convolution kernel splitting method; if the number of data layers is greater than the computation unit width, the data sharing method is used;
  • Step (3): the computation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.
  • the neural network model is mapped to hardware description language code for designing the hardware circuit; the designed hardware circuit structure and data storage scheme are automatically optimized according to the hardware resource constraints and network characteristics, and the corresponding control instruction stream is generated at the same time. This realizes hardware/software co-automation of neural network hardware accelerator design, shortens the design cycle of the neural network processor, improves its performance, and meets the neural network computing needs of upper-layer application developers.
  • Figure 1 shows a schematic diagram of a topology common to neural networks;
  • Figure 2 shows a schematic block diagram of a neural network convolution operation;
  • Figure 3 shows a schematic block diagram of a common neural network processor structure;
  • FIG. 4 is a schematic diagram of the automated design flow for a neural network processor in accordance with one embodiment of the present invention;
  • FIG. 5 is a schematic diagram of a compiler workflow according to an embodiment of the present invention;
  • FIG. 6 is a flow chart of a control state machine for performing a convolution operation by a neural network processor in accordance with one embodiment of the present invention;
  • FIG. 7 is a schematic diagram of the operation of a convolution kernel in weight sharing mode according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of a convolution kernel split according to an embodiment of the present invention;
  • Figure 9 is a diagram of the instruction format of a load/store instruction;
  • Figure 10 is a diagram of the instruction format of an operation instruction.
  • a neural network is a mathematical model inspired by the structure and behavior of the human brain. It is usually divided into an input layer, hidden layers, and an output layer, each composed of multiple neuron nodes; the output values of the neuron nodes in one layer are passed as inputs to the neuron nodes of the next layer, connecting the layers one by one.
  • the neural network itself has bionic characteristics, and its multi-layer abstraction and iteration process information in a manner similar to the human brain and other sensory organs.
  • Figure 1 shows a common topology diagram of a neural network.
  • the first-layer input of the multilayer neural network structure is the original image (the "original image" in the present invention refers to the original data to be processed, not merely an image obtained by taking a photograph in the narrow sense).
  • the convolution operation in a neural network generally proceeds as shown in Fig. 2: a two-dimensional weight convolution kernel of size K*K scans the feature map; at each position, the inner product of the kernel weights with the corresponding feature elements of the feature map is computed, and all inner product values are summed to obtain an output-layer feature element. Each convolutional layer convolves N convolution kernels of size K*K with its feature maps, and the N inner product values are summed to obtain an output-layer feature element.
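As a concrete illustration (not part of the patent text), the following minimal NumPy sketch performs the operation just described: N kernels of size K*K scan N feature maps, and the N inner products at each position are summed into one output feature element. All names and shapes are illustrative assumptions.

```python
import numpy as np

def conv_layer(feature_maps, kernels, stride=1):
    """Naive convolution: N K*K kernels scan N input feature maps; the N
    inner products at each position are summed into one output element."""
    n, h, w = feature_maps.shape          # N input layers of H*W
    _, k, _ = kernels.shape               # N kernels of K*K
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Inner products of every kernel with its window, summed over all N layers.
            windows = feature_maps[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(windows * kernels)
    return out

# Example: 3 input layers of 8*8 with 3 kernels of 3*3 yield a 6*6 output map.
print(conv_layer(np.random.rand(3, 8, 8), np.random.rand(3, 3, 3)).shape)  # (6, 6)
```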
  • the neural network operation may also include pooling, normalization calculation, and the like.
  • hardware acceleration techniques are usually used to construct a dedicated neural network processor to implement neural network computing.
  • common hardware acceleration technologies include ASIC or FPGA.
  • FPGAs are more flexible from a design perspective.
  • hardware description languages such as Verilog HDL, VHDL (Very High Speed Integrated Circuit Hardware Description Language), or other HDLs can be used to define the internal logic structure and thereby implement a custom hardware circuit.
  • Common neural network processors are based on storage-control-calculation logic structures.
  • the storage structure is configured to store the data participating in the calculation, the neural network weights, and the processor's operation instructions; the control structure includes decoding circuitry and control logic circuitry that parse the operation instructions and generate control signals to schedule data within the processor and to govern the storage and computation processes of the neural network; the computation structure is responsible for the neural network's computational operations.
  • the storage unit may store data transmitted from outside the neural network processor (for example, original feature map data), trained neural network weights, processing results or intermediate results generated in the calculation process, instruction information participating in the calculation, and the like.
  • FIG. 3 is a schematic diagram of a conventional neural network processor system 101, which includes an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a calculation unit 107.
  • the input data storage unit 102 is configured to store data participating in the calculation, the data includes original feature map data and data participating in the intermediate layer calculation;
  • the output data storage unit 104 stores the calculated neuron response value;
  • the instruction storage unit 106 stores the instruction information participating in the calculation; the instructions are interpreted as a control flow to schedule the neural network calculation;
  • the weight storage unit 105 is configured to store the trained neural network weights.
  • the control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the calculation unit 107, respectively.
  • the control unit 103 is responsible for instruction decoding, data scheduling, process control, and the like; for example, it fetches the instructions stored in the instruction storage unit 106, parses them, schedules data according to the parsed control signals, and controls the calculation unit to perform the relevant neural network operations.
  • the calculation unit 107 is configured to perform a corresponding neural network calculation according to a control signal generated by the control unit 103.
  • the computing unit 107 is associated with one or more storage units; it may obtain data for calculation from its associated input data storage unit 102 and may write results to its associated output data storage unit 104.
  • the computing unit 107 performs most of the operations in the neural network algorithm, including vector multiply and add operations, pooling, normalization calculations, and the like.
  • the topology and parameter design of a neural network model change with different application scenarios and requirements, and neural network models evolve rapidly. This poses great development challenges for high-level application developers working on neural network models and algorithms: they not only need to quickly design or adjust the related hardware acceleration solutions for different application requirements, but also have to understand hardware development technologies such as FPGA while mastering neural network models and algorithms, which makes development very difficult.
  • an automated design method, system, or apparatus for a neural network processor comprises a hardware generator and a compiler. The hardware generator automatically generates the hardware description language code of the neural network processor from the neural network model and the hardware resource constraints, so that hardware designers can subsequently produce the processor's hardware circuit from this code using existing hardware circuit design methods; the compiler generates the instruction streams that control the circuit structure and schedule the data.
  • the hardware generator may construct the neural network processor hardware architecture according to the topology of the neural network model and the hardware resource constraint file, and then generate the hardware description language code according to this processor hardware architecture, the pre-built reusable neural network cell library, and the control state machine generated by the compiler.
  • the system may also include a pre-built reusable neural network cell library containing the basic units from which neural network models are composed, including, for example but not limited to: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit, a control unit, and the like.
  • the specific hardware structure of each unit in the cell library is defined by the hardware description file associated with it.
  • the hardware description file for each unit can be described in Verilog HDL or other hardware description language.
  • each unit also has a configuration script associated with it, by which the unit's hardware structure can be adjusted as appropriate; for example, configuring the bit width of the registers in the neuron unit, the number of adders in the adder tree unit, the number of comparators in the pooling unit, and so on.
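A minimal sketch of what such a configuration script could look like in practice is given below; the parameter names and the idea of emitting Verilog parameter overrides are illustrative assumptions rather than the patent's concrete script format.

```python
def configure_unit(unit_name, **params):
    """Render a Verilog instantiation of a reusable cell-library unit with
    parameter overrides (bit widths, adder/comparator counts, etc.)."""
    overrides = ",\n    ".join(f".{k.upper()}({v})" for k, v in params.items())
    return f"{unit_name} #(\n    {overrides}\n) u_{unit_name} (/* ports */);"

print(configure_unit("neuron_unit", data_width=16))       # register bit width
print(configure_unit("adder_tree_unit", num_adders=8))    # adders in the tree
print(configure_unit("pooling_unit", num_comparators=4))  # comparators
```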
  • FIG. 4 illustrates the workflow of an automated design system for a neural network processor according to an embodiment of the present invention, which may mainly include the following steps.
  • In step S1, the neural network model topology configuration file is read.
  • the neural network model topology configuration file is mainly used to describe a neural network model designed according to specific application requirements, including a network topology of the neural network model and various operational layer definitions.
  • the neural network model topology configuration file may include a number of neural network layers, a network size and structure of each layer, a data bit width, a weight bit width, a current layer function attribute, a current layer input layer number, a current layer output layer number, Current layer convolution kernel size, current layer step size, next layer connection properties, and so on.
  • Each layer of the neural network includes one or more units; the unit types are usually basic neuron units, convolution units, pooling units, normalization units, recurrent units, and the like.
  • the neural network model description file may include three parts: basic attributes, a parameter description, and connection information. The basic attributes may include the layer name, layer type, and layer structure; the parameter description may include the number of output layers, the convolution kernel size, the stride, and so on; the connection information may include the connection name, connection direction, and connection type.
  • Step S2: a hardware resource constraint file is read. The hardware resource constraint file contains parameters describing the hardware resources available on the target hardware circuit on which the neural network processor is to be implemented; for example, it may include the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power consumption overhead, the supported data precision, the target circuit memory size, and so on. These hardware resource constraint parameters can be loaded into the system together as one constraint file.
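For concreteness, the two input files might look like the sketch below; the field names follow the text above, but the concrete format (here, Python dictionaries) is an assumption, since the patent does not fix one.

```python
# Topology configuration: basic attributes, parameter description, connection info.
topology_config = {
    "num_layers": 2,
    "layers": [
        {"name": "conv1", "type": "convolution",            # basic attributes
         "kernel_size": 3, "stride": 1,                     # parameter description
         "input_layers": 3, "output_layers": 32,
         "data_width": 16, "weight_width": 8,
         "next": "pool1"},                                  # connection information
        {"name": "pool1", "type": "pooling",
         "kernel_size": 2, "stride": 2, "next": None},
    ],
}

# Hardware resource constraints for the target circuit.
hardware_constraints = {
    "operating_frequency_mhz": 200,
    "area_budget_mm2": 2.0,
    "power_budget_mw": 500,
    "supported_precision_bits": [8, 16],
    "memory_size_kb": 512,
}
```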
  • Step S3: the hardware generator of the system constructs the neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generates the corresponding hardware architecture description file.
  • the hardware architecture description file may include the hardware circuit structure, input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, offset (bias) memory capacity, offset memory bit width, output data memory capacity, output data memory bit width, data bit width, computation unit width, computation unit depth, data sharing flag, weight sharing flag, and so on.
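A hedged sketch of the hardware generator's output follows; the sizing rules are deliberately simplistic placeholders (the patent specifies which fields the file contains, not how their values are derived).

```python
def build_architecture(layers, constraints):
    """Derive a hardware architecture description from the model layers and
    the resource constraints. All sizing heuristics here are illustrative."""
    return {
        "input_memory_kb":  constraints["memory_size_kb"] // 4,  # placeholder split
        "weight_memory_kb": constraints["memory_size_kb"] // 2,
        "output_memory_kb": constraints["memory_size_kb"] // 4,
        "input_memory_bits":  max(l.get("data_width", 16) for l in layers),
        "weight_memory_bits": max(l.get("weight_width", 8) for l in layers),
        "compute_unit_width": 32,    # placeholder, bounded by the area budget
        "compute_unit_depth": 4,
        "data_sharing_flag": False,  # set later by the compiler's analysis
        "weight_sharing_flag": False,
    }

arch = build_architecture([{"data_width": 16, "weight_width": 8}],
                          {"memory_size_kb": 512})
print(arch["compute_unit_width"])
```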
  • Step S4: the compiler of the system generates the instruction stream that controls the neural network processor circuit structure and dispatches its data; these processes can be described by a control state machine. For example, the data scheduling, storage, and calculation schemes are optimized according to the neural network model topology, the hardware resource constraints, and the hardware architecture description file, and the corresponding control description file is generated.
  • FIG. 5 shows the workflow of a compiler according to an embodiment of the present invention, which generates a control instruction stream from the neural network topology, the constructed hardware architecture, and the hardware resource constraint file to perform real-time control of the neural network processor.
  • the workflow may include: step a1, reading the neural network model topology configuration file, the hardware architecture description file, and the hardware resource constraint file.
  • In step a2, the compiler performs scheduling optimizations such as convolution kernel partitioning and data partitioning according to the above files, and generates a control state machine.
  • the control state machine can be used to schedule the operating states of the neural network processor hardware circuit to be implemented.
  • In step a3, a control instruction stream for the neural network processor is generated based on the control state machine.
  • Figure 6 depicts a partial control state machine flow diagram, taking a neural network processor performing a convolution operation as an example.
  • the control logic directs the relevant neural network units to read the neural network data and weight data from the external memory into the internal memory, then loads the neural network data, bias data, and weight data involved in the convolution into the calculation unit, then controls the calculation unit to perform multiply-add and accumulate operations, and repeats the above loading and calculation operations until all the corresponding data have been processed.
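The following sketch models that flow as a small state machine; the state names and the tile counter standing in for "until the corresponding data is calculated" are illustrative assumptions.

```python
from enum import Enum, auto

class ConvState(Enum):
    LOAD_EXTERNAL = auto()   # external memory -> internal memory
    LOAD_COMPUTE  = auto()   # data, bias, and weights -> calculation unit
    MULTIPLY_ADD  = auto()
    ACCUMULATE    = auto()
    DONE          = auto()

def conv_state_machine(num_tiles):
    """Yield the state sequence of the Fig. 6 flow for num_tiles data tiles."""
    state, tile = ConvState.LOAD_EXTERNAL, 0
    while state is not ConvState.DONE:
        yield state, tile
        if state is ConvState.LOAD_EXTERNAL:
            state = ConvState.LOAD_COMPUTE
        elif state is ConvState.LOAD_COMPUTE:
            state = ConvState.MULTIPLY_ADD
        elif state is ConvState.MULTIPLY_ADD:
            state = ConvState.ACCUMULATE
        elif state is ConvState.ACCUMULATE:   # repeat load/compute per tile
            tile += 1
            state = ConvState.DONE if tile == num_tiles else ConvState.LOAD_COMPUTE

for state, tile in conv_state_machine(2):
    print(tile, state.name)
```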
  • Step S5: the hardware generator retrieves the units that meet the design requirements from the pre-built reusable neural network cell library according to the hardware architecture description file and the control description file, generates the corresponding control logic, and produces the hardware circuit description language of the neural network processor corresponding to the neural network model. Then, in step S6, the generated hardware circuit description language can be converted into a concrete hardware circuit implementing the neural network processor by existing hardware design methods.
  • when mapped to a hardware circuit, a neural network model often cannot be fully unrolled according to its model description; the compiler can therefore analyze the computational throughput and on-chip memory size of the neural network processor and partition the neural network feature data and weight data into appropriately sized blocks for storage and access.
  • the computational data of the neural network includes input feature data and trained weight data. Through good data segmentation and storage layout, the internal data bandwidth of the processor can be reduced and the storage space utilization efficiency can be improved.
  • the optimization method of the compiler based on convolution kernel partitioning and data sharing is described below with reference to Figs. 7 and 8, and mainly includes the following steps:
  • Step (1): for a given neural network layer, if the convolution kernel size k equals the stride s, the weight sharing mode is adopted, and the convolution kernel performs the convolution operation within a single-layer feature map, as shown in FIG. 7;
  • Step (2): if the number of data layers is smaller than the computation unit width, the convolution kernel splitting method is used to divide the large convolution kernel k into small convolution kernels k_s, as shown in FIG. 8; if the number of data layers is greater than the computation unit width, data sharing is used;
  • Step (3): the computation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer; a sketch of this decision logic follows.
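The sketch below encodes steps (1) and (2) as plain branching logic; the concrete split of k into smaller kernels k_s is an illustrative assumption, since the patent does not prescribe the split sizes.

```python
def choose_schedule(kernel_size, stride, num_layers, unit_width):
    """Return the scheduling choices of steps (1) and (2) for one layer."""
    weight_sharing = (kernel_size == stride)                 # step (1)
    if num_layers < unit_width:                              # step (2): split k
        half = kernel_size // 2
        kernel_split = [half, kernel_size - half]            # illustrative k_s sizes
        data_sharing = False
    else:                                                    # step (2): share data
        kernel_split, data_sharing = None, True
    return {"weight_sharing": weight_sharing,
            "kernel_split": kernel_split,
            "data_sharing": data_sharing}

# Step (3) would then lay out this layer's results according to the schedule
# chosen for the *next* layer.
print(choose_schedule(kernel_size=4, stride=4, num_layers=3, unit_width=32))
```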
  • the control and data-scheduling instructions generated by the compiler of the system for the neural network processor circuit structure may be collectively referred to as an instruction stream. These instruction streams control the operation of the designed neural network processor.
  • Instruction types can include load/store instructions and operation instructions; for example, the load/store instructions can include the kinds of instructions listed earlier (memory exchange, loading into the calculation unit, and storing results).
  • Taking the load/store instruction as an example, its instruction format is introduced below.
  • the instruction format is shown in FIG. 9: the opcode marks the instruction type; the transmission interval marks the issue interval of each operation of the instruction; the data start address marks the first address of the data; the operation mode describes the working state of the circuit, including large-kernel convolution, small-kernel convolution, pooling, fully connected operation, etc.; the convolution kernel size field marks the convolution kernel size; the output image size field marks the output image size; the input layer count marks the number of input layers; the output layer count marks the number of output layers; and the clear signal clears the data values.
  • the operation instructions may include: a convolution operation instruction for controlling the convolution operation; a pooling operation instruction for controlling the pooling operation; a local response normalization instruction for controlling the local response normalization operation; a clear instruction for clearing the data loaded in the calculation unit; an excitation function operation instruction for controlling the excitation (activation) function operation and configuring the function mode; and the like.
  • Taking the convolution instruction as an example, the instruction format of the operation instructions is introduced below.
  • the instruction format is shown in Figure 10: the opcode marks the instruction type; the computation core count marks the number of computation cores participating in the operation; the transmission interval marks the issue interval of each operation of the instruction; the operation mode marks modes such as intra-layer convolution and cross-layer convolution; and the destination register marks the storage location of the calculation result, which may be the output data memory, the excitation function register, or the lookup table register.
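As an illustration of how such an instruction word might be packed, here is a hedged bitfield-encoding sketch; the patent defines the fields of Figs. 9 and 10 but not their bit positions or widths, so all widths and codes below are assumptions.

```python
# Fields of the convolution instruction (Fig. 10); widths are assumed.
FIELDS = [("opcode", 6), ("core_count", 6), ("tx_interval", 8),
          ("op_mode", 4), ("dest_reg", 8)]

OP_MODE = {"intra_layer_conv": 0, "cross_layer_conv": 1}        # assumed codes
DEST = {"output_memory": 0, "excitation_reg": 1, "lut_reg": 2}  # assumed codes

def encode(**values):
    """Pack the named field values into a single instruction word."""
    word, shift = 0, 0
    for name, width in reversed(FIELDS):   # pack the last field at bit 0
        value = values[name]
        assert 0 <= value < (1 << width), f"{name} out of range"
        word |= value << shift
        shift += width
    return word

conv = encode(opcode=0b000011, core_count=16, tx_interval=4,
              op_mode=OP_MODE["cross_layer_conv"],
              dest_reg=DEST["output_memory"])
print(f"{conv:#010x}")
```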
  • the compiler may, for example, take the following steps to generate the above-described instruction stream:
  • Step b1: read the name of the neural network layer;
  • Step b2: read the neural network layer type;
  • Step b3: parse the neural network layer parameters;
  • Step b4: determine the hardware circuit structure and parameters;
  • Step b5: perform scheduling optimization based on the convolution kernel splitting and data sharing scheme described above in connection with FIGS. 7 and 8;
  • Step b6: determine the instruction parameters and generate the control flow instructions according to the neural network working mode and the scheduling scheme.
  • the instruction parameters may include, for example, the neural network layer serial number, the number of input layers, the number of output layers, the data size of each layer, the data width, the weight width, the convolution kernel size, and the like (see the sketch below).
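A compact sketch of steps b1-b6 for a single layer is given below; the instruction mnemonics and the scheduling rule are illustrative assumptions, not the patent's concrete encoding.

```python
def compile_layer(layer, unit_width):
    """Walk steps b1-b6 for one layer and emit an illustrative instruction list."""
    name = layer["name"]                                   # b1: layer name
    kind = layer["type"]                                   # b2: layer type
    k, stride = layer["kernel_size"], layer["stride"]      # b3: parse parameters
    n_in, n_out = layer["input_layers"], layer["output_layers"]
    # b4: hardware structure/parameters (here reduced to the unit width)
    data_sharing = n_in >= unit_width                      # b5: schedule (Figs. 7-8)
    return [                                               # b6: emit instructions
        ("LOAD",  name, {"layers": n_in}),
        (kind.upper(), name, {"k": k, "stride": stride,
                              "data_sharing": data_sharing,
                              "out_layers": n_out}),
        ("STORE", name, {"layers": n_out}),
    ]

layer = {"name": "conv1", "type": "convolution", "kernel_size": 3,
         "stride": 1, "input_layers": 3, "output_layers": 32}
for instruction in compile_layer(layer, unit_width=32):
    print(instruction)
```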
  • an automated design method for a neural network processor comprises: step A, acquiring a neural network model topology configuration file and a hardware resource constraint file of the target hardware circuit for the neural network model to be implemented as a hardware circuit; step B, constructing the hardware structure of the neural network processor corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; step C, generating a control description file for controlling the data scheduling, storage, and calculation of the neural network processor according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file; step D, generating a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.
  • the hardware resource constraint file may include one or more of the following: an operating frequency of the target hardware circuit, a target circuit area overhead, a target circuit power consumption overhead, a supported data precision, and a target circuit memory size.
  • the control description file may include an instruction stream for controlling data scheduling, storage, and calculation of the neural network processor, wherein the types of instructions include load/store instructions and operation instructions.
  • step B may include: obtaining the required neural network model components and their associated hardware structures from the pre-built cell library according to the neural network model topology configuration file and the hardware resource constraint parameters, wherein the cell library is composed of various types of reusable neural network units, each of which includes a hardware description file and a configuration script describing its hardware structure; and setting the configuration script of each unit obtained from the cell library according to the neural network model description file and the hardware resource constraint parameters, so as to obtain the hardware structure description file of each unit and thereby the hardware architecture description file of the neural network processor.
  • the step C may further comprise determining a convolution kernel partitioning and data sharing manner by the following steps:
  • if the number of data layers is smaller than the computation unit width, the convolution kernel k is split into a plurality of convolution kernels k_s; if the number of data layers is greater than the computation unit width, a data sharing scheme is adopted;
  • an automated design method for a neural network processor includes: step 1), acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power consumption overhead, and the target circuit operating frequency; step 2), generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; step 3), optimizing the data scheduling, storage, and calculation schemes according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file; step 4), finding the units that meet the design requirements in the pre-built reusable neural network cell library according to the hardware architecture description file and the control description file, generating the corresponding control logic, producing a corresponding hardware circuit description language, and converting the hardware circuit description language into a hardware circuit.
  • the neural network model topology configuration file may include the number of neural network layers and the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's stride, and the next-layer connection properties.
  • the hardware architecture description file may include the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, offset (bias) memory capacity, offset memory bit width, output data memory capacity, output data memory bit width, data bit width, computation unit width, computation unit depth, data sharing flag bit, and weight sharing flag bit.
  • the method can also include generating a stream of control instructions while generating a neural network circuit model, the types of instructions including load/store instructions and arithmetic instructions.
  • step 3) may comprise: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine.
  • an automated design apparatus for a neural network processor comprising:
  • a data acquisition module configured to acquire a neural network model topology configuration file and a hardware resource constraint file, where the hardware resource constraint file includes a target circuit area overhead, a target circuit power consumption overhead, and a target circuit operating frequency;
  • a hardware architecture description file generation module configured to generate a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and to output the hardware architecture description file;
  • a control description file generation module configured to optimize the data scheduling, storage, and calculation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and to generate the corresponding control description file;
  • a hardware circuit generation module configured to find, according to the hardware architecture description file and the control description file, units that meet the design requirements in the pre-built reusable neural network cell library, to generate a corresponding hardware circuit description language, and to convert the hardware circuit description language into a hardware circuit.
  • the neural network reusable unit library includes: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit, and a control unit .
  • generating the control description file includes: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine.
  • the automated design system for a neural network processor can map a neural network model to the hardware description language of a dedicated neural network processor, optimize the data calculation and scheduling scheme according to the processor structure, and generate the corresponding control flow instructions. This realizes the automated design of the neural network processor, shortens its design cycle, and adapts to the rapid model updates, high operation speed requirements, and energy efficiency requirements of neural network technology.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Devices For Executing Special Programs (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention relates to an automated design method and system applicable to a neural network processor. The method comprises: acquiring a topology configuration file of a neural network model and a hardware resource constraint file of a target hardware circuit; constructing, according to the topology configuration file of the neural network model and the hardware resource constraint file, a hardware architecture of a neural network processor corresponding to the neural network model together with its description file, as well as a control description file for controlling the data scheduling, storage, and calculation of the neural network processor; then, on the basis of the hardware architecture description file and the control description file, generating hardware description code for the neural network processor, so as to implement a hardware circuit of the neural network processor on the target hardware circuit. The system and method realize the automated design of a neural network processor, shorten the design cycle of the neural network processor, and adapt to the rapid update of network models in neural network technology and its high operating speed requirements.
PCT/CN2018/080200 2017-03-23 2018-03-23 Procédé et système de conception automatisée applicables à un processeur de réseau neuronal WO2018171715A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710178679.7 2017-03-23
CN201710178679.7A CN107016175B (zh) 2017-03-23 2017-03-23 适用神经网络处理器的自动化设计方法、装置及优化方法

Publications (1)

Publication Number Publication Date
WO2018171715A1 (fr) 2018-09-27

Family

ID=59444868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080200 WO2018171715A1 (fr) 2017-03-23 2018-03-23 Procédé et système de conception automatisée applicables à un processeur de réseau neuronal

Country Status (2)

Country Link
CN (1) CN107016175B (fr)
WO (1) WO2018171715A1 (fr)

Families Citing this family (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016175B (zh) * 2017-03-23 2018-08-31 中国科学院计算技术研究所 适用神经网络处理器的自动化设计方法、装置及优化方法
CN107480789B (zh) * 2017-08-07 2020-12-29 北京中星微电子有限公司 一种深度学习模型的高效转换方法及装置
CN107480115B (zh) * 2017-08-31 2021-04-06 郑州云海信息技术有限公司 一种caffe框架残差网络配置文件格式转换方法及系统
CN107578098B (zh) * 2017-09-01 2020-10-30 中国科学院计算技术研究所 基于脉动阵列的神经网络处理器
CN109697509B (zh) * 2017-10-24 2020-10-20 上海寒武纪信息科技有限公司 处理方法及装置、运算方法及装置
CN107918794A (zh) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 基于计算阵列的神经网络处理器
WO2019114842A1 (fr) 2017-12-14 2019-06-20 北京中科寒武纪科技有限公司 Appareil à puce de circuit intégré
CN111126588B (zh) * 2017-12-14 2023-05-23 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
WO2019136758A1 (fr) * 2018-01-15 2019-07-18 深圳鲲云信息科技有限公司 Procédé et système d'optimisation de matériel d'appareil de traitement d'intelligence artificielle, support d'informations et terminal
CN108280305B (zh) * 2018-01-30 2020-03-13 西安交通大学 基于深度学习的散热器件冷却通道快速拓扑优化设计方法
JPWO2019181137A1 (ja) * 2018-03-23 2021-03-25 ソニー株式会社 情報処理装置および情報処理方法
CN108764483B (zh) * 2018-03-29 2021-05-18 杭州必优波浪科技有限公司 低算力要求的神经网络分块优化方法及分块优化器
CN108564168B (zh) * 2018-04-03 2021-03-09 中国科学院计算技术研究所 一种对支持多精度卷积神经网络处理器的设计方法
CN109643229B (zh) * 2018-04-17 2022-10-04 深圳鲲云信息科技有限公司 网络模型的应用开发方法、平台及计算机可读存储介质
CN110555334B (zh) * 2018-05-30 2022-06-07 东华软件股份公司 人脸特征确定方法、装置、存储介质及电子设备
US11663461B2 (en) 2018-07-05 2023-05-30 International Business Machines Corporation Instruction distribution in an array of neural network cores
CN109255148B (zh) * 2018-07-27 2023-01-31 石家庄创天电子科技有限公司 力学产品设计方法及其系统
US10728954B2 (en) 2018-08-07 2020-07-28 At&T Intellectual Property I, L.P. Automated network design and traffic steering
CN110825311B (zh) * 2018-08-10 2023-04-18 昆仑芯(北京)科技有限公司 用于存储数据的方法和装置
CN109086875A (zh) * 2018-08-16 2018-12-25 郑州云海信息技术有限公司 一种基于宏指令集的卷积网络加速方法及装置
CN109409510B (zh) * 2018-09-14 2022-12-23 深圳市中科元物芯科技有限公司 神经元电路、芯片、系统及其方法、存储介质
CN109359732B (zh) * 2018-09-30 2020-06-09 阿里巴巴集团控股有限公司 一种芯片及基于其的数据处理方法
CN110991161B (zh) * 2018-09-30 2023-04-18 北京国双科技有限公司 相似文本确定方法、神经网络模型获得方法及相关装置
CN111079909B (zh) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 运算方法、系统及相关产品
CN111079925B (zh) * 2018-10-19 2021-04-09 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111078291B (zh) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111079924B (zh) * 2018-10-19 2021-01-08 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111078284B (zh) * 2018-10-19 2021-02-05 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111079913B (zh) * 2018-10-19 2021-02-05 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111079912B (zh) * 2018-10-19 2021-02-12 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111078125B (zh) * 2018-10-19 2021-01-29 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111079907B (zh) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 运算方法、装置及相关产品
CN111078283B (zh) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111079915B (zh) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111079916B (zh) * 2018-10-19 2021-01-15 安徽寒武纪信息科技有限公司 运算方法、系统及相关产品
CN111078281B (zh) * 2018-10-19 2021-02-12 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111078280B (zh) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
WO2020078446A1 (fr) * 2018-10-19 2020-04-23 中科寒武纪科技股份有限公司 Procédé et appareil de calcul, et produit associé
CN111079910B (zh) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111078285B (zh) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111079911B (zh) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111078293B (zh) * 2018-10-19 2021-03-16 中科寒武纪科技股份有限公司 运算方法、装置及相关产品
CN111079914B (zh) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 运算方法、系统及相关产品
CN111078282B (zh) * 2018-10-19 2020-12-22 安徽寒武纪信息科技有限公司 运算方法、装置及相关产品
CN111104120B (zh) * 2018-10-29 2023-12-22 赛灵思公司 神经网络编译方法、系统及相应异构计算平台
CN111144561B (zh) * 2018-11-05 2023-05-02 杭州海康威视数字技术股份有限公司 一种神经网络模型确定方法及装置
CN111240682B (zh) * 2018-11-28 2024-11-08 深圳市中兴微电子技术有限公司 一种指令数据的处理方法及装置、设备、存储介质
WO2020107265A1 (fr) * 2018-11-28 2020-06-04 深圳市大疆创新科技有限公司 Dispositif de traitement de réseau neuronal, procédé de commande et système informatique
CN111542818B (zh) * 2018-12-12 2023-06-06 深圳鲲云信息科技有限公司 一种网络模型数据存取方法、装置及电子设备
CN111325311B (zh) * 2018-12-14 2024-03-29 深圳云天励飞技术有限公司 用于图像识别的神经网络模型生成方法及相关设备
CN111381979B (zh) * 2018-12-29 2023-05-23 杭州海康威视数字技术股份有限公司 神经网络的开发验证方法、装置、系统及存储介质
CN109799977B (zh) * 2019-01-25 2021-07-27 西安电子科技大学 指令程序开发调度数据的方法及系统
CN109978160B (zh) * 2019-03-25 2021-03-02 中科寒武纪科技股份有限公司 人工智能处理器的配置装置、方法及相关产品
CN111767078B (zh) * 2019-04-02 2024-08-06 上海寒武纪信息科技有限公司 数据运行方法、装置和相关产品
CN111865640B (zh) * 2019-04-30 2023-09-26 华为技术服务有限公司 一种网络架构描述方法及其装置、介质
CN110210605B (zh) * 2019-05-31 2023-04-07 Oppo广东移动通信有限公司 硬件算子匹配方法及相关产品
CN112132271A (zh) * 2019-06-25 2020-12-25 Oppo广东移动通信有限公司 神经网络加速器运行方法、架构及相关装置
CN110443357B (zh) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 卷积神经网络计算优化方法、装置、计算机设备及介质
CN115462079A (zh) * 2019-08-13 2022-12-09 深圳鲲云信息科技有限公司 神经网络数据流加速方法、装置、计算机设备及存储介质
WO2021031154A1 (fr) * 2019-08-21 2021-02-25 深圳市大疆创新科技有限公司 Procédé et dispositif de chargement d'une carte de caractéristiques d'un réseau neuronal
WO2021068253A1 (fr) * 2019-10-12 2021-04-15 深圳鲲云信息科技有限公司 Procédé et appareil de simulation de matériel de flux de données personnalisé, dispositif, et support de stockage
CN111339027B (zh) * 2020-02-25 2023-11-28 中国科学院苏州纳米技术与纳米仿生研究所 可重构的人工智能核心与异构多核芯片的自动设计方法
CN113407330A (zh) * 2020-03-16 2021-09-17 中国移动通信有限公司研究院 一种加速能力的匹配方法及装置、设备、存储介质
US11748533B2 (en) * 2020-06-10 2023-09-05 Texas Instruments Incorporated Programmatic circuit partitioning and topology identification
CN111563483B (zh) * 2020-06-22 2024-06-11 武汉芯昌科技有限公司 一种基于精简lenet5模型的图像识别方法及系统
WO2022039334A1 (fr) * 2020-08-21 2022-02-24 주식회사 딥엑스 Unité de traitement de réseau neuronal
WO2022135599A1 (fr) * 2020-12-25 2022-06-30 中科寒武纪科技股份有限公司 Dispositif, carte et procédé pour fusionner des structures de ramification, et support de stockage lisible
US11693692B2 (en) 2021-06-17 2023-07-04 International Business Machines Corporation Program event recording storage alteration processing for a neural network accelerator instruction
CN113657059B (zh) * 2021-08-17 2023-05-09 成都视海芯图微电子有限公司 一种适用于点云数据处理器的自动化设计方法及装置
CN114328039B (zh) * 2021-11-24 2025-03-25 山东产研鲲云人工智能研究院有限公司 基于随机网络拓扑的硬件运行验证方法、设备及存储介质
CN114239479A (zh) * 2021-12-15 2022-03-25 上海季丰电子股份有限公司 电路模块复用设计方法、装置、计算机设备及存储介质
CN114399019B (zh) * 2021-12-30 2025-06-10 南京风兴科技有限公司 神经网络编译方法、系统、计算机设备及存储介质
CN115115043B (zh) * 2022-06-20 2025-03-18 上海交通大学 片上-片间互连的神经网络芯片硬件架构设计方法及系统
CN114968602B (zh) * 2022-08-01 2022-10-21 成都图影视讯科技有限公司 资源动态分配型神经网络芯片的构架、方法和设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022468A (zh) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 人工神经网络处理器集成电路及该集成电路的设计方法
CN106355244A (zh) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 卷积神经网络的构建方法及系统
CN106529670A (zh) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 一种基于权重压缩的神经网络处理器、设计方法、芯片
CN107016175A (zh) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 适用神经网络处理器的自动化设计方法、装置及优化方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, YING ET AL.: "DeepBurning: Automatic Generation of FPGA-Based Learning Accelerators for the Neural Network Family", DESIGN AUTOMATION CONFERENCE (DAC), 9 June 2016 (2016-06-09), XP055540634 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068343A1 (fr) * 2020-09-30 2022-04-07 International Business Machines Corporation Accélérateur de réseau neuronal cartographié en mémoire pour systèmes d'inférence déployables
GB2614851A (en) * 2020-09-30 2023-07-19 Ibm Memory-mapped neural network accelerator for deployable inference systems
CN116861971A (zh) * 2023-06-27 2023-10-10 北京微电子技术研究所 一种面向神经网络处理器的高效固件运行系统

Also Published As

Publication number Publication date
CN107016175A (zh) 2017-08-04
CN107016175B (zh) 2018-08-31

Similar Documents

Publication Publication Date Title
WO2018171715A1 (fr) Procédé et système de conception automatisée applicables à un processeur de réseau neuronal
WO2018171717A1 (fr) Procédé et système de conception automatisée pour processeur de réseau neuronal
Xu et al. Autodnnchip: An automated dnn chip predictor and builder for both fpgas and asics
Ma et al. Hardware implementation and optimization of tiny-YOLO network
Xu et al. CaFPGA: An automatic generation model for CNN accelerator
CN106775905A (zh) 基于fpga的高级综合实现拟牛顿算法加速的方法
US20210312278A1 (en) Method and apparatus with incremental learning moddel
CN111563582A (zh) 一种在fpga上实现及优化加速卷积神经网络的方法
Odetola et al. 2l-3w: 2-level 3-way hardware-software co-verification for the mapping of deep learning architecture (dla) onto fpga boards
Gan et al. High performance reconfigurable computing for numerical simulation and deep learning
Yu et al. Hardware implementation of CNN based on FPGA for EEG signal patterns recognition
CN111931913B (zh) 基于Caffe的卷积神经网络在FPGA上的部署方法
Krishnamoorthy et al. Integrated analysis of power and performance for cutting edge Internet of Things microprocessor architectures
Ali et al. Risc-v based mpsoc design exploration for fpgas: Area, power and performance
CN109583006B (zh) 一种基于循环切割和重排的现场可编程门阵列卷积层的动态优化方法
US20240143885A1 (en) Multiply-Instantiated Block Modeling For Circuit Component Placement In Integrated Circuit
Gonçalves et al. Exploring data size to run convolutional neural networks in low density fpgas
Lin et al. Enhancing FPGA CAD Flow with AI-Powered Solutions
Amrutha et al. Realization of convolution layer using system verilog for achieving parallelism and improvement in performance parameters
Zhang Application of FPGA in deep learning
CN114691457A (zh) 一种确定硬件性能的方法、装置、存储介质以及电子设备
Goel et al. Comparative Study of ANN, CNN, and RNN Hardware Chips
Zhang et al. A RISC-v based coprocessor accelerator technology research for convolution neural networks
CN120012860B (zh) 卷积神经网络模型的训练方法、装置、电子设备及介质
Tourad et al. Generic Automated Implementation of Deep Neural Networks on Field Programmable Gate Arrays

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18772279

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18772279

Country of ref document: EP

Kind code of ref document: A1