
WO2018121118A1 - Computing apparatus and method - Google Patents

Computing apparatus and method

Info

Publication number
WO2018121118A1
WO2018121118A1 (PCT/CN2017/111333)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
memory
data
hmc
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/111333
Other languages
English (en)
Chinese (zh)
Inventor
陈天石
郭崎
陈小兵
刘少礼
陈云霁
李韦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201611221798.8A external-priority patent/CN108241484B/zh
Priority claimed from CN201611242813.7A external-priority patent/CN108256643A/zh
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Publication of WO2018121118A1 publication Critical patent/WO2018121118A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present disclosure relates to the field of neural network computing, and more particularly to a neural network computing apparatus and method.
  • DRAMs used in the architecture of neural network computing devices are mostly high-performance graphics memory: Graphics Double Data Rate version 4 (GDDR4) or Graphics Double Data Rate version 5 (GDDR5).
  • GDDR4 and GDDR5 cannot fully meet the needs of neural network computing devices, and their technological development has entered a bottleneck period: each additional GB/s of bandwidth costs disproportionately more power, which is not a sensible, efficient, or cost-effective option for designers or consumers.
  • GDDR4 and GDDR5 also make it difficult to reduce die area. GDDR4 and GDDR5 will therefore gradually hinder the continued growth of neural network computing device performance.
  • the present disclosure provides a neural network computing apparatus and method to resolve the bandwidth, power consumption, and area bottlenecks faced by the neural network computing devices described above.
  • a neural network computing device is provided, including: a storage device; and a neural network processor electrically connected to the storage device, where the neural network processor exchanges data with the storage device and performs neural network calculations.
  • a chip including the neural network computing device is provided.
  • a chip package structure including the chip is provided.
  • a board including the chip package structure.
  • an electronic device including the board is provided.
  • a neural network calculation method comprising: storing data by a storage device; and performing data exchange with the storage device by a neural network processor and performing neural network calculation.
  • using high-bandwidth memory as the memory of the neural network computing device, input data and operation parameters can be exchanged between the buffer and the memory more quickly, which greatly shortens I/O time.
  • the high-bandwidth memory with stacked storage structure and the neural network processor with HBM memory control module can greatly improve the storage bandwidth, and the bandwidth can be increased to more than twice that of the prior art, and the computing performance is greatly improved.
  • since the high-bandwidth memory is a stacked structure that does not occupy horizontal plane space, the area of the neural network computing device can be greatly reduced, to about 5% of the prior art.
  • HMC memory provides high data transmission bandwidth, which can exceed 15 times that of DDR3. It also reduces overall power consumption: compared with commonly used memory technologies such as DDR3/DDR4, HMC can save more than 70% of the power consumed per bit stored.
  • the HMC memory module includes a plurality of cascaded HMC memory units, so the number of HMC memory units can be flexibly selected according to the actual memory size required during neural network operation, reducing wasted functional components.
  • the HMC memory unit adopts a stacked structure, which can reduce the memory footprint by more than 90% compared with existing RDIMM technology while greatly reducing the overall volume of the neural network computing device. Since the HMC can perform massively parallel processing, the latency of the memory components is small.
  • the neural network computing device has a plurality of neural network processors that are interconnected and communicate with each other, avoiding data consistency problems during operation; the neural network computing device supports a multi-core processor architecture, making full use of the parallelism of the neural network operation process and accelerating the neural network operation.
  • FIG. 1 is a block diagram showing the overall structure of a high bandwidth memory based neural network computing device in accordance with an embodiment of the present disclosure.
  • FIG. 2 is a cross-sectional view of the neural network computing device of FIG. 1.
  • FIG. 3 is an overall architectural diagram of a neural network processor with an HBM memory control module in accordance with an embodiment of the present disclosure.
  • FIG. 4 is a cross-sectional view of a high bandwidth memory based neural network computing device in accordance with another embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of an overall structure of a neural network computing device based on an HMC memory according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an HMC memory unit of an HMC memory-based neural network computing device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of expanding the HMC memory capacity of an HMC memory-based neural network computing device according to actual needs, in an embodiment of the present disclosure.
  • FIG. 8 is a flow chart of a high bandwidth memory based neural network calculation method in accordance with an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of an HMC memory-based neural network calculation method according to an embodiment of the present disclosure.
  • 1 - HMC memory module; 2 - neural network processor; 3 - external access unit; 4 - other modules; 11 - hybrid memory cube; 12 - logic base layer; 101, 201, 401 - package substrate; 102, 202 - interposer; 103, 203, 403 - logic die; 104 - high-bandwidth memory; 105, 205, 402 - neural network processor; 204 - HBM memory control module; 206, 406 - through hole; 207, 407 - micro solder ball; 208, 405 - dynamic random access memory (DRAM); 301 - storage interface; 302 - package structure; 303 - control unit; 304, 404 - HBM memory control module; 305 - buffer; 306 - buffer control module; 307 - neural processing unit.
  • the present disclosure provides a neural network computing device, including:
  • at least one storage device; and
  • at least one neural network processor electrically connected to the storage device; the neural network processor exchanges data with the storage device and performs neural network calculations.
  • in one embodiment, the storage device is a high-bandwidth memory (correspondingly, the neural network computing device is a high-bandwidth-memory-based neural network computing device), and each high-bandwidth memory includes a plurality of memories stacked on top of one another; at least one neural network processor is electrically coupled to the high-bandwidth memory, exchanges data with it, and performs neural network calculations.
  • the High-Bandwidth Memory (HBM) is a new type of memory with a high-speed communication data path, low power consumption, and small area.
  • An embodiment of the present disclosure provides a neural network computing device based on a high-bandwidth memory.
  • the neural network computing device includes: a package substrate 101 (Package Substrate), an interposer 102 (Interposer), a logic die 103 (Logic Die), a high-bandwidth memory 104 (Stacked Memory), and a neural network processor 105, where:
  • the package substrate 101 is configured to carry the other components of the neural network computing device and is electrically connected to the host device, such as a computer, a mobile phone, and various embedded devices.
  • the interposer 102 is formed on the package substrate 101 and carries the logic die 103 and the neural network processor 105.
  • the logic die 103 is formed on the interposer 102 and is used to connect the interposer 102 with the high-bandwidth memory 104, serving as the base packaging layer of the high-bandwidth memory 104.
  • the high-bandwidth memory 104 is formed on a logic die 103 and includes a plurality of memories stacked in a direction perpendicular to the package substrate 101.
  • the neural network processor 105 is also formed on the interposer 102 and performs the neural network calculation; it can complete an entire neural network calculation or perform basic operations of a neural network calculation such as convolution. Through the interposer 102, the neural network processor 105 exchanges data with the logic die 103 and thus with the high-bandwidth memory 104.
  • the neural network computing device is a 2.5D (2.5-dimensional) memory architecture.
  • the high-bandwidth memory includes four dynamic random access memories (DRAMs) 208; the DRAMs 208 are stacked by a micro-bumping process, with micro solder balls 207 formed between adjacent DRAMs 208, together forming one high-bandwidth memory.
  • the bottommost DRAM 208 is formed on the logic die 203 by a micro-bumping process
  • the logic die 203 and the neural network processor 205 are formed on the interposer 202 by a micro-bumping process
  • the interposer 202 is formed on the package substrate 201 by a flip-chip soldering process.
  • through holes 206 are formed in the DRAM 208 by using a through-silicon via (TSV) process.
  • through holes are formed in the logic die 203 and the interposer 202 by using a through-silicon via process
  • the wires are arranged by using the through holes and the micro solder balls.
  • the DRAM 208 is electrically connected to the logic die 203, and the logic die 203 is electrically connected to the neural network processor 205 through the wires in the vias of the interposer 202, realizing interconnection between the high-bandwidth memory and the neural network processor 205. Under the control of the HBM memory control module 204 of the neural network processor 205, data is transferred between the neural network processor 205 and the high-bandwidth memory.
  • existing GDDR memory has a channel width of 32 bits, so a 16-channel memory bus is 512 bits wide.
  • the high bandwidth memory may include four DRAMs, each having two 128-bit channels, and the high bandwidth memory providing a bit width of 1024 bits, which is twice the bit width of the above GDDR memory.
  • the neural network computing device may include multiple logic dies and, correspondingly, multiple high-bandwidth memories.
  • the number of DRAMs in each high-bandwidth memory can also be greater than four, and the number of each of the above components can be set according to actual needs.
  • a neural network computing device can include four logical dies and four high-bandwidth memories, each high-bandwidth memory including four DRAMs, each having two 128-bit channels.
  • Each high-bandwidth memory can provide 1024 bits of bit width, and four high-bandwidth memories can provide 4096 bits of bit width, which is eight times the bit width of the above GDDR memory.
  • each high-bandwidth memory can also include eight DRAMs, each with two 128-bit channels; each high-bandwidth memory can then provide 2048 bits of bit width, and four high-bandwidth memories can provide 8192 bits, sixteen times the bit width of the above GDDR memory.
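  • As a quick check of the bit widths quoted above, the arithmetic can be sketched as follows (the helper function is purely illustrative, not part of the disclosure):

```python
# Total bus width = stacks x DRAM dies per stack x channels per die x bits per channel.
# The configurations below are the ones quoted in this disclosure.
def hbm_bus_width(stacks: int, drams_per_stack: int,
                  channels_per_dram: int = 2, channel_bits: int = 128) -> int:
    return stacks * drams_per_stack * channels_per_dram * channel_bits

gddr_bus = 16 * 32                 # 16 channels x 32 bits = 512 bits
print(hbm_bus_width(1, 4))         # 1024 bits = 2x the GDDR bus
print(hbm_bus_width(4, 4))         # 4096 bits = 8x the GDDR bus
print(hbm_bus_width(4, 8))         # 8192 bits = 16x the GDDR bus
```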
  • the neural network processor includes: a storage interface 301 (Memory Interface), a control unit 303 (Control Processor), an HBM memory control module 304 (HBM Controller), a buffer 305 (Buffer), a buffer control module 306 (Buffer Controller), and a neural processing unit 307 (NFU), where the control unit 303, the HBM memory control module 304, the buffer 305, the buffer control module 306, and the neural processing unit 307 are integrally packaged to form a package structure 302.
  • the storage interface 301, as the interface between the neural network processor and the high-bandwidth memory, is electrically connected through wires to the logic die and the DRAMs of the high-bandwidth memory, and is used to receive data transmitted by the high-bandwidth memory and to transmit data to it.
  • the HBM memory control module 304 is configured to control data transmission between the high bandwidth memory and the buffer, including coordinating the data bandwidth of the high bandwidth memory and the buffer, and synchronizing the clock of the high bandwidth memory and the buffer.
  • the HBM memory control module 304 synchronizes the clocks of the high-bandwidth memory and the buffer, converts the bit width of the high-bandwidth-memory data received through the storage interface 301 into a bit width matched with the buffer, and transmits the matched data to the buffer; conversely, it converts the bit width of the buffer's data into one matched with the high-bandwidth memory and transmits that data to the high-bandwidth memory via the storage interface 301.
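  • A minimal sketch of this width-matching role, assuming an illustrative 1024-bit memory word and a 256-bit buffer port (both figures are assumptions for the example, and the function names are ours):

```python
def memory_to_buffer(word: bytes, buffer_width_bits: int = 256):
    """Split one wide memory word into buffer-width beats."""
    step = buffer_width_bits // 8
    return [word[i:i + step] for i in range(0, len(word), step)]

def buffer_to_memory(beats, memory_width_bits: int = 1024) -> bytes:
    """Reassemble buffer-width beats into one memory-width word, zero-padding the tail."""
    word = b"".join(beats)
    return word.ljust(memory_width_bits // 8, b"\x00")

beats = memory_to_buffer(bytes(128))   # one 1024-bit word -> four 256-bit beats
assert len(beats) == 4
```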
  • the buffer 305 is an internal storage unit of the neural network processor for receiving bandwidth-matched data transmitted by the HBM memory control module 304 and transmitting the stored data to the HBM memory control module 304.
  • the buffer control module 306 is configured to control the data interaction between the buffer 305 and the neural processing unit 307: it transmits the data stored in the buffer 305 to the neural processing unit 307, the neural processing unit 307 performs the neural network calculation, and the buffer control module 306 transmits the calculation result of the neural processing unit 307 back to the buffer 305.
  • the control unit 303 decodes instructions and sends control commands to the HBM memory control module 304, the buffer 305, the buffer control module 306, and the neural processing unit 307, coordinating and scheduling these modules to work together to implement the computing functions of the neural network processor.
  • the high-bandwidth memory with its stacked storage structure and the neural network processor with its HBM memory control module can greatly improve the storage bandwidth; the bandwidth can be increased to more than twice that of the prior art, and the computing performance is greatly improved.
  • the high-bandwidth memory is used as the memory of the neural network computing device, so input data and operation parameters can be exchanged between the buffer and the memory more quickly, which greatly reduces I/O time.
  • since the high-bandwidth memory is a stacked structure that does not occupy horizontal plane space, the area of the neural network computing device can be greatly reduced, to about 5% of the prior art, which also lowers the power consumption of the device; and since the DRAMs are interconnected by micro-bumping and TSV wiring, with the interposer used for data exchange between the neural network processor and the high-bandwidth memory, the transmission bandwidth and speed between different DRAMs, and between the neural network processor and the high-bandwidth memory, are further improved.
  • the neural network computing device is a 3D (3-dimensional) memory architecture, including a package substrate 401, a neural network processor 402, a logic die 403, and a high bandwidth memory stacked from bottom to top.
  • the high-bandwidth memory includes four DRAMs 405.
  • the DRAMs 405 are stacked by a micro-bumping process.
  • a micro solder ball 407 is formed between the adjacent DRAMs 405.
  • the bottommost DRAM 405 of the high-bandwidth memory is formed on the logic die 403 by a micro-bumping process.
  • the logic die 403 is formed on the neural network processor 402 by a micro-bumping process, and the neural network processor 402 is formed on the package substrate 401 by a micro-bumping process.
  • via holes 406 are formed in the DRAMs 405 by a through-silicon via process, and via holes are likewise formed in the logic die 403.
  • wires are arranged using the via holes and the micro solder balls to electrically connect the DRAMs 405, the logic die 403, and the neural network processor 402, enabling vertical interconnection of the high-bandwidth memory with the neural network processor 402. Under the control of the HBM memory control module 404 of the neural network processor 402, data is transferred between the neural network processor 402 and the high-bandwidth memory.
  • because the high-bandwidth memory is stacked directly on the neural network processor, the neural network computing device can further save area compared with the 2.5D storage architecture, which is particularly advantageous for miniaturization; and the distance between the high-bandwidth memory and the neural network processor is shorter, meaning shorter wiring between the two and further improved signal transmission quality and speed.
  • in another embodiment, the storage device is an HMC memory module (correspondingly, the neural network computing device is a neural network computing device based on Hybrid Memory Cube (HMC) memory); at least one neural network processor is connected to the HMC memory module and is configured to acquire the data and instructions required for the neural network operation from the HMC memory module, perform a partial neural network operation, and write the operation result back to the HMC memory module.
  • the HMC-based neural network computing device and method can effectively meet the data transmission and storage requirements of the operation process; the computing device can provide unprecedented system performance and bandwidth, together with high memory utilization, low power consumption, low cost, and a fast data transfer rate.
  • FIG. 5 is a schematic diagram of the overall structure of a neural network computing device based on an HMC memory according to an embodiment of the present disclosure. As shown in FIG. 5, the HMC memory-based neural network computing device of the present disclosure includes an HMC memory module 1, a neural network processor 2, and an external access unit 3.
  • the HMC memory module 1 includes a plurality of cascaded HMC memory units, each of which includes a hybrid memory cube and a logic base layer.
  • the hybrid memory cube is formed by a plurality of memory die layers connected through through-silicon vias (TSVs), and the logic base layer includes a logic control unit that controls data read and write operations on the hybrid memory cube, as well as links for connecting to an external processor, another HMC device, or the external access unit.
  • the neural network processor performs the neural network operation: it acquires the required instructions and data from the HMC memory module, performs a partial neural network operation, writes the intermediate values or final result back to the HMC memory module, and transmits data preparation completion signals to other neural network processors through the data path between the neural network processors.
  • the data includes first data and second data; the first data includes network parameters (such as weights and biases) and a function table, the second data includes the network input data, and the data may further include intermediate values produced by the operation.
  • the external access unit connects the HMC memory module with an external designated address; it reads the neural network instructions and data required for the operation from the external designated address into the HMC memory module, and outputs the neural network operation results from the HMC memory module to the external designated address space.
  • the cascaded HMC memory units are uniformly addressed, and the cascading manner and specific implementation of the multiple HMC memory units are transparent to the neural network processor, which reads and writes the storage locations of the HMC memory module simply by memory address.
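  • One way to picture this uniform addressing is the sketch below: the processor issues a flat address, and a hypothetical mapping resolves it to a cascaded unit plus a local offset, so the cascade itself stays invisible to the processor (the 4 GiB unit capacity is an assumption for the example):

```python
UNIT_CAPACITY = 4 * 2**30   # assumed capacity of one HMC memory unit, in bytes

def resolve(flat_addr: int, num_units: int):
    """Map a flat HMC-module address to (unit index, local address)."""
    unit = flat_addr // UNIT_CAPACITY
    if unit >= num_units:
        raise ValueError("address beyond the cascaded capacity")
    return unit, flat_addr % UNIT_CAPACITY

# The processor only ever sees the flat address space:
print(resolve(5 * 2**30, num_units=2))   # -> (1, 2**30): second unit, 1 GiB in
```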
  • the neural network processor may use a multi-core processor in a specific implementation process to improve the efficiency of operation.
  • the HMC memory module of the present disclosure can be used by a multi-core neural network processor.
  • multiple neural network processors are interconnected and information is transmitted between each other.
  • when some of the neural network processors are in an operational state and the other neural network processors are waiting for the operation result of one of them, the waiting processors remain in a wait state; after the operating processor transmits its operation result to the HMC memory module and sends the data delivery signal to the other neural network processors, the waiting processors are woken up, read the corresponding data from the HMC memory module, and perform the corresponding operations.
  • FIG. 6 is a schematic diagram of an HMC memory unit of an HMC memory-based neural network computing device according to an embodiment of the present disclosure. As shown in FIG. 6, the HMC memory unit includes a hybrid memory cube 11 and a logic base layer 12.
  • the hybrid memory cube 11 is a memory composed of a plurality of memory banks, where each memory bank includes a plurality of memory die layers.
  • for example, the hybrid memory cube may be composed of 16 memory banks, each of which includes 16 memory die layers.
  • the top and bottom of the memory die layer are interconnected using a through silicon via (TSV) structure.
  • the TSV interconnects a plurality of memory die layers, adding another dimension in addition to the rows and columns to form a three-dimensional structure of the die.
  • the memory die layers are conventional dynamic random access memory (DRAM).
  • the logic base layer directly selects the desired memory die layer through the vertical interconnect; within each die layer, rows and columns are used to organize the DRAM; and a read or write request to a specific location in a memory bank is carried out by that bank's memory bank controller.
  • the logic base layer includes a logic control unit that controls data read and write operations on the hybrid memory cube; the logic control unit includes a plurality of memory bank controllers, one corresponding to each memory bank, for managing 3D access control of the memory.
  • this 3D layering allows memory to be accessed not only along rows and columns but also in parallel across multiple memory die layers.
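  • This parallel 3D access can be pictured as an address split. The sketch below decomposes a linear address into bank, die-layer, row, and column fields using the 16 x 16 geometry given above; the row and column widths are assumptions for illustration only:

```python
BANKS, LAYERS, ROWS, COLS = 16, 16, 2**14, 2**10   # row/column counts are assumed

def decompose(addr: int):
    """Split a linear address into (bank, layer, row, column) fields."""
    col = addr % COLS; addr //= COLS
    row = addr % ROWS; addr //= ROWS
    layer = addr % LAYERS; addr //= LAYERS
    return addr % BANKS, layer, row, col

# Accesses that differ in bank or layer can proceed in parallel,
# not just accesses that differ in row or column.
print(decompose(0), decompose(COLS))   # same bank and layer, adjacent rows
```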
  • the logic base layer further includes links to external HMC memory units for connecting a plurality of HMC memory units together to increase the total capacity of the HMC storage device.
  • each link includes an external I/O link and internal routing and switching logic connected to the HMC memory module; the external I/O link comprises a plurality of logical links attached to the switching logic, and the internal routing controls the data transfer of each memory bank or forwards data to other HMC memory units and neural network processors.
  • the external I/O link comprises 4 or 8 logical links, each logical link comprising a group of 16 or 8 bidirectional serial I/O (SerDes) lanes.
  • HMC memory units with 4 logical links can transmit data at lane rates of 10 Gbps, 12.5 Gbps, or 15 Gbps, while HMC memory units with 8 logical links transmit at 10 Gbps.
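  • To make these figures concrete, a small worked calculation of the aggregate raw rate per direction, reading the 16-lane grouping with the 4-link configuration and the 8-lane grouping with the 8-link one (this pairing is our reading of the passage above):

```python
def link_bandwidth_gbps(links: int, lanes_per_link: int, lane_rate_gbps: float) -> float:
    """Aggregate raw bandwidth per direction across all logical links."""
    return links * lanes_per_link * lane_rate_gbps

print(link_bandwidth_gbps(4, 16, 15.0))   # 4 links x 16 lanes x 15 Gbps = 960 Gbps
print(link_bandwidth_gbps(8, 8, 10.0))    # 8 links x 8 lanes x 10 Gbps = 640 Gbps
```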
  • in the address strobing phase, the switching logic determines whether an HMC memory address arriving over the external I/O link falls on this HMC memory unit. If so, it converts the HMC memory address into the internal hybrid-memory-cube address format recognized by the internal routing; if not, the HMC memory address is forwarded to another HMC memory unit connected to this one.
  • in the data read/write phase, if the data read or write occurs on this HMC memory unit, the unit performs the data conversion between the external I/O link and the internal routing; if it does not, the unit forwards the data between the two external I/O links connected to it to complete the data transmission.
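  • The two-phase behavior of the switching logic can be sketched as follows; the class and method names are illustrative, not from the specification:

```python
class HmcUnit:
    def __init__(self, base: int, size: int, neighbor=None):
        self.base, self.size, self.neighbor = base, size, neighbor

    def access(self, addr: int, data=None):
        """Address-strobe phase: local hit -> internal routing; miss -> forward."""
        if self.base <= addr < self.base + self.size:
            local = addr - self.base             # internal hybrid-memory-cube address
            return self.read_write_local(local, data)
        if self.neighbor is None:
            raise ValueError("address not mapped by any cascaded unit")
        return self.neighbor.access(addr, data)  # forward over the other I/O link

    def read_write_local(self, local_addr: int, data):
        # Data read/write phase: convert between the external link and internal route.
        return ("write" if data is not None else "read", local_addr)
```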
  • FIG. 7 is a schematic diagram of expanding the HMC memory capacity of an HMC memory-based neural network computing device according to actual needs, in an embodiment of the present disclosure.
  • the logical base layer structure in the HMC memory unit supports attaching the device to a neural network processor or another HMC memory unit.
  • the total capacity of the HMC storage device (memory module) can be increased without changing the structure of the HMC memory unit.
  • multiple HMC memory units can be connected in a topology and linked to other modules 4.
  • as shown in FIG. 7, by cascading two HMC memory units, the memory capacity of the neural network computing device can be doubled.
  • this design allows the neural network computing device to dynamically change the number of HMC memory units according to the needs of the actual application, so that the HMC memory module is fully utilized while the configurability of the neural network computing device is greatly improved.
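  • A sketch of the sizing decision this configurability enables (the per-unit capacity is an assumed figure):

```python
import math

def units_needed(required_bytes: int, unit_capacity: int = 4 * 2**30) -> int:
    """How many cascaded HMC memory units a given working set calls for."""
    return max(1, math.ceil(required_bytes / unit_capacity))

# Doubling the working set from 4 GiB to 8 GiB just adds one cascaded unit,
# as in FIG. 7, without changing the HMC memory unit itself.
print(units_needed(4 * 2**30), units_needed(8 * 2**30))   # -> 1 2
```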
  • Another embodiment of the present disclosure further discloses a chip including the high bandwidth memory based neural network computing device of the above embodiment.
  • Another embodiment of the present disclosure also discloses a chip package structure including the above chip.
  • Another embodiment of the present disclosure also discloses a board that includes the above chip package structure.
  • Another embodiment of the present disclosure also discloses an electronic device including the above board.
  • Electronic devices include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, cloud servers, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical devices.
  • the vehicle includes an airplane, a ship, and/or a vehicle;
  • the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood;
  • the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
  • the present disclosure also provides a neural network calculation method, including:
  • in one embodiment, the neural network calculation method performs the neural network calculation based on the high-bandwidth-memory neural network computing device and, referring to FIG. 8, includes:
  • Step S1: write the operation parameters of the neural network calculation into the high-bandwidth memory.
  • the high bandwidth memory is connected to an external storage device such as an external disk through an external access unit.
  • the external access unit writes the operation parameters from the external specified address to the high-bandwidth memory; the operation parameters include the weights, the offset table, and the function table.
  • Step S2: transmit the input data of the neural network calculation from the high-bandwidth memory to the buffer of the neural network processor, which may specifically include:
  • Sub-step S21: the HBM memory control module addresses the memory according to the start address of the input data. If the address hits, data of the full bit width provided by the high-bandwidth memory, starting from the start address, is transmitted in sequence through the storage interface to the HBM memory control module, until all input data has been transferred.
  • for example, if a high-bandwidth memory provides a 1024-bit width and the stored input data totals 4096 bits, the high-bandwidth memory transmits 1024 bits of input data to the HBM memory control module each time, completing the transfer after four transmissions.
  • Sub-step S22: the HBM memory control module converts the bit width of the input data into a bit width matched with the buffer and transmits the width-matched input data to the buffer.
  • the input data can be the input neuron vector calculated by the neural network.
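  • Sub-steps S21-S22 amount to a chunked copy followed by a width conversion. A compact sketch under the 1024-bit example above (the function name and the 256-bit buffer width are ours, not the patent's):

```python
def transfer_input_data(memory: bytes, start: int, total_bits: int,
                        hbm_width_bits: int = 1024, buffer_width_bits: int = 256):
    """S21: move data in memory-width chunks; S22: re-slice to buffer width."""
    step = hbm_width_bits // 8
    staged = b""
    for off in range(start, start + total_bits // 8, step):   # S21: four 1024-bit reads
        staged += memory[off:off + step]
    beat = buffer_width_bits // 8                              # S22: width conversion
    return [staged[i:i + beat] for i in range(0, len(staged), beat)]

chunks = transfer_input_data(bytes(1024), start=0, total_bits=4096)
assert len(chunks) == 16   # 4096 bits arrive as sixteen 256-bit buffer beats
```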
  • Step S3: transmit the operation parameters of the neural network calculation from the high-bandwidth memory to the buffer of the neural network processor. This step is similar to step S2 and may specifically include:
  • Sub-step S31: the HBM memory control module addresses the memory according to the start address of the operation parameters. If the address hits, data of the full bit width provided by the high-bandwidth memory, starting from the start address, is transmitted in sequence through the storage interface to the HBM memory control module, until all operation parameters have been transferred.
  • Sub-step S32: the HBM memory control module converts the bit width of the operation parameters into a bit width matched with the buffer and transmits the width-matched operation parameters to the buffer.
  • Step S4: the buffer control module transmits the input data and operation parameters stored in the buffer to the neural processing unit; the neural processing unit processes the input data and operation parameters to obtain the output data of the current neural network calculation, and the buffer control module stores the output data back into the buffer.
  • if intermediate data is produced, the buffer control module stores it in the buffer; when the intermediate data is needed as input to a later step of the operation, the buffer control module transmits it back to the neural processing unit, which continues the operation using the intermediate data to obtain the output data of the neural network calculation. The output data may be the output neuron vector.
  • Step S5: the output data in the buffer is transferred to the high-bandwidth memory, and the output data is then transmitted to the external storage device through the external access unit.
  • the HBM memory control module converts the bit width of the output data into a bit width matched with the high-bandwidth memory and transmits the width-matched output data to the high-bandwidth memory; the external access unit then transfers the output data from the high-bandwidth memory to the external storage device.
  • for the next neural network calculation, the process returns to step S2 to obtain the next operation result.
  • the neural network calculation method of this embodiment, by using the high-bandwidth memory with its stacked storage structure and a neural network processor with an HBM memory control module, can greatly improve the storage bandwidth and computing performance, as well as the signal transmission bandwidth and speed.
  • in another embodiment, the neural network calculation method performs the neural network calculation based on the HMC-memory-module neural network computing device and, referring to FIG. 9, includes:
  • Step S1: the external access unit writes the data and instructions at the external designated address into the HMC memory module;
  • Step S2: the neural network processor reads from the HMC memory module the data and instructions required to perform a partial operation of the neural network;
  • Step S3: the neural network processor performs the partial operation of the neural network and writes the intermediate or final values obtained back to the HMC memory module;
  • Step S4: the operation result is written from the HMC memory module to the external designated address via the external access unit;
  • Step S5: if the operation instructions contain an operation termination instruction, execution terminates; otherwise, the process returns to step S1.
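  • Steps S1-S5 form a simple fetch-compute-writeback loop. A sketch with stand-in objects (none of these names come from the specification):

```python
def run(external, hmc, processor):
    """Steps S1-S5 of the HMC-based calculation method as a control loop."""
    while True:
        hmc.write(external.read())                  # S1: load data and instructions
        data, instructions = processor.fetch(hmc)   # S2: read what this pass needs
        result = processor.compute(data)            # S3: partial operation...
        hmc.write(result)                           #     ...written back to the HMC
        external.write(hmc.read_result())           # S4: export the operation result
        if "terminate" in instructions:             # S5: stop, or start the next pass
            break
```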
  • step S1 specifically includes the following sub-steps. Step S11: the first data at the external designated address is written into the HMC memory module 1 through the external access unit 3; the first data includes the weights, biases, function table, and the like used for the neural network operation. Step S12: the instructions at the external designated address are written into the HMC memory module 1 through the external access unit 3; the instructions include the operation instructions for the current neural network operation and, if no further operation follows the current one, an operation termination instruction. Step S13: the second data at the external designated address is written into the HMC memory module 1 through the external access unit 3; the second data includes the input data for the current neural network operation.
  • in step S2, the neural network processor reads from the HMC memory module the data required to perform the partial neural network operation, which includes: S21, setting the data preparation state of the neural network processor about to operate to "prepared"; S22, the neural network processor whose data preparation state is "prepared" reads the required data from the HMC memory module; S23, after the reading completes, setting the data preparation state of that neural network processor back to "not prepared".
  • step S3 further includes determining whether the neural network operation has ended: if it has, the process goes to step S4; otherwise, a data preparation completion signal is transmitted to the other neural network processors, their data preparation states are set to "prepared", and the process returns to step S2.
  • in the HMC memory-based neural network operation method, when some of the neural network processors are in an operational state and the others are waiting for the operation result of one of them, the waiting processors remain in a wait state; after the operating processor transmits its operation result to the HMC memory module and sends the data delivery signal to the other neural network processors, the waiting processors are woken up, read the corresponding data from the HMC memory module, and perform the corresponding operations.
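  • The wait/wake protocol maps naturally onto one "data prepared" flag per processor. A minimal sketch with Python threading primitives (the flag layout is our assumption):

```python
import threading

class ProcessorSync:
    """One data-preparation flag per neural network processor."""
    def __init__(self, num_processors: int):
        self.ready = [threading.Event() for _ in range(num_processors)]

    def publish(self, hmc, result, consumers):
        hmc.write(result)            # the result goes to the HMC memory module first
        for c in consumers:          # then the data-ready signal wakes the waiting peers
            self.ready[c].set()

    def await_data(self, me: int, hmc):
        self.ready[me].wait()        # wait state until the signal arrives
        self.ready[me].clear()       # S23: back to "not prepared" after reading
        return hmc.read()            # read the corresponding data and operate
```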
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • Each functional unit/module may be hardware; for example, the hardware may be a circuit, including digital circuits, analog circuits, and the like.
  • Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like.
  • the computing modules in the computing device can be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Memory System (AREA)

Abstract

The present invention relates to a computing apparatus and method. The computing apparatus comprises a storage device and a neural network processor electrically connected to the storage device. The neural network processor and the storage device exchange data with each other and perform neural network calculations. The computing apparatus and method of the present invention can effectively meet the data transmission and storage requirements during operation, and have the advantages of high memory utilization, low power consumption, low cost, and a high data transmission rate.
PCT/CN2017/111333 2016-12-26 2017-11-16 Computing apparatus and method Ceased WO2018121118A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201611221798.8 2016-12-26
CN201611221798.8A CN108241484B (zh) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory
CN201611242813.7A CN108256643A (zh) 2016-12-29 2016-12-29 HMC-based neural network operation device and method
CN201611242813.7 2016-12-29

Publications (1)

Publication Number Publication Date
WO2018121118A1 (fr)

Family

ID=62710389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/111333 Ceased WO2018121118A1 (fr) 2016-12-26 2017-11-16 Computing apparatus and method

Country Status (1)

Country Link
WO (1) WO2018121118A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140071778A1 (en) * 2012-09-11 2014-03-13 International Business Machines Corporation Memory device refresh
CN106030553A (zh) * 2013-04-30 2016-10-12 惠普发展公司,有限责任合伙企业 Memory network
CN104701309A (zh) * 2015-03-24 2015-06-10 上海新储集成电路有限公司 Three-dimensional stacked neuron device and preparation method
CN105404925A (zh) * 2015-11-02 2016-03-16 上海新储集成电路有限公司 Three-dimensional neural network chip
CN105789139A (zh) * 2016-03-31 2016-07-20 上海新储集成电路有限公司 Method for preparing a neural network chip

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111433758A (zh) * 2018-11-21 2020-07-17 吴国盛 Programmable operation and control chip, design method, and device thereof
CN111433758B (zh) * 2018-11-21 2024-04-02 吴国盛 Programmable operation and control chip, design method, and device thereof
CN111222632A (zh) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 Computing device, computing method, and related product
CN112036557A (zh) * 2019-06-04 2020-12-04 北京邮电大学 Deep learning system based on multiple FPGA development boards
CN112036557B (zh) * 2019-06-04 2023-06-27 北京邮电大学 Deep learning system based on multiple FPGA development boards
CN110597756A (zh) * 2019-08-26 2019-12-20 光子算数(北京)科技有限责任公司 Computing circuit and data operation method
CN110597756B (zh) * 2019-08-26 2023-07-25 光子算数(北京)科技有限责任公司 Computing circuit and data operation method
CN111461314B (zh) * 2020-03-31 2022-12-20 中科寒武纪科技股份有限公司 Method and device for artificial neural network calculation based on constant data packets, and computer-readable storage medium
CN111461314A (zh) * 2020-03-31 2020-07-28 中科寒武纪科技股份有限公司 Method, device, and board for computing a neural network, and computer-readable storage medium
CN114461546A (zh) * 2020-11-09 2022-05-10 哲库科技(上海)有限公司 Memory control method and device, storage medium, and electronic device
CN114692852A (zh) * 2020-12-31 2022-07-01 Oppo广东移动通信有限公司 Neural network operation method and device, terminal, and storage medium
CN114692851B (zh) * 2020-12-31 2025-09-26 Oppo广东移动通信有限公司 Neural network model calculation method and device, terminal, and storage medium
CN114692851A (zh) * 2020-12-31 2022-07-01 Oppo广东移动通信有限公司 Neural network model calculation method and device, terminal, and storage medium
CN114692852B (zh) * 2020-12-31 2025-07-29 Oppo广东移动通信有限公司 Neural network operation method and device, terminal, and storage medium
CN115081599A (zh) * 2021-03-11 2022-09-20 安徽寒武纪信息科技有限公司 Method for preprocessing Winograd convolution, computer-readable storage medium, and device
US12443832B1 (en) * 2021-04-30 2025-10-14 Xilinx, Inc. Neural network architecture with high bandwidth memory (HBM)
CN113703690A (zh) * 2021-10-28 2021-11-26 北京微核芯科技有限公司 Processor unit, memory access method, computer motherboard, and computer system
CN113703690B (zh) * 2021-10-28 2022-02-22 北京微核芯科技有限公司 Processor unit, memory access method, computer motherboard, and computer system
CN115617739B (zh) * 2022-09-27 2024-02-23 南京信息工程大学 Chip based on a chiplet architecture and control method
CN115617739A (zh) * 2022-09-27 2023-01-17 南京信息工程大学 Chip based on a chiplet architecture and control method
CN117915670A (zh) * 2024-03-14 2024-04-19 上海芯高峰微电子有限公司 Compute-in-memory chip structure

Similar Documents

Publication Publication Date Title
WO2018121118A1 (fr) Computing apparatus and method
CN108241484B (zh) Neural network computing device and method based on high-bandwidth memory
US8018752B2 (en) Configurable bandwidth memory devices and methods
JP7349812B2 (ja) Memory system
TWI681525B (zh) Flexible memory system having a controller and a memory stack
CN109783410A (zh) Memory device performing parallel arithmetic processing and memory module including the same
CN108256643A (zh) HMC-based neural network operation device and method
KR20210098831A (ko) Configurable write command delay in non-volatile memory
CN116737617A (zh) Access controller
US20230343380A1 (en) Bank-Level Self-Refresh
CN214225915U (zh) Multimedia chip architecture and multimedia processing system for portable mobile terminals
WO2021018313A1 (fr) Data synchronization method and apparatus, and related product
US12300300B2 (en) Bank-level self-refresh
WO2024049862A1 (fr) Systems, methods and devices for advanced memory technology
CN106502923B (zh) Two-level row/column switching circuit for intra-cluster memory access in an array processor
CN111382847A (zh) Data processing device and related product
WO2023274032A1 (fr) Storage access circuit, integrated chip, electronic device, and storage access method
US12100468B2 (en) Standalone mode
TWI814179B (zh) Multi-core chip, integrated circuit device, board, and manufacturing method thereof
US12321288B2 (en) Asymmetric read-write sequence for interconnected dies
CN118796757B (zh) Chip architecture integrating heterogeneous processors with shared memory, chip, and design method thereof
TWI840894B (zh) Storage circuit, data transmission circuit, and memory
US20240407127A1 (en) Airflow distribution to cool memory module shadowed by the processor
EP4210099A1 (fr) Package routing for crosstalk reduction in high-frequency communication
CN120631815A (zh) External storage scheme based on 3D-DRAM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17885496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17885496

Country of ref document: EP

Kind code of ref document: A1