CN112261023A - Data transmission method and device for a convolutional neural network - Google Patents
Data transmission method and device for a convolutional neural network
- Publication number
- CN112261023A CN112261023A CN202011104673.3A CN202011104673A CN112261023A CN 112261023 A CN112261023 A CN 112261023A CN 202011104673 A CN202011104673 A CN 202011104673A CN 112261023 A CN112261023 A CN 112261023A
- Authority
- CN
- China
- Prior art keywords
- array
- processing unit
- transmission
- data
- compressed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/04—Protocols for data compression, e.g. ROHC
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Security & Cryptography (AREA)
- Neurology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a data transmission method and device for a convolutional neural network. The method includes dividing the data to be transmitted into a plurality of arrays based on a data partitioning scheme and, for each array, sequentially performing the following steps in response to the preceding array beginning aggregation: invoking computing resources to perform sparse compression on the array at the source processing unit to generate a compressed array; invoking communication resources to perform a transmission-mode-based reduction on the compressed array; invoking communication resources to perform transmission-mode-based aggregation on the compressed array; and invoking computing resources to decompress the compressed array at the target processing unit to recover the array. The invention reduces the volume of communication data while preserving convergence accuracy, thereby improving transmission efficiency, reducing waiting time, and increasing overall speed.
Description
Technical Field
The present invention relates to the field of neural networks, and more particularly, to a data transmission method and apparatus for a convolutional neural network.
Background
Increasingly sophisticated machine learning algorithms, such as deep neural networks (DNNs) and convolutional neural networks (CNNs), achieve unprecedented performance in many practical applications and solve difficult problems in many areas, such as speech recognition, text processing, and image recognition. However, training on a single graphics processing unit (GPU) often takes a long time, and this inefficiency limits practical application to some extent. The most widely used way to reduce training time is data-parallel training. In data-parallel training, each GPU holds a complete copy of the model parameters and frequently exchanges parameters with the other GPUs participating in the training, which incurs significant communication cost and becomes a system bottleneck when communication is slow.
The communication bottleneck in training can be addressed from both the hardware and the software side: more advanced GPU interconnect technology in hardware, and advanced modern communication libraries in software. Among existing communication methods, ring communication is widely used; it can effectively exploit pipelining, scales well, and is common for large data transfers. However, under the constraints of a low-speed network (for example, over some PCIe connections the transmission speed is only about 7.5 GB/s), communication has gradually become a bottleneck for GPU computation. In the multi-node case, transmission usually goes over a network, which restricts GPU interactive computation even more severely.
For the prior-art problems of large communication data volume, long transmission time, and slow overall task progress in convolutional neural network training, no effective solution is currently available.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a data transmission method and apparatus for a convolutional neural network that reduce the amount of communication data while ensuring convergence accuracy, thereby improving transmission efficiency, reducing waiting time, and increasing overall speed.
In view of the above object, a first aspect of the embodiments of the present invention provides a data transmission method for a convolutional neural network, including dividing the data to be transmitted into a plurality of arrays based on a data partitioning scheme and, for each array, sequentially performing the following steps in response to the preceding array beginning aggregation:
invoking computing resources to perform sparse compression on the array at the source processing unit to generate a compressed array;
invoking communication resources to perform a transmission-mode-based reduction on the compressed array;
invoking communication resources to perform transmission-mode-based aggregation on the compressed array;
invoking computing resources to decompress the compressed array at the target processing unit to recover the array.
In some embodiments, performing sparse compression on the array to generate the compressed array comprises:
extracting the value and position of each element from the array to form value-position pairs;
deleting the pairs whose value is zero;
combining the remaining pairs to form the compressed array.
In some embodiments, the method further comprises: after deleting the pairs whose value is zero, additionally deleting, based on a predetermined filtering threshold, the pairs whose value is below that threshold.
In some embodiments, the data partitioning scheme and the transmission mode are both determined based on the processing unit topology.
In some embodiments, the processing unit topology is determined based on the number and architecture of processing units used by the convolutional neural network.
In some embodiments, the data partitioning is an equal allocation based on the number of processing units; the transmission mode is ring transmission or ring all-reduce transmission; and the processing unit topology is a ring topology.
In some embodiments, the method further comprises: while the transmission-mode-based aggregation is being performed, invoking the computing resources to begin sparse compression of the next array.
In some embodiments, the method further comprises: pre-establishing a transmission interface for the convolutional neural network, and performing the transmission-mode-based reduction and aggregation on the compressed array through that interface.
A second aspect of an embodiment of the present invention provides a data transmission apparatus for a convolutional neural network, including:
a processor; and
a memory storing program code executable by the processor, the program code, when executed, dividing the data to be transmitted into a plurality of arrays based on a data partitioning scheme and, for each array, sequentially performing the following steps in response to the preceding array beginning aggregation:
invoking computing resources to perform sparse compression on the array at the source processing unit to generate a compressed array;
invoking communication resources to perform a transmission-mode-based reduction on the compressed array;
invoking communication resources to perform transmission-mode-based aggregation on the compressed array;
invoking computing resources to decompress the compressed array at the target processing unit to recover the array.
In some embodiments, the data partitioning scheme and the transmission mode are both determined based on the processing unit topology, and the processing unit topology is determined based on the number and architecture of the processing units used by the convolutional neural network.
The invention has the following beneficial technical effects. The data transmission method and device of the convolutional neural network provided by the embodiments of the present invention invoke computing resources at the source processing unit to perform sparse compression on each array and generate a compressed array; invoke communication resources to perform a transmission-mode-based reduction on the compressed array; invoke communication resources to perform transmission-mode-based aggregation on the compressed array; and invoke computing resources at the target processing unit to decompress the compressed array and recover the array. This scheme reduces the volume of communication data while preserving convergence accuracy, thereby improving transmission efficiency, reducing waiting time, and increasing overall speed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a data transmission method of a convolutional neural network provided in the present invention;
FIG. 2 is a block diagram of a data transmission method of a convolutional neural network according to the present invention;
fig. 3 is a schematic pipeline diagram of a data transmission method of a convolutional neural network provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; subsequent embodiments do not repeat this note.
In view of the above object, a first aspect of embodiments of the present invention proposes an embodiment of a data transmission method of a convolutional neural network that reduces the amount of communication data while ensuring convergence accuracy. Fig. 1 is a schematic flow chart of a data transmission method of a convolutional neural network provided by the present invention.
As shown in fig. 1, the data transmission method of the convolutional neural network includes dividing the data to be transmitted into a plurality of arrays based on a data partitioning scheme and, for each array, sequentially performing the following steps in response to the preceding array beginning aggregation:
step S101, invoking computing resources to perform sparse compression on the array at the source processing unit to generate a compressed array;
step S103, invoking communication resources to perform a transmission-mode-based reduction on the compressed array;
step S105, invoking communication resources to perform transmission-mode-based aggregation on the compressed array;
step S107, invoking computing resources to decompress the compressed array at the target processing unit to recover the array.
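The compression and decompression ends of the four steps above can be sketched in a minimal single-process form. Plain Python lists stand in for GPU buffers, and the function names (`sparse_compress`, `decompress`) are illustrative placeholders, not names from the patent; the reduction and aggregation steps run over the ring on real hardware and are only noted in a comment here.

```python
def sparse_compress(arr):
    """Step S101: keep only (position, value) pairs of nonzero elements."""
    pairs = [(i, v) for i, v in enumerate(arr) if v != 0]
    positions = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return positions, values, len(arr)

def decompress(positions, values, size):
    """Step S107: scatter the kept values back into a dense array."""
    out = [0] * size
    for i, v in zip(positions, values):
        out[i] = v
    return out

# Steps S103 and S105 (reduction and aggregation) would run over the ring on
# real hardware; here we only check that compression round-trips correctly.
src = [0, 6, 0, 0, 7, 0, 2, 1]
positions, values, size = sparse_compress(src)
assert decompress(positions, values, size) == src
```

The round-trip check stands in for what the target processing unit does after aggregation completes.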
Those skilled in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Embodiments of the computer program may achieve effects the same as or similar to those of any corresponding method embodiment.
In some embodiments, performing sparse compression on the array to generate the compressed array comprises:
extracting the value and position of each element from the array to form value-position pairs;
deleting the pairs whose value is zero;
combining the remaining pairs to form the compressed array.
In some embodiments, the method further comprises: after deleting the pairs whose value is zero, additionally deleting, based on a predetermined filtering threshold, the pairs whose value is below that threshold.
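The zero-removal and threshold-filtering steps can be sketched together as follows. Comparing the absolute value against the threshold is one reasonable reading of "value less than the filtering threshold" for gradients that may be negative; that interpretation, and the function name, are assumptions rather than the patent's exact specification.

```python
def compress_with_threshold(arr, threshold=0.0):
    """Form (position, value) pairs, drop zeros, then drop values whose
    magnitude falls below the predetermined filtering threshold
    (absolute-value comparison is an assumption)."""
    pairs = [(i, v) for i, v in enumerate(arr)
             if v != 0 and abs(v) >= threshold]
    return [i for i, _ in pairs], [v for _, v in pairs]

positions, values = compress_with_threshold(
    [0.0, 0.9, 0.0, 0.05, 3.0], threshold=0.1)
assert positions == [1, 4]
assert values == [0.9, 3.0]
```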
In some embodiments, the data partitioning scheme and the transmission mode are both determined based on the processing unit topology.
In some embodiments, the processing unit topology is determined based on the number and architecture of processing units used by the convolutional neural network.
In some embodiments, the data partitioning is an equal allocation based on the number of processing units; the transmission mode is ring transmission or ring all-reduce transmission; and the processing unit topology is a ring topology.
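The equal allocation over N processing units can be sketched as follows. Giving the remainder of an uneven split to the leading chunks is one common convention and an assumption here, not something the patent specifies.

```python
def partition(data, n):
    """Split data into n near-equal chunks (Size/N each) for ring transmission."""
    size = len(data)
    base, extra = divmod(size, n)
    chunks, start = [], 0
    for k in range(n):
        # The first `extra` chunks absorb the remainder (an assumed convention).
        end = start + base + (1 if k < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

chunks = partition(list(range(10)), 4)
assert [len(c) for c in chunks] == [3, 3, 2, 2]
assert sum(chunks, []) == list(range(10))
```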
In some embodiments, the method further comprises: while the transmission-mode-based aggregation is being performed, invoking the computing resources to begin sparse compression of the next array.
In some embodiments, the method further comprises: pre-establishing a transmission interface for the convolutional neural network, and performing the transmission-mode-based reduction and aggregation on the compressed array through that interface.
The following further illustrates embodiments of the invention in accordance with the embodiments shown in fig. 2 and 3.
Referring to fig. 2, the framework is divided into three main parts. First, a data transmission interface is established for the deep learning framework (PyTorch, TensorFlow, MXNet, and the like); the interface is kept consistent with NCCL to ensure the generality of the program. Second, topology establishment and selection: a topology with lower delay is chosen according to the GPU architecture, combined with factors such as the size of the data volume. Different topologies imply different transmission modes and different data partitioning schemes; for example, in ring communication each GPU handles Size/N data at a time (where Size is the total size of the data to be transmitted and N is the number of GPUs). Third, the sparse-compression communication part: sparse storage uses a row-compression format, and since the transmitted data is flattened into a one-dimensional array, each element can be represented by its value and column index alone. For example, the array to be transmitted:
(0,6,0,0,7,0,0,0,0,0,0,0,2,0,0,1)
can be expressed as:
(1,4,12,15)(6,7,2,1)
With a sparsity of 25%, the transmitted volume is thus only 50% of the original data volume. Moreover, reduction operations (summation, taking the maximum, and the like) can be applied to the sparsely compressed array while it is still compressed, which gives a greater acceleration than traditional compression methods.
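The worked example above, and a summation performed directly on the compressed form, can be sketched as follows. The merge-by-position implementation of `sparse_sum` is illustrative, not the patent's exact procedure.

```python
def sparse_compress(arr):
    """Keep (position, value) pairs of nonzero elements, 0-based positions."""
    pairs = [(i, v) for i, v in enumerate(arr) if v != 0]
    return [i for i, _ in pairs], [v for _, v in pairs]

# The example array from the description.
positions, values = sparse_compress(
    [0, 6, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1])
assert positions == [1, 4, 12, 15]
assert values == [6, 7, 2, 1]

def sparse_sum(a, b):
    """Sum two compressed arrays without decompressing them."""
    acc = dict(zip(*a))
    for i, v in zip(*b):
        acc[i] = acc.get(i, 0) + v
    items = sorted(acc.items())
    return [i for i, _ in items], [v for _, v in items]

other = sparse_compress([5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
assert sparse_sum((positions, values), other) == ([0, 1, 4, 12, 15],
                                                  [5, 6, 8, 2, 1])
```

Staying in compressed form during the reduction is what lets the ring steps operate on the reduced data volume throughout.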
However, sparse compression and decompression take computing time and affect program efficiency. To reduce this overhead, the same strategy is adopted as in traditional compression: a pipeline is used to hide the sparse-compression time. As shown in fig. 3, compression of the next array is started synchronously with the ring aggregation of the current one. Ring aggregation and ring reduction mainly occupy communication bandwidth and place little demand on computing resources, so the next batch of transmission data can be sparsely compressed while the current batch is being transmitted, hiding the compression time and improving program efficiency.
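The pipelining idea can be sketched with a worker thread that compresses the next array while the current compressed array is in flight. The sleep stands in for the ring reduction and aggregation, and the whole arrangement is an illustrative sketch rather than the patent's implementation.

```python
import threading
import time

def compress(arr):
    """Keep (position, value) pairs of nonzero elements."""
    pairs = [(i, v) for i, v in enumerate(arr) if v != 0]
    return [i for i, _ in pairs], [v for _, v in pairs]

def transmit(compressed):
    time.sleep(0.01)  # stands in for ring reduction + aggregation

def pipelined_send(arrays):
    """Overlap compression of array k+1 with transmission of array k."""
    done = []
    nxt = compress(arrays[0])
    for k in range(len(arrays)):
        cur = nxt
        holder, worker = {}, None
        if k + 1 < len(arrays):
            # Compress the next array in the background during transmission.
            worker = threading.Thread(
                target=lambda: holder.update(c=compress(arrays[k + 1])))
            worker.start()
        transmit(cur)
        done.append(cur)
        if worker:
            worker.join()
            nxt = holder["c"]
    return done

out = pipelined_send([[0, 1], [2, 0], [0, 0, 3]])
assert out == [([1], [1]), ([0], [2]), ([2], [3])]
```

Only the first array's compression sits on the critical path; every later compression is hidden behind the previous transmission, which is exactly the effect fig. 3 depicts.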
The embodiments of the present invention build on ring and tree communication and adopt a sparse-compression method to reduce the amount of data transmitted and raise the effective transmission bandwidth. When sparsification reduces the data to 1/n of the source data, a speedup of up to n/2 can be obtained. Tests show that with an appropriate threshold the convergence of the deep learning framework is not adversely affected. Data sparsification therefore effectively improves the GPU communication bandwidth while guaranteeing the convergence of the deep learning model, mitigating to some extent the problems of low-speed networks and low GPU communication efficiency.
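The n/2 bound follows from counting transmitted entries: each retained element costs two entries (a value and an index), so at sparsity 1/n the transmitted volume is 2·Size/n and the best-case speedup is Size / (2·Size/n) = n/2. A quick arithmetic check:

```python
def speedup(size, n):
    """Best-case speedup when only size/n elements survive sparsification."""
    kept = size // n
    transmitted = 2 * kept  # one value plus one index per retained element
    return size / transmitted

assert speedup(1600, 4) == 2.0  # sparsity 1/4 -> up to 2x
assert speedup(1600, 8) == 4.0  # sparsity 1/8 -> up to 4x
# The 25% example from the description: 16 elements, 4 nonzero -> 8 entries,
# i.e. 50% of the original volume.
assert speedup(16, 4) == 2.0
```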
It can be seen from the foregoing embodiments that the data transmission method of a convolutional neural network provided by the embodiments of the present invention invokes computing resources at the source processing unit to perform sparse compression on each array and generate a compressed array; invokes communication resources to perform a transmission-mode-based reduction on the compressed array; invokes communication resources to perform transmission-mode-based aggregation on the compressed array; and invokes computing resources at the target processing unit to decompress the compressed array and recover the array. This scheme reduces the volume of communication data while preserving convergence accuracy, thereby improving transmission efficiency, reducing waiting time, and increasing overall speed.
It should be particularly noted that the steps in the above embodiments of the data transmission method can be interleaved, replaced, added, or deleted; data transmission methods of a convolutional neural network transformed by such reasonable permutations and combinations also fall within the scope of the present invention, and the scope of the present invention is not limited to the described embodiments.
In view of the above object, a second aspect of the embodiments of the present invention proposes an embodiment of a data transmission apparatus of a convolutional neural network that reduces the amount of communication data while ensuring convergence accuracy. The data transmission device of the convolutional neural network comprises:
a processor; and
a memory storing program code executable by the processor, the program code, when executed, dividing the data to be transmitted into a plurality of arrays based on a data partitioning scheme and, for each array, sequentially performing the following steps in response to the preceding array beginning aggregation:
invoking computing resources to perform sparse compression on the array at the source processing unit to generate a compressed array;
invoking communication resources to perform a transmission-mode-based reduction on the compressed array;
invoking communication resources to perform transmission-mode-based aggregation on the compressed array;
invoking computing resources to decompress the compressed array at the target processing unit to recover the array.
In some embodiments, the data partitioning scheme and the transmission mode are both determined based on the processing unit topology, and the processing unit topology is determined based on the number and architecture of the processing units used by the convolutional neural network.
As can be seen from the foregoing embodiments, the data transmission apparatus of the convolutional neural network according to the embodiments of the present invention invokes computing resources at the source processing unit to perform sparse compression on each array and generate a compressed array; invokes communication resources to perform a transmission-mode-based reduction on the compressed array; invokes communication resources to perform transmission-mode-based aggregation on the compressed array; and invokes computing resources at the target processing unit to decompress the compressed array and recover the array. This scheme reduces the volume of communication data while preserving convergence accuracy, thereby improving transmission efficiency, reducing waiting time, and increasing overall speed.
It should be particularly noted that the above embodiment of the data transmission apparatus uses the embodiment of the data transmission method to describe the working process of each module; those skilled in the art will readily appreciate that these modules can be applied to other embodiments of the method. Since the steps in the method embodiments can be interleaved, replaced, added, or deleted, data transmission apparatuses of a convolutional neural network transformed by such reasonable permutations and combinations also fall within the scope of the present invention, and the scope of the present invention is not limited to the above embodiment.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiment or of different embodiments may be combined, and many other variations of the different aspects exist that are not detailed here for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within their scope.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011104673.3A CN112261023A (en) | 2020-10-15 | 2020-10-15 | A kind of data transmission method and device of convolutional neural network |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112261023A true CN112261023A (en) | 2021-01-22 |
Family
ID=74243614
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011104673.3A Pending CN112261023A (en) | 2020-10-15 | 2020-10-15 | A kind of data transmission method and device of convolutional neural network |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112261023A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022222578A1 (en) * | 2021-04-21 | 2022-10-27 | 华为技术有限公司 | Aggregation communication method and system, and computer device |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101621514A (en) * | 2009-07-24 | 2010-01-06 | 北京航空航天大学 | Network data compressing method, network system and synthesis center equipment |
| US20150067009A1 (en) * | 2013-08-30 | 2015-03-05 | Microsoft Corporation | Sparse matrix data structure |
| CN106775598A (en) * | 2016-12-12 | 2017-05-31 | 温州大学 | A kind of Symmetric Matrices method of the compression sparse matrix based on GPU |
| CN108229644A (en) * | 2016-12-15 | 2018-06-29 | 上海寒武纪信息科技有限公司 | The device of compression/de-compression neural network model, device and method |
| US20190190538A1 (en) * | 2017-12-18 | 2019-06-20 | Facebook, Inc. | Accelerator hardware for compression and decompression |
| CN110134636A (en) * | 2018-02-09 | 2019-08-16 | 中兴通讯股份有限公司 | Model training method, server and computer readable storage medium |
| CN110377288A (en) * | 2018-04-13 | 2019-10-25 | 赛灵思公司 | Neural network compresses compiler and its compiling compression method |
| CN110909870A (en) * | 2018-09-14 | 2020-03-24 | 中科寒武纪科技股份有限公司 | Training device and method |
| CN111324630A (en) * | 2020-03-04 | 2020-06-23 | 中科弘云科技(北京)有限公司 | MPI-based neural network architecture search parallelization method and equipment |
| CN111699695A (en) * | 2017-12-06 | 2020-09-22 | V-诺瓦国际有限公司 | Method and apparatus for decoding a received encoded data set |
| CN111737540A (en) * | 2020-05-27 | 2020-10-02 | 中国科学院计算技术研究所 | A graph data processing method and medium applied to a cluster of distributed computing nodes |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110991634B (en) | Artificial intelligence accelerator, equipment, chip and data processing method | |
| CN110717574B (en) | Neural network operation method and device and heterogeneous intelligent chip | |
| CN112817730B (en) | Deep neural network service batch processing scheduling method and system and GPU | |
| WO2022105805A1 (en) | Data processing method and in-memory computing chip | |
| WO2022001141A1 (en) | Gpu communication method and device, and medium | |
| CN114281521B (en) | Method, system, equipment and medium for optimizing deep learning heterogeneous resource communication efficiency | |
| CN111079923A (en) | Spark convolution neural network system suitable for edge computing platform and circuit thereof | |
| CN107247623B (en) | A kind of distributed cluster system and data connecting method based on multi-core CPU | |
| US20230306236A1 (en) | Device and method for executing lstm neural network operation | |
| CN109993293B (en) | A Deep Learning Accelerator for Stacked Hourglass Networks | |
| CN116934571B (en) | Task processing method, device, electronic device and storage medium | |
| CN111242286A (en) | A data format conversion method, device and computer-readable storage medium | |
| CN114995782B (en) | Data processing method, apparatus, device and readable storage medium | |
| WO2025061202A1 (en) | Data processing method, apparatus and system for distributed cluster, and nonvolatile readable storage medium | |
| CN110600020B (en) | Gradient transmission method and device | |
| CN107947965B (en) | service chain compiler | |
| CN116227599A (en) | An optimization method, device, electronic equipment and storage medium for a reasoning model | |
| CN115130672A (en) | Method and device for calculating convolution neural network by software and hardware collaborative optimization | |
| CN112261023A (en) | A kind of data transmission method and device of convolutional neural network | |
| WO2016008317A1 (en) | Data processing method and central node | |
| WO2020238106A1 (en) | Data processing method, electronic apparatus, and computer-readable storage medium | |
| US20230083565A1 (en) | Image data processing method and apparatus, storage medium, and electronic device | |
| CN110163793B (en) | Convolution calculation acceleration method and device | |
| CN115374935B (en) | Pruning method of neural network | |
| CN114237864B (en) | A rapid training system and method for artificial intelligence models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2021-01-22