
CN101894051A - CPU-GPU Cooperative Computing Method Based on Primary and Secondary Data Structure - Google Patents


Info

Publication number
CN101894051A
Authority
CN
China
Prior art keywords
data
cpu
gpu
data structure
major
Prior art date
Legal status
Pending
Application number
CN 201010244535
Other languages
Chinese (zh)
Inventor
安虹
姚平
刘谷
徐光�
许牧
李小强
韩文廷
张倩
徐恒阳
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN 201010244535
Publication of CN101894051A

Abstract

An embodiment of the present invention provides a CPU-GPU cooperative computing method based on a major-minor (primary-secondary) data structure, comprising the following steps: determining the content of the major and minor data according to the objects to be processed and initializing it; starting a CPU compute thread and a GPU compute thread; and reading in the data to be processed, preprocessing it, and storing it into the major-minor data structure, while the CPU compute thread and the GPU compute thread process the data in that structure until no data remains. The proposed scheme manages parallel data effectively, so that when a GPGPU platform processes a database whose effective amount of computation is unevenly distributed, load balance across the threads on the GPU is ensured. By designing a simple, reusable thread-partitioning method, the scheme allows the CPU and the GPU to compute fully in parallel while maintaining high utilization.

Description

CPU-GPU cooperative computing method based on major-minor data structure
Technical field
The present invention relates to the field of computers, and in particular to a CPU-GPU cooperative computing method based on a major-minor data structure.
Background art
To reach extreme levels of performance, the HPC field usually has to link together large numbers of CPUs. The CPU (Central Processing Unit) is the core that controls the operation of a computer, and such systems compute by parallel, distributed processing; however, not only is this architecture difficult to program for, its hardware is bulky and its power consumption is especially startling. The concept of GPGPU (General-Purpose computing on Graphics Processing Units) arose precisely to remedy these weaknesses of the traditional CPU architecture.
A single GPU (Graphics Processing Unit) usually contains tens to hundreds of built-in programmable processing units. If these units, which specialize in parallel computation, are exploited by the right methods, very large gains in computing efficiency can be obtained in certain applications. Because of this characteristic, GPGPU is also regarded as a possible solution for cloud computing and even artificial intelligence.
Up to now, GPGPU has been adopted more readily in server applications than in general consumer computing. In applications such as biomedicine, meteorological simulation, the film industry, and professional graphics processing, GPGPU computing can save a great deal of computation time; on the consumer side, however, the benefit GPGPU brings is less obvious than in professional applications.
GPGPU is characterized as follows: the CPU acts as the master, running the operating system, handling input and output, and controlling the program flow; the GPU acts as a coprocessor, running the kernel functions that need large amounts of computation.
GPGPU faces two problems. 1) Thread load balance on the GPU. Because every thread uses the same code, every thread is treated as if its actual workload were the same, namely the maximum effective amount of computation; in reality, the effective workload of each thread may differ, which unbalances the load on the GPU. 2) Utilization of the CPU and the GPU. The mode of cooperative computation between CPU and GPU directly affects their utilization. In the synchronous-call mode, after calling the GPU the CPU must wait for the GPU computation to finish before it can do further work, so CPU utilization is low. In the asynchronous-call mode, the CPU returns immediately after launching the GPU and can compute in parallel while the GPU computes, but the size of this parallel workload is hard to determine: if the CPU's parallel workload is too small, CPU utilization remains low; if it is too large, the GPU must wait for the CPU to assign it a new computing task after its own computation finishes, so GPU utilization becomes low. Only when the time required by the CPU's parallel workload exactly matches the GPU's computation time are high CPU and GPU utilization obtained simultaneously, and determining that workload precisely is very difficult.
An effective technical scheme is therefore needed to solve the problem of CPU-GPU cooperative computation.
Summary of the invention
The purpose of the present invention is to remedy at least one of the above technical deficiencies, and in particular to propose an effective CPU-GPU cooperative computing scheme that improves the high-performance computing capability of a computer.
To achieve this object, an embodiment of the invention proposes a CPU-GPU cooperative computing method based on a major-minor data structure, comprising the following steps:
according to the objects to be processed, determining the major-minor data structure and initializing it;
reading in the data to be processed until no data remains, and sending a read-finished signal RF to the CPU compute thread and the GPU compute thread;
the CPU compute thread and the GPU compute thread processing the data that has been read in.
According to an embodiment of the invention, reading in the data to be processed comprises:
reading in one unit of data, preprocessing it into master data and auxiliary data, storing these respectively into the corresponding master data management interval and the secondary data structure, and keeping the mapping relation between them.
According to an embodiment of the invention, the master data is the entity content of a unit of data of the object being processed, and the auxiliary data is information describing the master data.
According to an embodiment of the invention, the CPU compute thread processes the data that has been read in through the following steps:
Step A: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step B: scan the master data management intervals in turn; for each interval satisfying the CPU processing condition, call the CPU to process it, maintaining the secondary data structure at the same time;
Step C: judge the value of the flag FL; if it is true, finish, otherwise continue from Step A.
According to an embodiment of the invention, the GPU compute thread processes the data that has been read in through the following steps:
Step D: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step E: scan the master data management intervals in turn; for each interval satisfying the GPU processing condition, call the GPU to process it, maintaining the secondary data structure at the same time;
Step F: judge the value of the flag FL; if it is true, finish, otherwise continue from Step D.
The scheme proposed by the invention manages parallel data effectively, so that when the GPGPU platform processes a database whose effective amount of computation is unevenly distributed, load balance across the threads on the GPU can be guaranteed. By designing a simple, reusable thread-partitioning method, the scheme allows the CPU and the GPU to compute fully in parallel while maintaining high utilization.
Additional aspects and advantages of the invention are given in part in the description that follows; in part they will become apparent from the description, or may be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of the CPU-GPU cooperative computing method based on the major-minor data structure according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the major-minor data structure;
Fig. 3 is a flow chart of the data read-in thread;
Fig. 4 is a flow chart of the CPU compute thread;
Fig. 5 is a flow chart of the master GPU compute thread.
Embodiments
Embodiments of the invention are described in detail below; examples of them are shown in the drawings, where identical or similar reference numbers denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the invention, and must not be construed as limiting it.
To achieve the object of the invention, the invention discloses a CPU-GPU cooperative computing method based on a major-minor data structure, comprising the following steps: according to the objects to be processed, determining the major-minor data structure and initializing it; reading in the data to be processed until no data remains, and sending a read-finished signal RF to the CPU compute thread and the GPU compute thread; and the CPU compute thread and the GPU compute thread processing the data that has been read in.
As shown in Fig. 1, the flow of the CPU-GPU cooperative computing method based on the major-minor data structure of the embodiment of the invention comprises the following steps:
S110: according to the objects to be processed, determine the major-minor data structure and initialize it.
In step S110, the major-minor data structure is determined and initialized. Usually, the master data is the entity content of a unit of data of the object being processed, and the auxiliary data is information describing the master data.
S120: read in all the data to be processed, and send the read-finished signal RF to the CPU compute thread and the GPU compute thread.
In step S120, the data to be processed is read in until no data remains, and the read-finished signal RF is sent to the CPU compute thread and the GPU compute thread.
Specifically, reading in the data to be processed comprises:
reading in one unit of data, preprocessing it into master data and auxiliary data, storing these respectively into the corresponding master data management interval and the secondary data structure, and keeping the mapping relation between them.
S130: the CPU compute thread and the GPU compute thread process the data that has been read in.
In step S130, the CPU compute thread and the GPU compute thread process the data that has been read in. Specifically, the CPU compute thread processes the data through the following steps:
Step A: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step B: scan the master data management intervals in turn; for each interval satisfying the CPU processing condition, call the CPU to process it, maintaining the secondary data structure at the same time;
Step C: judge the value of the flag FL; if it is true, finish, otherwise continue from Step A.
The GPU compute thread processes the data that has been read in through the following steps:
Step D: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step E: scan the master data management intervals in turn; for each interval satisfying the GPU processing condition, call the GPU to process it, maintaining the secondary data structure at the same time;
Step F: judge the value of the flag FL; if it is true, finish, otherwise continue from Step D.
To facilitate understanding of the invention, the scheme disclosed above is now described in more detail.
The whole computing task of a program can be divided into a main processing procedure and an auxiliary processing procedure. The main processing procedure is the part of the task in which the computation is concentrated; the auxiliary processing procedure is the computation outside the main processing procedure, and its amount of computation is small. Nor is the main processing procedure always suitable to run entirely on the GPU; this is determined by the character of the program itself and by the performance it must achieve. The computing task therefore has to be divided between GPU and CPU, and the data processed in a cooperative-computation manner.
First, two concepts are defined:
Master data: the entity content in a unit of data; this part is handled by the main processing procedure.
Auxiliary data: the part of a unit of data other than the master data, used in the auxiliary processing procedure. This part may be empty, i.e. the unit of data may consist entirely of master data and need no auxiliary processing. If it is not empty, it is handled in the auxiliary processing procedure.
The definitions and features of the master data structure and the secondary data structure are as follows:
Master data structure: used to manage and store the master data.
The master data is managed in intervals, with the effective amount of computation it requires as the classification criterion. The effective amount of computation of a piece of master data is determined by its size, length, or other features, and characterizes how much computation the main processing procedure needs in order to process it.
The division of the master data management intervals is decided according to the statistical distribution of the effective computation of the master data. The purpose of partitioning is to decide, according to how densely the data falls on each interval, whether that interval is computed on the GPU or on the CPU. In general, data-dense short intervals are best suited to GPU processing, and data-sparse long intervals to CPU processing.
The buffer of each master data management interval stores the master data assigned to that interval; the size of the buffer is preset by the programmer. When a predetermined condition is satisfied, for example when the buffer is full, the master data management interval submits the buffered master data to the CPU or the GPU for processing, and afterwards performs the follow-up operations, for example emptying the buffer.
Secondary data structure: used to manage and store the auxiliary data.
The secondary data structure must keep the mapping relation between each piece of master data and its corresponding auxiliary data.
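As a minimal illustration of this mapping relation, consider the following C sketch; the type and field names (MasterRecord, AuxRecord) are illustrative assumptions, not taken from the patent. Each master record keeps, next to its entity payload, the subscript of its auxiliary record in the secondary array, so the auxiliary data of any master datum can be reached directly:

    /* Hypothetical auxiliary record: information that describes one master datum. */
    typedef struct {
        char  name[64];          /* e.g. a sequence name                  */
        int   length;            /* e.g. the sequence length              */
        void *extra;             /* further descriptive data, may be NULL */
    } AuxRecord;

    /* Hypothetical master record: the entity content handled by the main
     * processing procedure, plus the subscript of its AuxRecord in the
     * secondary array, which is the mapping relation mentioned above. */
    typedef struct {
        const char *payload;     /* entity content                        */
        int         aux_index;   /* subscript of the matching AuxRecord   */
    } MasterRecord;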
Using the major-minor data structure, the invention designs a CPU-GPU cooperative computing method. As an embodiment of the invention, for example, three threads are set up on the CPU: a data read-in thread, a CPU compute thread, and a master GPU compute thread, which execute asynchronously with respect to one another.
The work of each thread is described below.
The data read-in thread runs on the CPU. Its responsibility is to handle all human-machine interaction, to read data from the data source and write it into the master/auxiliary data structure arrays, and to supervise the execution of the other two threads. The data read-in thread operates as follows:
1) initialize the major-minor data structure;
2) start the CPU compute thread and the master GPU compute thread;
3) read in one unit of data, preprocess it into master data and auxiliary data, store these respectively into the corresponding master data management interval and the secondary data structure, and keep the mapping relation;
4) continue reading in data until no data remains;
5) send the read-finished signal RF to the CPU compute thread and the GPU compute thread, and wait for them to finish;
6) perform any necessary post-processing.
The responsibility of the CPU compute thread is to process the master data suited to CPU processing. As a complement to the GPU computation, it serves two functions:
1) it handles the data that the GPU handles poorly, data for which GPU processing would bring no acceleration benefit;
2) it raises the utilization of the CPU, which no longer waits idly for GPU tasks to finish.
The CPU compute thread operates as follows:
1) judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
2) scan the master data management intervals in turn; for each interval satisfying the CPU processing condition, call the CPU to process it, maintaining the secondary data structure at the same time;
3) judge the value of the flag FL; if it is true, finish, otherwise return to 1).
The responsibility of the master GPU compute thread is to process the master data suited to GPU processing. It operates as follows:
1) judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
2) scan the master data management intervals in turn; for each interval satisfying the GPU processing condition, call the GPU to process it, maintaining the secondary data structure at the same time;
3) judge the value of the flag FL; if it is true, finish, otherwise return to 1).
As the above shows, "cooperative computation" means that the CPU and the GPU process data simultaneously and independently; the invention simply divides the tasks according to conditions given by the programmer. For example, the programmer can stipulate a threshold so that master data intervals holding fewer data items than the threshold are handled by the CPU and those holding more are given to the GPU, and so on; a sketch of such a dispatch rule follows.
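Under the assumption that the condition is the simple count threshold of the example above (THRESHOLD and all names here are illustrative, not fixed by the patent), the rule can be rendered in C as:

    /* Hypothetical tuning parameter chosen by the programmer. */
    #define THRESHOLD 1024

    typedef enum { RUN_ON_CPU, RUN_ON_GPU } Target;

    /* Intervals holding fewer data items than the threshold are processed
     * by the CPU; the rest are given to the GPU. */
    static Target choose_target(int items_in_interval)
    {
        return (items_in_interval < THRESHOLD) ? RUN_ON_CPU : RUN_ON_GPU;
    }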
The scheme proposed by the invention manages parallel data effectively, so that when the GPGPU platform processes a database whose effective amount of computation is unevenly distributed, load balance across the threads on the GPU can be guaranteed. By designing a simple, reusable thread-partitioning method, the scheme allows the CPU and the GPU to compute fully in parallel while maintaining high utilization.
For a better understanding of the technical scheme of the invention, the invention is further described below through an additional embodiment.
The implementation of Hmmsearch, from bioinformatics, on the CUDA platform is now taken as an example to describe a specific embodiment of the invention in detail. Hmmsearch queries a protein sequence database to obtain certain properties of target protein sequences.
The core function of Hmmsearch is P7_vitebi; the implementation of this function on the CUDA platform is called P7_vitebi_kernel.
The CUDA code of Hmmsearch is written according to the invention, with the major-minor data structure realized as follows.
The master data is the actual content of a protein; the auxiliary data is the protein's name, length, calibration information, and so on. The index value is the subscript, within the secondary data structure array, of the auxiliary data corresponding to a given piece of master data.
The master/auxiliary data structure is divided into two parts, the master data structure array and the secondary data structure array, shown schematically in Fig. 2.
The length of the master data structure array is set to 64, i.e. there are 64 effective-computation intervals.
The master data management intervals are realized by master data structures, whose elements have the following meanings:
(1) effective-computation interval: describes the interval of protein sequence lengths that this structure manages. The first 60 intervals each have length 32 and manage protein sequences of length between 0 and 1920; the 61st, 62nd, 63rd, and 64th intervals manage protein sequences of length in [1920, 2320), [2320, 2720), [2720, 3120), and [3120, 37000) respectively;
(2) max_num: the maximum amount of master data the structure can manage, set to 4096;
(3) current_num: the amount of master data the structure currently manages;
(4) pbuffer[2]: an array of two pointers, pointing to two arrays each able to store MAX_NUM pieces of master data (these arrays are managed by dynamic memory allocation);
(5) pindex[2]: an array of two pointers, pointing to two arrays each able to store MAX_NUM index values (also dynamically managed); for example, the i-th element of the array pointed to by pindex[0] is the index value of the i-th piece of master data in the array pointed to by pbuffer[0];
(6) full[2]: an array of two integers indicating whether the array pointed to by the corresponding pbuffer entry has stored MAX_NUM pieces of master data; once filled, a buffer can be submitted to the GPU for computation. For example, full[0] = 0 means the array pointed to by pbuffer[0] is not yet full, and full[1] = 1 means the array pointed to by pbuffer[1] is full;
(7) current_index: 0 or 1, indicating into which of the two arrays master data is currently being stored; for example, current_index = 1 means master data is currently stored into the array pointed to by pbuffer[1].
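Collecting fields (1) to (7), one plausible C rendering of the master data structure is the sketch below; the field names follow the patent, while the interval-bound fields and the element type of the buffers (a protein sequence stored as a character string) are assumptions:

    #define MAX_NUM 4096                /* maximum master data per buffer       */

    typedef struct {
        int    min_len, max_len;        /* (1) managed sequence-length interval */
        int    max_num;                 /* (2) capacity, set to 4096            */
        int    current_num;             /* (3) master data currently managed    */
        char **pbuffer[2];              /* (4) two arrays of MAX_NUM sequences  */
        int   *pindex[2];               /* (5) two arrays of MAX_NUM indexes    */
        int    full[2];                 /* (6) 1 when the matching buffer is full */
        int    current_index;           /* (7) buffer currently being filled    */
    } MaindataManagement;

    MaindataManagement MMA[64];         /* 64 effective-computation intervals   */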
The elements of the secondary data structure have the following meanings:
(1) No: the structure's number, recording this structure's own subscript in the array;
(2) next: records the number of the next idle structure;
(3) paiddata: points to the memory that stores one piece of auxiliary data.
At the same time, the idle structures in the whole array are organized into an idle-structure linked list, whose head is marked by freelist_head.
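The secondary data structure and its free-list maintenance can be sketched in C as follows; AMA_SIZE is an assumed capacity, and the two helper functions are illustrative names for the list operations used in the thread flows below:

    #define AMA_SIZE 8192               /* assumed capacity of the array         */

    typedef struct {
        int   No;                       /* (1) own subscript in the array        */
        int   next;                     /* (2) number of the next idle structure */
        void *paiddata;                 /* (3) storage for one auxiliary datum   */
    } AiddataManagement;

    AiddataManagement AMA[AMA_SIZE];
    int freelist_head;                  /* head of the idle-structure list       */

    /* Take the node at the head of the idle list, store one auxiliary datum
     * in it, and return its No value (step 4 of the read-in thread below). */
    int alloc_aux_node(void *aux)
    {
        int no = freelist_head;
        freelist_head = AMA[no].next;   /* new head: the node numbered next      */
        AMA[no].paiddata = aux;
        return no;
    }

    /* Return the node with subscript idx to the idle list (step 5 of the
     * compute threads below). */
    void free_aux_node(int idx)
    {
        AMA[idx].paiddata = NULL;
        AMA[idx].next = freelist_head;
        freelist_head = idx;
    }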
The flow chart of the data read-in thread is shown in Fig. 3; it operates as follows:
1) Initialize each structure in the master data structure array (Maindata Management Array, MMA) as follows:
(a) current_num: 0;
(b) pbuffer[0], pbuffer[1]: NULL;
(c) pindex[0], pindex[1]: NULL;
(d) full[0], full[1]: 0;
(e) current_index: 0.
Then initialize each structure in the secondary data structure array (Aiddata Management Array, AMA) as follows:
(a) No: i (its subscript in the array);
(b) next: i+1;
(c) paiddata: NULL.
Then initialize the head of the idle-structure linked list:
(a) freelist_head: 0.
Then start the CPU compute thread and the GPU compute thread.
2) Judge whether all the data has been read in; if so, go to 6), otherwise go to 3).
3) Read in a protein sequence; take the actual content of the protein as the master data, and the protein's name, length, calibration information, and so on as the auxiliary data.
4) Store the auxiliary data in the node at the head of the idle list, record that node's No value, and set the head of the idle list to the node whose subscript is next.
5) Obtain the length of the protein sequence, determine which length interval it falls into, store the master data into the corresponding member of the MMA, and record the No value obtained in 4) in the corresponding index array. Then go to 2).
6) All the data has been read in; send the read-finished signal RF to the CPU compute thread and the GPU compute thread.
7) Wait for the CPU compute thread and the GPU compute thread to finish.
8) Post-process the data and terminate the program.
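Steps 3) to 5) might look as follows in C. This is a sketch building on the MMA and AMA definitions above; the length-to-interval mapping implements the 60 intervals of width 32 plus the four wide intervals listed earlier, and the double-buffer switch on a full buffer is an assumption about details the patent leaves open:

    /* Map a protein sequence length to one of the 64 interval subscripts. */
    static int interval_index(int len)
    {
        if (len < 1920) return len / 32;   /* first 60 intervals, width 32 */
        if (len < 2320) return 60;
        if (len < 2720) return 61;
        if (len < 3120) return 62;
        return 63;                         /* [3120, 37000)                */
    }

    /* Store one protein: auxiliary data into the AMA (step 4), master data
     * and its No value side by side into the MMA (step 5). */
    void store_protein(char *content, int len, void *aux)
    {
        int no = alloc_aux_node(aux);
        MaindataManagement *m = &MMA[interval_index(len)];
        int c = m->current_index;
        m->pbuffer[c][m->current_num] = content;
        m->pindex[c][m->current_num]  = no;
        if (++m->current_num == m->max_num) {
            m->full[c] = 1;                /* publish the full buffer      */
            m->current_index = 1 - c;      /* keep filling the other one   */
            m->current_num = 0;
        }
    }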
The flow chart of the CPU compute thread is shown in Fig. 4; it operates as follows:
1) Judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false.
2) Set the variable i to 0.
3) If the pending master data storage array of the i-th member of the MMA (MMA[i]) is full (its full flag is 1), and the master data in this management structure meets the preset CPU processing condition, go to 4), otherwise go to 6).
4) Call P7_vitebi to process the master data on the CPU, and set the full flag to 0 when finished.
5) Traverse the index array corresponding to the master data array processed in 4); for each index value IDX obtained, return the node with subscript IDX in the AMA to the idle-node linked list.
6) Increase i by 1.
7) If i is greater than 63, go to 8), otherwise go to 3).
8) Judge the value of the flag FL; if it is true, go to 9), otherwise go to 1).
9) Send an end signal to the main thread and finish.
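A sketch of this loop in C, building on the earlier sketches, follows; rf_received(), cpu_condition() and p7_vitebi_cpu() are assumed stand-ins for the RF-signal check, the preset CPU processing condition and the CPU call to P7_vitebi, and the synchronization of the full flags between threads is omitted:

    extern int  rf_received(void);                   /* RF signal arrived?   */
    extern int  cpu_condition(int interval);         /* preset CPU condition */
    extern void p7_vitebi_cpu(char **seqs, int n);   /* CPU processing call  */

    void cpu_compute_thread(void)
    {
        for (;;) {
            int fl = rf_received();                          /* step 1)     */
            for (int i = 0; i <= 63; i++) {                  /* steps 2)-7) */
                MaindataManagement *m = &MMA[i];
                for (int b = 0; b < 2; b++) {
                    if (m->full[b] && cpu_condition(i)) {
                        p7_vitebi_cpu(m->pbuffer[b], m->max_num); /* 4)     */
                        for (int k = 0; k < m->max_num; k++)      /* 5)     */
                            free_aux_node(m->pindex[b][k]);
                        m->full[b] = 0;
                    }
                }
            }
            if (fl) break;                                   /* step 8)     */
        }
        /* step 9): signal the main (read-in) thread and finish */
    }

The master GPU compute thread, described next, runs the same loop but checks the preset GPU processing condition and calls P7_vitebi_kernel instead.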
The flow chart of the master GPU compute thread is shown in Fig. 5; it operates as follows:
1) Judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false.
2) Set the variable i to 0.
3) If the pending master data storage array of the i-th member of the MMA (MMA[i]) is full (its full flag is 1), and the master data in this management structure meets the preset GPU processing condition, go to 4), otherwise go to 6).
4) Call P7_vitebi_kernel to process the master data on the GPU, and set the full flag to 0 when finished.
5) Traverse the index array corresponding to the master data array processed in 4); for each index value IDX obtained, return the node with subscript IDX in the AMA to the idle-node linked list.
6) Increase i by 1.
7) If i is greater than 63, go to 8), otherwise go to 3).
8) Judge the value of the flag FL; if it is true, go to 9), otherwise go to 1).
9) Send an end signal to the main thread and finish.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments can be carried out by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the invention may be integrated into one processing module, may each exist separately and physically, or two or more units may be integrated into one module. The integrated module may be realized in the form of hardware or in the form of a software functional module. If the integrated module is realized in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above is only a preferred implementation of the invention. It should be pointed out that those skilled in the art can make further improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (5)

1. A CPU-GPU cooperative computing method based on a major-minor data structure, characterized by comprising the following steps:
according to the objects to be processed, determining the content of the major and minor data and initializing it;
starting a CPU compute thread and a GPU compute thread;
reading in the data to be processed and storing it, after preprocessing, in the major-minor data structure, the CPU compute thread and the GPU compute thread simultaneously processing the data in the major-minor data structure until no data remains.
2. The CPU-GPU cooperative computing method based on a major-minor data structure of claim 1, characterized in that reading in the data to be processed comprises:
reading in one unit of data, preprocessing it into master data and auxiliary data, storing these respectively into the corresponding master data management interval and the secondary data structure, and keeping the mapping relation between them.
3. The CPU-GPU cooperative computing method based on a major-minor data structure of claim 2, characterized in that the master data is the entity content of a unit of data of the object being processed, and the auxiliary data is information describing the master data.
4. The CPU-GPU cooperative computing method based on a major-minor data structure of claim 3, characterized in that the CPU compute thread processes the data that has been read in through the following steps:
Step A: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step B: scan the master data management intervals in turn; for each interval satisfying the CPU processing condition, call the CPU to process it, maintaining the secondary data structure at the same time;
Step C: judge the value of the flag FL; if it is true, finish, otherwise continue from Step A.
5. The CPU-GPU cooperative computing method based on a major-minor data structure of claim 3, characterized in that the GPU compute thread processes the data that has been read in through the following steps:
Step D: judge whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
Step E: scan the master data management intervals in turn; for each interval satisfying the GPU processing condition, call the GPU to process it, maintaining the secondary data structure at the same time;
Step F: judge the value of the flag FL; if it is true, finish, otherwise continue from Step D.
CN 201010244535 2010-07-29 2010-07-29 CPU-GPU Cooperative Computing Method Based on Primary and Secondary Data Structure Pending CN101894051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010244535 CN101894051A (en) 2010-07-29 2010-07-29 CPU-GPU Cooperative Computing Method Based on Primary and Secondary Data Structure

Publications (1)

Publication Number Publication Date
CN101894051A true CN101894051A (en) 2010-11-24

Family

ID=43103247

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101091175A (en) * 2004-09-16 2007-12-19 Nvidia Corporation Load balancing
US20100118041A1 (en) * 2008-11-13 2010-05-13 Hu Chen Shared virtual memory
CN101526934A (en) * 2009-04-21 2009-09-09 浪潮电子信息产业股份有限公司 Construction method of GPU and CPU combined processor

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591418B (en) * 2010-12-16 2015-07-01 微软公司 Scalable multimedia computer system architecture with QOS guarantees
CN102591418A (en) * 2010-12-16 2012-07-18 微软公司 Scalable multimedia computer system architecture with qos guarantees
WO2014139140A1 (en) * 2013-03-15 2014-09-18 Hewlett-Packard Development Company, L.P. Co-processor-based array-oriented database processing
WO2014206233A1 (en) * 2013-06-25 2014-12-31 华为技术有限公司 Data processing method and device
CN103888771A (en) * 2013-12-30 2014-06-25 中山大学深圳研究院 Parallel video image processing method based on GPGPU technology
CN104102546A (en) * 2014-07-23 2014-10-15 浪潮(北京)电子信息产业有限公司 Method and system for realizing CPU (central processing unit) and GPU (graphics processing unit) load balance
CN104102546B (en) * 2014-07-23 2018-02-02 浪潮(北京)电子信息产业有限公司 A kind of method and system for realizing CPU and GPU load balancing
US12190404B2 (en) 2017-01-06 2025-01-07 Google Llc Executing computational graphs on graphics processing units
CN108460458A (en) * 2017-01-06 2018-08-28 谷歌有限责任公司 Execute computational graphs on graphics processing units
CN109753134A (en) * 2018-12-24 2019-05-14 四川大学 A GPU internal energy consumption control system and method based on global decoupling
CN109753134B (en) * 2018-12-24 2022-04-15 四川大学 A GPU internal energy consumption control system and method based on global decoupling
CN111160551B (en) * 2019-12-04 2023-09-29 上海寒武纪信息科技有限公司 Calculation map execution method, computer device, and storage medium
CN111160551A (en) * 2019-12-04 2020-05-15 上海寒武纪信息科技有限公司 Computational graph execution method, computer device and storage medium
CN112989082A (en) * 2021-05-20 2021-06-18 南京甄视智能科技有限公司 CPU and GPU mixed self-adaptive face searching method and system
CN112989082B (en) * 2021-05-20 2021-07-23 南京甄视智能科技有限公司 CPU and GPU mixed self-adaptive face searching method and system
CN114170696A (en) * 2021-12-16 2022-03-11 华南理工大学 Real-time toll calculation system and method for differential charging of expressway
CN114254357A (en) * 2021-12-22 2022-03-29 上海阵方科技有限公司 Data processing method and device based on privacy protection and server
CN114254357B (en) * 2021-12-22 2025-04-29 上海阵方科技有限公司 Data processing method, device and server based on privacy protection
CN114924876A (en) * 2022-05-11 2022-08-19 平安科技(深圳)有限公司 Voiceprint recognition method and device based on distributed heterogeneous operation and storage medium

Similar Documents

Publication Publication Date Title
CN101894051A (en) CPU-GPU Cooperative Computing Method Based on Primary and Secondary Data Structure
EP3698293B1 (en) Neural network processing system having multiple processors and a neural network accelerator
CN101802874B (en) Fragment shader bypass in a graphics processing unit, and apparatus and method thereof
US8676874B2 (en) Data structure for tiling and packetizing a sparse matrix
US8762655B2 (en) Optimizing output vector data generation using a formatted matrix data structure
US11687242B1 (en) FPGA board memory data reading method and apparatus, and medium
JP2010033561A (en) Method and apparatus for partitioning and sorting data set on multiprocessor system
US20210406209A1 (en) Allreduce enhanced direct memory access functionality
CN110333946A (en) One kind being based on artificial intelligence cpu data processing system and method
US9170836B2 (en) System and method for re-factorizing a square matrix into lower and upper triangular matrices on a parallel processor
CN105653204A (en) Distributed graph calculation method based on disk
CN107229995A (en) Realize method, device and computer-readable recording medium that game service amount is estimated
CN115357554A (en) A graph neural network compression method, device, electronic equipment and storage medium
CN102591709A (en) Shapefile master-slave type parallel writing method based on OGR (open geospatial rule)
CN104657111A (en) Parallel computing method and device
CN110516316A (en) A GPU Accelerated Method for Solving Euler's Equation by Discontinuous Galerkin Method
CN113886080A (en) High-performance cluster task scheduling method, device, electronic device and storage medium
CN106295670A (en) Data processing method and data processing equipment
Jiang et al. GLARE: Accelerating Sparse DNN Inference Kernels with Global Memory Access Reduction
CN115202848A (en) Task scheduling method, system, device and storage medium for convolutional neural network
CN115237599B (en) Rendering task processing method and device
CN116860999A (en) Ultra-large language model distributed pre-training method, device, equipment and medium
CN102819454A (en) Finite element explicit parallel solving and simulating method based on graphic processing unit (GPU)
CN110473593A (en) A kind of Smith-Waterman algorithm implementation method and device based on FPGA
CN112051981B (en) Data pipeline calculation path structure and single-thread data pipeline system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20101124