Embodiment
      Embodiments of the invention are described in detail below. Examples of these embodiments are shown in the drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and shall not be construed as limiting it.
      To achieve the object of the invention, a CPU-GPU cooperative computing method based on a master/auxiliary data structure is disclosed, comprising the following steps: determining the master/auxiliary data structure according to the object to be processed and initializing it; reading in the data to be processed until no data remains, and then sending a data read-finished signal RF to the CPU compute thread and the GPU compute thread; and processing the read-in data with the CPU compute thread and the GPU compute thread.
      Figure 1 is a flowchart of the CPU-GPU cooperative computing method based on the master/auxiliary data structure according to an embodiment of the invention. The method comprises the following steps:
      S110: Determine the master/auxiliary data structure according to the object to be processed, and initialize it.
      In step S110, the master/auxiliary data structure is determined and initialized. In general, the master data is the substantive content of a unit of data of the object being processed, and the auxiliary data is the information that describes the master data.
      S120: Read in all the data to be processed, and send the data read-finished signal RF to the CPU compute thread and the GPU compute thread.
      In step S120, the data to be processed is read in until no data remains, and the data read-finished signal RF is then sent to the CPU compute thread and the GPU compute thread.
      Specifically, reading in the data to be processed comprises:
      Reading in one unit of data, preprocessing it into master data and auxiliary data, storing these in the corresponding master data management interval and in the secondary data structure respectively, and maintaining the mapping between them.
      S130: The CPU compute thread and the GPU compute thread process the read-in data.
      In step S130, the CPU compute thread and the GPU compute thread process the read-in data. Specifically, the CPU compute thread processes the read-in data through the following steps:
      Step A: Determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
      Step B: Scan the master data management intervals in turn; for each interval that satisfies the CPU processing condition, invoke CPU processing and update the secondary data structure accordingly;
      Step C: Check the value of the flag FL; if it is true, finish; otherwise return to step A.
      The GPU compute thread processes the read-in data through the following steps:
      Step E: Determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
      Step F: Scan the master data management intervals in turn; for each interval that satisfies the GPU processing condition, invoke GPU processing and update the secondary data structure accordingly;
      Step G: Check the value of the flag FL; if it is true, finish; otherwise return to step E.
      To aid understanding, the scheme disclosed above is described in further detail below.
      The entire computing task of a program can be divided into a main processing procedure and an auxiliary processing procedure. The main processing procedure is the part of the task in which most of the computation is concentrated; the auxiliary processing procedure is the computation outside the main processing procedure and involves comparatively little computation. Not all of the main processing procedure is suitable for execution on the GPU; this depends on the characteristics of the program itself and on its performance requirements. The computing task therefore needs to be divided between the GPU and the CPU, and the data processed by cooperative computing.
      Two concepts are defined first:
      Master data: the substantive content of a unit of data. This part of the data is processed by the main processing procedure.
      Auxiliary data: the part of a unit of data other than the master data, used in the auxiliary processing procedure. This part may be empty, meaning that the unit of data consists entirely of master data and no auxiliary processing is needed. If it is not empty, it is processed in the auxiliary processing procedure.
      The master data structure and the secondary data structure are defined and characterized as follows:
      Master data structure: the master data structure manages and stores the master data.
      The master data is managed in intervals, classified by the effective amount of computation it requires. The effective amount of computation of a piece of master data is determined by its size, length or other features, and characterizes the computation that the main processing procedure needs in order to process it.
      The master data management intervals are divided according to the statistical distribution of the effective amount of computation of the master data. The purpose of the partition is to decide, from the density of data in each interval, whether that interval is computed on the GPU or on the CPU. In general, short intervals with dense data are best suited to GPU processing, and long intervals with sparse data are suited to CPU processing.
      Each master data management interval has a buffer that stores the master data assigned to that interval; the buffer size is set in advance by the programmer. When a predetermined condition is satisfied, for example when the buffer is full, the interval submits the buffered master data to the CPU or the GPU for processing and then carries out follow-up operations, such as emptying the buffer.
      Secondary data structure: the secondary data structure manages and stores the auxiliary data.
      The secondary data structure must maintain the mapping between each piece of master data and its corresponding auxiliary data.
      Using the master/auxiliary data structure, the invention provides a CPU-GPU cooperative computing method. In one embodiment, for example, three threads are created on the CPU and execute asynchronously with respect to one another: a data-reading thread, a CPU compute thread and a master-control GPU compute thread.
      The operation of each thread is described below.
      The data-reading thread runs on the CPU. It is responsible for all human-computer interaction, for reading data from the data source and writing it into the master/auxiliary data struct arrays, and for supervising the execution of the other two threads. It runs as follows:
      1) initialize the master/auxiliary data structure;
      2) start the CPU compute thread and the master-control GPU compute thread;
      3) read in one unit of data, preprocess it into master data and auxiliary data, store these in the corresponding master data management interval and in the secondary data structure respectively, and maintain the mapping between them;
      4) continue reading in data until no data remains;
      5) send the data read-finished signal RF to the CPU compute thread and the GPU compute thread, and wait for them to finish;
      6) carry out any necessary post-processing.
      The CPU compute thread is responsible for processing the master data suited to CPU processing. As a complement to GPU computation, it serves two purposes:
      1) it processes data that the GPU handles poorly, and data for which GPU processing would bring no speed-up;
      2) it raises CPU utilization, so that the CPU no longer idles while waiting for GPU tasks to finish.
      The CPU compute thread runs as follows:
      1) determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
      2) scan the master data management intervals in turn; for each interval that satisfies the CPU processing condition, invoke CPU processing and update the secondary data structure accordingly;
      3) check the value of the flag FL; if it is true, finish; otherwise go to 1).
      The master-control GPU compute thread is responsible for processing the master data suited to GPU processing. It runs as follows:
      1) determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false;
      2) scan the master data management intervals in turn; for each interval that satisfies the GPU processing condition, invoke GPU processing and update the secondary data structure accordingly;
      3) check the value of the flag FL; if it is true, finish; otherwise go to 1).
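      The loop shared by the two compute threads can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names Shared, pending, processed and compute_loop are hypothetical, and the scan of the intervals is replaced by a simple counter. The point it shows is the ordering: RF is latched into FL before the scan, so data stored just before RF was raised is still picked up by the final pass.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state between the data-reading thread and a compute thread.
struct Shared {
    std::atomic<bool> rf{false};       // read-finished signal RF, raised by the reader
    std::atomic<int>  pending{0};      // stand-in for master data waiting in the intervals
    std::atomic<int>  processed{0};    // number of items handled so far
};

// Loop run by a compute thread (CPU or GPU variant).
inline void compute_loop(Shared& s) {
    for (;;) {
        const bool fl = s.rf.load();              // step 1: latch RF into the flag FL
        const int taken = s.pending.exchange(0);  // step 2: stand-in for scanning the intervals
        s.processed.fetch_add(taken);
        if (fl) return;                           // step 3: finish only if RF was seen before the scan
    }
}
```

      In a full program the data-reading thread would launch two threads running such a loop (for example with std::thread), add data, raise rf, and then join both threads.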
      As the above description shows, cooperative computing in the invention means that the CPU and the GPU process data independently and simultaneously, with the tasks divided between them according to conditions given by the programmer. For example, the programmer may specify a threshold: master data intervals holding fewer data items than the threshold are processed by the CPU, those holding more are handed to the GPU, and so on.
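      The threshold rule mentioned above could look like the following sketch. The names Device and choose_device are hypothetical, and the text does not say what happens when the count equals the threshold; sending that case to the GPU is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>

enum class Device { CPU, GPU };

// Hypothetical dispatch rule: an interval holding fewer than `threshold`
// items is handled by the CPU; otherwise it is handed to the GPU.
// (The equal case going to the GPU is an assumption of this sketch.)
inline Device choose_device(std::size_t item_count, std::size_t threshold) {
    return item_count < threshold ? Device::CPU : Device::GPU;
}
```

      Because the rule is an ordinary predicate, the programmer can replace it with any other condition without touching the thread structure.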
      The scheme proposed by the invention manages parallel data effectively, so that when a GPGPU platform processes a database whose effective computation amounts are unevenly distributed, the load of the threads on the GPU can be kept balanced. The thread-partitioning method of the scheme is simple in design and reusable, and allows the CPU and the GPU to compute fully in parallel while maintaining high utilization.
      For a better understanding of the technical scheme, the invention is further described below by way of a further embodiment.
      The specific embodiment is described in detail below, taking as an example the implementation on the CUDA platform of Hmmsearch, a bioinformatics application. Hmmsearch queries a protein sequence database in order to obtain certain properties of a target protein sequence.
      The core function of Hmmsearch is P7_vitebi; its implementation on the CUDA platform is called P7_vitebi_kernel.
      The CUDA code of Hmmsearch is written according to the invention. The master/auxiliary data structure is implemented as follows:
      The master data is the actual content of a protein; the auxiliary data is the name, length, calibration information and so on of the protein. The index value is the subscript, in the secondary data struct array, of the auxiliary data corresponding to a given piece of master data.
      The master/auxiliary data structure consists of two parts, the master data struct array and the secondary data struct array, as shown schematically in Figure 2.
      The length of the master data struct array is set to 64; that is, there are 64 effective-computation intervals.
      Each master data management interval is implemented by one master data struct, whose members have the following meanings:
      (1) the effective-computation interval, which describes the range of protein sequence lengths managed by this struct. The first 60 intervals each have a width of 32 and together manage protein sequences of length 0 to 1920; the 61st, 62nd, 63rd and 64th intervals manage protein sequences with lengths in [1920,2320), [2320,2720), [2720,3120) and [3120,37000) respectively;
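      The mapping from a sequence length to one of the 64 intervals described in item (1) can be sketched as below. The function name interval_index and the return value -1 for lengths outside [0,37000) are assumptions of this sketch; the interval boundaries are those given in the text.

```cpp
#include <cassert>

constexpr int kNumIntervals = 64;  // length of the master data struct array

// Returns the 0-based interval index for a protein sequence length,
// or -1 (an assumption of this sketch) if the length is out of range.
inline int interval_index(int length) {
    if (length < 0 || length >= 37000) return -1;
    if (length < 1920) return length / 32;  // first 60 intervals, width 32 each
    if (length < 2320) return 60;           // 61st interval: [1920,2320)
    if (length < 2720) return 61;           // 62nd interval: [2320,2720)
    if (length < 3120) return 62;           // 63rd interval: [2720,3120)
    return 63;                              // 64th interval: [3120,37000)
}
```

      The short intervals are narrow where sequences are dense, so items within one interval need a similar amount of computation.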
      (2) max_num: the maximum amount of master data that this struct can manage, set to 4096;
      (3) current_num: the amount of master data currently managed by this struct;
      (4) pbuffer[2]: an array of two pointers, each pointing to an array that can store MAX_NUM pieces of master data (these arrays are managed by dynamic memory allocation);
      (5) pindex[2]: an array of two pointers, each pointing to an array that can store MAX_NUM index values (these arrays are managed by dynamic memory allocation); for example, the i-th element of the array pointed to by pindex[0] is the index value of the i-th piece of master data in the array pointed to by pbuffer[0];
      (6) full[2]: an integer array of size 2 that indicates whether the region pointed to by the corresponding element of pbuffer[2] already holds MAX_NUM pieces of master data; once a region is full, it can be submitted to the GPU for computation. For example, full[0] equal to 0 means that pbuffer[0] is not yet full, and full[1] equal to 1 means that pbuffer[1] is full;
      (7) current_index: takes the value 0 or 1 and indicates which array master data is currently being stored into; for example, current_index equal to 1 means that master data is being stored into the array pointed to by pbuffer[1].
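      The double-buffered master data struct described in items (1) to (7) can be sketched as follows. Several points are assumptions of this sketch rather than details of the embodiment: std::vector stands in for the dynamically allocated arrays, len_lo and len_hi are hypothetical names for the effective-computation interval, current_num is taken to count the items in the buffer currently being filled, and the helper store is hypothetical. Synchronization between threads is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int MAX_NUM = 4096;

struct MainInterval {
    int len_lo = 0, len_hi = 0;           // effective-computation interval [len_lo, len_hi)
    int max_num = MAX_NUM;                // capacity of each buffer
    int current_num = 0;                  // items in the buffer being filled (assumed meaning)
    std::vector<std::string> pbuffer[2];  // stand-in for the two master data arrays
    std::vector<int> pindex[2];           // index values parallel to pbuffer
    int full[2] = {0, 0};                 // 1 when the matching buffer holds max_num items
    int current_index = 0;                // buffer currently being written (0 or 1)
};

// Stores one piece of master data with its auxiliary index.
// Returns false if the current buffer is still full, i.e. the
// compute threads have not yet drained it.
inline bool store(MainInterval& m, const std::string& seq, int aux_index) {
    const int b = m.current_index;
    if (m.full[b]) return false;
    m.pbuffer[b].push_back(seq);
    m.pindex[b].push_back(aux_index);
    if (++m.current_num == m.max_num) {   // buffer filled: mark it and switch
        m.full[b] = 1;
        m.current_index = 1 - b;
        m.current_num = 0;
    }
    return true;
}
```

      With two buffers per interval, the data-reading thread can keep filling one buffer while a compute thread processes the other.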
      The members of the secondary data struct have the following meanings:
      (1) No: the struct number, which records the subscript of this struct in the array;
      (2) next: records the number of the next free struct;
      (3) paiddata: points to the memory that stores one piece of auxiliary data.
      In addition, the free structs in the whole array are organized into a free-struct linked list, whose head is marked by freelist_head.
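      The secondary data struct array and its free list can be sketched as follows. The type names AuxNode and AuxArray and the helpers acquire and release are hypothetical, as is the use of std::unique_ptr for paiddata; the index-linked free list itself follows the description, with the array size serving as the end-of-list value because the last node is initialized with next equal to its subscript plus one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct AuxNode {
    int no = 0;                             // subscript of this struct in the array
    int next = 0;                           // number of the next free struct
    std::unique_ptr<std::string> paiddata;  // storage for one piece of auxiliary data
};

struct AuxArray {
    std::vector<AuxNode> nodes;
    int freelist_head = 0;                  // head of the free-struct list

    explicit AuxArray(int n) : nodes(n) {
        for (int i = 0; i < n; ++i) { nodes[i].no = i; nodes[i].next = i + 1; }
    }
    // Stores aux data in the head node of the free list and returns its No,
    // or -1 (an assumption of this sketch) if no free node remains.
    int acquire(const std::string& aux) {
        if (freelist_head >= static_cast<int>(nodes.size())) return -1;
        AuxNode& node = nodes[freelist_head];
        node.paiddata = std::make_unique<std::string>(aux);
        freelist_head = node.next;
        return node.no;
    }
    // Returns the node with subscript idx to the front of the free list.
    void release(int idx) {
        nodes[idx].paiddata.reset();
        nodes[idx].next = freelist_head;
        freelist_head = idx;
    }
};
```

      Because nodes are linked by subscript rather than by pointer, the same index value can be stored in pindex and used to find the auxiliary data after the master data has been processed.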
      The flowchart of the data-reading thread is shown in Figure 3. The thread runs as follows:
      1) Initialize each struct in the master data struct array (Maindata Management Array, MMA) as follows:
      (a) current_num: 0;
      (b) pbuffer[0], pbuffer[1]: NULL;
      (c) pindex[0], pindex[1]: NULL;
      (d) full[0], full[1]: 0;
      (e) current_index: 0;
      Then initialize each struct in the secondary data struct array (Aiddata Management Array, AMA) as follows:
      (a) No: i (its subscript in the array);
      (b) next: i+1;
      (c) paiddata: NULL;
      Then initialize the head of the free-struct linked list:
      (a) freelist_head: 0
      Then start the CPU compute thread and the GPU compute thread.
      2) Determine whether all the data has been read in; if so, go to 6); otherwise go to 3).
      3) Read in one protein sequence, taking the actual content of the protein as the master data and the name, length, calibration information and so on of the protein as the auxiliary data.
      4) Store the auxiliary data in the head node of the free list, record the No value of that node, and set the head of the free list to the node whose subscript is given by next.
      5) Obtain the length of the protein sequence, determine the length interval it falls in, and store the sequence in the corresponding member of the MMA; record the No value obtained in 4) in the corresponding index array. Then go to 2).
      6) All the data has been read in; send the data read-finished signal RF to the CPU compute thread and the GPU compute thread.
      7) Wait for the CPU compute thread and the GPU compute thread to finish.
      8) Post-process the data and end the program.
      The flowchart of the CPU compute thread is shown in Figure 4. The thread runs as follows:
      1) Determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false.
      2) Set the variable i to 0.
      3) If a pending master data storage array of the i-th member of the MMA (MMA[i]) is full (its full flag is 1) and the master data in this management struct meets the preset CPU processing condition, go to 4); otherwise go to 6).
      4) Call P7_vitebi to process the master data on the CPU; when it finishes, set the full flag to 0.
      5) Traverse the index storage array corresponding to the master data storage array processed in 4) to obtain each index value IDX, and add the node with subscript IDX in the AMA to the free-node list.
      6) Increase i by 1.
      7) If i is greater than 63, go to 8); otherwise go to 3).
      8) Check the value of the flag FL; if it is true, go to 9); otherwise go to 1).
      9) Send an end signal to the main thread and finish.
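      One pass over the intervals, corresponding to steps 2) to 7) above, can be sketched as follows. The names Interval and scan_pass and the three callbacks are hypothetical; the processing call (P7_vitebi in the text) is passed in as a callback rather than reproduced. The sketch releases the index values before clearing the full flag, which differs slightly from the order of steps 4) and 5) and is a choice made here so that the reader cannot reuse a buffer whose indices are still being traversed.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Reduced stand-in for one MMA member: only what the scan needs.
struct Interval {
    std::vector<int> pindex[2];  // auxiliary index values of the buffered items
    int full[2] = {0, 0};        // full flags of the two buffers
};

// Visits every interval once. For each full buffer of an interval accepted by
// `eligible`, calls `process`, hands every index value to `release`, and then
// clears the full flag. Returns the number of buffers handled in this pass.
inline int scan_pass(std::vector<Interval>& mma,
                     const std::function<bool(int)>& eligible,
                     const std::function<void(int, int)>& process,
                     const std::function<void(int)>& release) {
    int handled = 0;
    for (int i = 0; i < static_cast<int>(mma.size()); ++i) {
        if (!eligible(i)) continue;                 // processing condition not met
        for (int b = 0; b < 2; ++b) {
            if (!mma[i].full[b]) continue;          // buffer not full yet
            process(i, b);                          // e.g. the CPU or GPU routine
            for (int idx : mma[i].pindex[b]) release(idx);  // free AMA nodes
            mma[i].pindex[b].clear();
            mma[i].full[b] = 0;                     // buffer may be refilled
            ++handled;
        }
    }
    return handled;
}
```

      The CPU and GPU compute threads would differ only in the `eligible` predicate and the `process` callback they pass in.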
      The flowchart of the master-control GPU compute thread is shown in Figure 5. The thread runs as follows:
      1) Determine whether the RF signal has been received; if so, set the flag FL to true, otherwise set it to false.
      2) Set the variable i to 0.
      3) If a pending master data storage array of the i-th member of the MMA (MMA[i]) is full (its full flag is 1) and the master data in this management struct meets the preset GPU processing condition, go to 4); otherwise go to 6).
      4) Call P7_vitebi_kernel to process the master data on the GPU; when it finishes, set the full flag to 0.
      5) Traverse the index storage array corresponding to the master data storage array processed in 4) to obtain each index value IDX, and add the node with subscript IDX in the AMA to the free-node list.
      6) Increase i by 1.
      7) If i is greater than 63, go to 8); otherwise go to 3).
      8) Check the value of the flag FL; if it is true, go to 9); otherwise go to 1).
      9) Send an end signal to the main thread and finish.
      A person of ordinary skill in the art will understand that all or part of the steps of the method embodiments above can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
      In addition, the functional units in the embodiments of the invention may be integrated into one processing module, may each exist as a separate physical unit, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
      The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.
      The above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the invention.