Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings in the embodiments of the present specification. It is obvious that the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by one of ordinary skill in the art based on the embodiments of the present specification without creative effort shall fall within the scope of protection of the present disclosure.
Nowadays, intelligent models are increasingly applied in various scenarios; for example, deep learning models can be used for payment (face recognition), damage assessment (image recognition), and interaction and customer service (voice recognition and content filtering). However, running intelligent models generally requires strong computational support, so most such tasks currently run on acceleration devices such as GPUs (Graphics Processing Units). For higher throughput or higher resource utilization, a single node often carries multiple GPUs, and each GPU deploys multiple instances of a model (multi-process, multi-threaded, or multi-container), where each instance can independently provide service capability (scale-out, in conjunction with upper-level load balancing). For example, some scenarios may need to load multiple identical intelligent learning models on one GPU or on multiple GPUs to speed up data processing, so the model parameters of multiple intelligent learning models need to be loaded into GPU memory, which occupies more video memory. However, the limited video memory capacity of GPU hardware (for example, 16 GB on many current product lines) directly constrains the number of model instances that can be deployed, or the deployment of larger models. Video memory can be understood as the high-speed memory on the GPU card; it has very high bandwidth (up to 700 GB/s or more) but small capacity (typically 16 GB or 32 GB), which restricts deep learning tasks involving large models and large samples.
The embodiments of the present specification provide a video memory allocation processing method in which, during the model loading stage, hash values are calculated and compared to globally and finely identify which model parameters are identical, and identical content is then mapped to the same GPU video memory. This allocates video memory space reasonably for the deployment of intelligent models, reduces video memory occupation during the long-term operation of the models, and makes it possible to deploy larger models or more instances.
Fig. 1 is a flow chart of an embodiment of a video memory allocation processing method according to an embodiment of the present disclosure. Although the present description provides method operation steps or apparatus structures as shown in the following embodiments or figures, more or fewer steps or module units may be included in the methods or apparatus based on conventional or non-inventive effort. For steps or structures that have no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to the execution order or module structure shown in the embodiments or drawings of the present specification. In practice, the described methods or module structures may be executed in a device, server, or end product sequentially or in parallel (for example, in parallel-processor or multi-threaded processing environments, or even in distributed-processing or server-cluster implementations) according to the methods or module structures shown in the embodiments or figures.
The video memory allocation processing method provided in the embodiments of the present disclosure may be flexibly deployed in environments such as cloud-native containers and bare-metal machines. For example, it may be applied to a client or a server, such as a smart phone, a tablet computer, a computer, an intelligent wearable device, or a vehicle-mounted device, which may be determined according to actual needs; the embodiments of the present disclosure are not specifically limited in this respect.
As shown in fig. 1, the method may include the steps of:
Step 102: obtaining a model parameter set of a model to be deployed.
In a specific implementation process, the model to be deployed may be understood as an intelligent learning model to be deployed on a GPU, that is, a model the GPU needs to load in order to execute a corresponding task, as described in the foregoing. After the model is deployed on the GPU, it may be used to execute the corresponding task. The model may be a deep learning model, a tree model, a random deep forest model, a regression model, or the like, which may be determined according to actual needs; the embodiments of the present disclosure are not specifically limited. There may be multiple models to be deployed in the embodiments of the present disclosure, and they may be deployed on different GPUs; in some embodiments of the present disclosure, the models to be deployed may be located in different threads, different processes, or different containers. That is, the embodiments of the present specification can realize allocation management of model video memory across multiple processes, multiple containers, and even multiple GPUs, achieving unified management of cross-process and cross-GPU video memory allocation.
In general, an intelligent learning model has multiple model parameters. The model parameters can be understood as the weights of the model, that is, the features learned through model training. During model inference, the weights are usually all loaded into the GPU of the inference device, and for performance (such as throughput) one node often loads multiple instances of the model (thus occupying more video memory), where each instance can independently provide service capability. Therefore, it is necessary to allocate the GPU's video memory space reasonably for the model parameters, that is, the weights. In the embodiments of the present disclosure, a classification rule for model parameters may be preset, and the model parameters of the model to be deployed are divided into a plurality of model parameter sets, where each model parameter set may include a plurality of model parameters. The classification rule may be determined according to actual needs; for example, the parameters may be divided according to their position in the model or their size, or all model parameters of the whole model may be directly used as one model parameter set, which is not specifically limited in the embodiments of the present disclosure.
Step 104: performing a hash operation on each obtained model parameter set to obtain a parameter hash value of each model parameter set.
In a specific implementation process, after the model parameters of the model to be deployed are divided, a hash operation may be performed on each model parameter set in turn. Specifically, the hash operation may be performed over all model parameter values in the model parameter set to obtain the parameter hash value of that set.
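As a minimal illustrative sketch (not the patented implementation), the following Python snippet shows one way such a per-set hash could be computed, treating each model parameter as a raw byte buffer; the parameter names and the choice of MD5 are assumptions for illustration:

```python
import hashlib

def parameter_set_hash(param_set):
    """Compute a content digest over one model parameter set.

    param_set: list of (name, bytes) pairs, where bytes holds the raw
    weight values of one model parameter (e.g., one layer's weights).
    """
    digest = hashlib.md5()  # MD5 (128-bit digest); hashlib.sha1() (160-bit) also works
    for name, raw_bytes in param_set:
        digest.update(raw_bytes)  # hash only the content, not the name
    return digest.hexdigest()

# Two parameter sets with identical content produce the same hash,
# even though the parameter names differ.
set_a = [("layer1.weight", b"\x01\x02\x03\x04"), ("layer1.bias", b"\x05\x06")]
set_b = [("fc.weight", b"\x01\x02\x03\x04"), ("fc.bias", b"\x05\x06")]
assert parameter_set_hash(set_a) == parameter_set_hash(set_b)
```

Because only the parameter contents are hashed, identical weights produce identical digests regardless of which model or layer they come from, which is exactly the property the duplicate check in step 106 relies on.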
In some embodiments of the present disclosure, obtaining the model parameter set of the model to be deployed includes:
sequentially acquiring the model parameters of each hierarchy of the model to be deployed, and taking the set of the model parameters of each hierarchy as one model parameter set;
or sequentially acquiring the model parameters of a preset number of consecutive hierarchies of the model to be deployed, and taking the set of the model parameters of those consecutive hierarchies as one model parameter set.
In a specific implementation process, the embodiments of the present specification may divide the model parameters of the model to be deployed layer by layer and take the set of all model parameters of each layer as one model parameter set, thereby performing the hash operation on the model parameters of each layer; this realizes fine-grained, per-layer processing and increases the accuracy of data processing. Alternatively, in some embodiments of the present disclosure, the model parameters of a preset number of consecutive hierarchies of the model to be deployed, such as three consecutive layers, may be used as one model parameter set. This avoids dividing the parameters too finely, which would increase the amount of hash computation, and realizes the hash operation on model parameters within a specified range, so that the data processing speed is improved while data processing accuracy is ensured, laying an accurate data basis for the subsequent video memory allocation of the model parameters.
A hierarchy of the model can be understood as a range in the model structure; in particular, for hierarchical models, network structure models, or tree models, the model can be divided into layers according to its structure. For example, a tree structure model can take all tree nodes at each depth as one hierarchy, and a network structure model can take each network layer as one hierarchy.
In some embodiments of the present disclosure, obtaining the model parameter set of the model to be deployed includes:
calculating the parameter size of the model parameters in a designated range of the model to be deployed; if the parameter size is smaller than a preset threshold value, adding the model parameters adjacent to the designated range into the designated range and calculating the parameter size of the model parameters in the new designated range, and repeating until the parameter size is larger than the preset threshold value, at which point the model parameters in the corresponding designated range are used as one model parameter set.
In a specific implementation process, a threshold value can be preset, and the parameter size of the model parameters within a designated range is calculated each time, where the designated range can be determined according to actual needs, such as one hierarchy or n adjacent consecutive parameters. If the parameter size of the model parameters in the designated range is smaller than the preset threshold, the parameters in that range are considered too small, and the designated range can be enlarged; that is, the model parameters adjacent to the designated range are added to it, and the parameter size of the enlarged range is calculated. If the parameter size is still smaller than the preset threshold, the designated range continues to be enlarged until the parameter size within it is larger than the preset threshold, and the model parameters in the latest designated range are used as one model parameter set. The subsequent model parameters are divided into sets in the same way. The number of model parameters added to the designated range each time may be determined according to actual needs; for example, it may be consistent with the original designated range, that is, if the parameter size in the designated range is smaller than the preset threshold, the model parameters in an adjacent range of the same size are added to the original designated range.
For example, fig. 2 is a schematic diagram of model parameter hash calculation in one embodiment of the present disclosure. The layers in fig. 2 represent the layers of a model, the designated range is preset to one layer, and if the parameter size of a layer is less than 2 MB, the parameters of several consecutive layers are accumulated and hashed together until their total size exceeds 2 MB. As shown in fig. 2, since the size of the model parameters of hierarchy L1 is smaller than 2 MB, the model parameters of hierarchy L2 are calculated together with those of hierarchy L1; the combined model parameters of hierarchies L1 and L2 are larger than 2 MB, so they are taken as one model parameter set. Continuing with hierarchy L3, its parameter size is found to be greater than 2 MB, so the model parameters of hierarchy L3 can independently form one model parameter set. By analogy, the model parameters of hierarchies L4-L6 form one model parameter set, and the model parameters of hierarchy L7 form another. For particularly small model parameters, this reduces the number of subsequent hash computations and queries as well as the metadata overhead.
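A minimal sketch of this threshold-based grouping, assuming each layer's parameters are available as byte buffers in model order and using the 2 MB threshold from the example above:

```python
def group_layers(layers, threshold=2 * 1024 * 1024):
    """Group consecutive layers into parameter sets of more than `threshold` bytes.

    layers: list of (layer_name, bytes) pairs in model order.
    Returns a list of parameter sets, each a list of (layer_name, bytes).
    """
    sets, current, current_size = [], [], 0
    for name, raw in layers:
        current.append((name, raw))
        current_size += len(raw)
        if current_size > threshold:   # designated range is now large enough
            sets.append(current)
            current, current_size = [], 0
    if current:                        # trailing layers below the threshold
        sets.append(current)
    return sets

# L1 alone is below 2 MB, so L2 is accumulated with it; L3 alone exceeds 2 MB.
layers = [("L1", b"\x00" * 1_000_000), ("L2", b"\x00" * 1_500_000),
          ("L3", b"\x00" * 3_000_000)]
print([[name for name, _ in s] for s in group_layers(layers)])  # [['L1', 'L2'], ['L3']]
```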
Of course, at the coarsest granularity, a single hash may be calculated over the model parameters of all hierarchies of the entire model. Hash digest algorithms that have been widely validated in the security and storage fields, such as MD5 (producing a 128-bit digest) or SHA-1 (producing a 160-bit digest), may be employed.
According to the embodiment of the specification, the model parameter hash calculation methods with different granularities can be configured according to actual use requirements, so that different data processing requirements are met.
Step 106: sequentially matching the parameter hash values of the model parameter sets against a video memory mapping table to determine whether each model parameter set of the model to be deployed is identical to a deployed model parameter in the video memory mapping table, wherein the video memory mapping table includes the parameter hash values of a plurality of deployed model parameters and the physical video memory addresses corresponding to those deployed model parameters.
In a specific implementation process, a video memory mapping table may be constructed in advance, storing each deployed model parameter and its physical video memory address. After the parameter hash values of the parameter sets of the model to be deployed are calculated, they can be matched against the video memory mapping table to query whether the model parameters of the model to be deployed duplicate the model parameters of already deployed models. In general, if the contents of two model parameters (which may be as fine-grained as one hierarchy of the model) are identical, their hash values calculated over the contents (e.g., using MD5 or SHA-1) are necessarily also identical, which allows model parameters with the same content to be identified. For example, a per-hierarchy parameter hash operation can be performed on the model to be deployed, and after the hash value of each hierarchy's model parameters is calculated, it can be compared with the parameter hash values in the video memory mapping table; if a parameter hash value in the table is the same as the hash value of the parameters of a certain hierarchy of the model to be deployed, the model parameters of that hierarchy can be considered identical to the corresponding deployed model parameters in the table.
In addition, it should be noted that the matching against the video memory mapping table may be performed after the hash values of all model parameter sets in the model to be deployed have been calculated, or each parameter hash value may be checked against the table as soon as it is calculated (that is, checked for consistency with the parameter hash values of the deployed model parameters) before the parameter hash value of the next model parameter set is calculated; the specific process may be determined according to actual needs.
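To illustrate the duplicate check of step 106 and the two outcomes it leads to (an illustration under assumed data structures, not the patented implementation), the video memory mapping table can be modeled as a dictionary from parameter hash value to physical video memory address:

```python
import hashlib

memory_map = {}          # parameter hash value -> physical video memory address
next_free_addr = 0x1000  # illustrative stand-in for a GPU physical allocator

def deploy_parameter_set(raw: bytes):
    """Duplicate-check one parameter set; share or allocate physical memory."""
    global next_free_addr
    h = hashlib.md5(raw).hexdigest()
    if h in memory_map:                  # step 106: identical content found
        return memory_map[h], True       # step 108: share the existing address
    addr = next_free_addr                # no duplicate: allocate new memory,
    next_free_addr += len(raw)           # copy the weights there (not shown),
    memory_map[h] = addr                 # and register it in the mapping table
    return addr, False

# Two instances of the same weights end up at one physical address.
addr_a, shared_a = deploy_parameter_set(b"\x00" * 1024)
addr_b, shared_b = deploy_parameter_set(b"\x00" * 1024)
assert addr_a == addr_b and not shared_a and shared_b
```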
Step 108: if it is determined that the model parameter set is the same as a deployed model parameter in the video memory mapping table, allocating a virtual video memory pointer to the model parameter set, and mapping the physical video memory address of the identical deployed model parameter to the virtual video memory pointer.
In a specific implementation process, if a certain model parameter set in the model to be deployed is found to be the same as a deployed model parameter in the video memory mapping table, a virtual video memory pointer can be returned, the physical video memory address of the identical deployed model parameter is obtained from the video memory mapping table, and the obtained physical video memory address is mapped onto the virtual video memory pointer corresponding to the model parameter set. The virtual video memory pointer can be understood as the pointer visible to the application program; multiple virtual video memory pointers may point to the same physical video memory. Fig. 3 is a schematic diagram of a model-loading video memory allocation flow in one embodiment of the present disclosure. As shown in fig. 3, when the hash value of the model parameter set of the first of the N hierarchies of model 2 is found to duplicate a deployed model parameter in the video memory mapping table, the corresponding physical video memory address addr1 may be returned according to the table, and addr1 may then be mapped onto the virtual video memory pointer ptr2 corresponding to that model parameter set, without copying that model parameter set of model 2 to the GPU again. The GWM (Global Weight Hash Manager) in fig. 3 can be understood as the global model parameter management module.
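A toy illustration of this pointer indirection follows; in a real system it would correspond to reserving a virtual address range and mapping it onto an existing physical allocation through a driver-level virtual memory management interface, and the classes and fields here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PhysicalBlock:
    addr: int
    refcount: int = 0            # freed only when the count drops back to zero

@dataclass
class VirtualPointer:
    name: str
    phys: PhysicalBlock = None   # None until mapped

def map_pointer(ptr: VirtualPointer, block: PhysicalBlock) -> None:
    """Map a virtual video memory pointer onto a physical block."""
    ptr.phys = block
    block.refcount += 1

addr1 = PhysicalBlock(addr=0xE0710000)
ptr1 = VirtualPointer("model1.hierarchy_1")   # model 1 originally allocated addr1
ptr2 = VirtualPointer("model2.hierarchy_1")   # model 2 has identical content
map_pointer(ptr1, addr1)
map_pointer(ptr2, addr1)
assert ptr1.phys is ptr2.phys and addr1.refcount == 2
```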
A model parameter set in the model to be deployed being the same as a deployed model parameter in the video memory mapping table may mean that the table contains the same model as the model to be deployed, in which case all model parameters of the two models may be identical, or it may mean that the table contains a model similar to the model to be deployed, in which case only some of the model parameters are identical.
In addition, the virtual video memory pointer may be returned after the parameter hash value of the model parameter set is calculated, or a virtual video memory pointer may be allocated to each model parameter set after the model parameters of the model to be deployed are divided into sets, with each model parameter set receiving a different virtual video memory pointer; the embodiments of the present specification do not specifically limit the way in which the virtual video memory pointer is generated. In some embodiments of the present disclosure, a sub virtual video memory pointer may be generated for each model parameter in a model parameter set from the virtual video memory pointer of the set, for example by using an offset: the sub virtual video memory pointer of each model parameter is the virtual video memory pointer of the model parameter set plus the offset of that model parameter. For example, if the virtual video memory pointer of a model parameter set is t and the set contains 5 model parameters, the sub virtual video memory pointer of model parameter 1 may be denoted t+offset1, and so on for each model parameter. The sub virtual video memory pointers of the model parameters and the virtual video memory pointer of the corresponding model parameter set point into the same physical video memory, so each model parameter has its own sub virtual video memory pointer, through which the corresponding physical video memory address can be found; thus each model parameter corresponds to a determined physical video memory location. When a model parameter needs to be loaded and used, the corresponding physical video memory address can be queried via the virtual video memory pointer, and the model parameter can then be loaded or referenced.
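A small sketch of the offset scheme, assuming the set's base virtual pointer and the per-parameter sizes are known (all names and values illustrative):

```python
def sub_pointers(base_ptr: int, param_sizes: list) -> list:
    """Derive a sub virtual video memory pointer for each model parameter
    as the set's base pointer plus the parameter's cumulative offset."""
    pointers, offset = [], 0
    for size in param_sizes:
        pointers.append(base_ptr + offset)   # t + offset1, t + offset2, ...
        offset += size
    return pointers

t = 0x7F0000000000                 # illustrative base virtual pointer of the set
print([hex(p) for p in sub_pointers(t, [4096] * 5)])   # 5 parameters of 4 KB each
```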
Fig. 4 is a schematic diagram showing the effect of multi-process, multi-thread, and multi-container model video memory allocation management in one embodiment of the present disclosure. As shown in fig. 4, if models in different threads, processes, or containers have model parameters with the same content, only one copy of the model parameters needs to be kept in the GPU's physical video memory; by means of virtual video memory pointers, the physical video memory address storing the shared model parameters is mapped onto each virtual video memory pointer, realizing sharing of the same model parameters across processes, threads, and containers.
In some embodiments of the present description, the method further comprises:
if the model parameter set is different from all deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmitting the model parameters in the model parameter set to the corresponding physical video memory address, and mapping the physical video memory address allocated to the model parameter set onto the corresponding virtual video memory pointer.
In a specific implementation process, as shown in fig. 3, the hash value of the model parameter set of the first of the N hierarchies of model 1 differs from all deployed model parameters in the video memory mapping table; that is, the model parameter set does not duplicate any deployed model parameter. In this case, a new physical video memory address addr1 and a virtual video memory pointer ptr1 may be allocated to the model parameter set, addr1 is mapped onto ptr1, and the model parameters in the set are transmitted to addr1 on the GPU.
In some embodiments of the present description, the method further comprises:
And adding the parameter hash value and the physical video memory address of the model parameter set into the video memory mapping table.
In a specific implementation process, after the physical video memory address is newly allocated, the parameter hash value of the newly allocated model parameter set and its physical video memory address can be added to the video memory mapping table, keeping the table up to date and facilitating subsequent queries of model parameters.
The deployed model parameters in the video memory mapping table may themselves be parameter sets of deployed models. Model parameter video memory allocation rules for model deployment may be preset, and the parameter hash values of the corresponding model parameter sets are calculated according to the preset rules each time a model is loaded and deployed. For example, physical video memory may be allocated to the model parameters of each hierarchy according to the hierarchy of the model. When the first model is loaded and deployed, the parameter hash values of the model parameter sets of each hierarchy of the first model are calculated in turn, and the calculated parameter hash values are matched against the video memory mapping table; since the table contains no data at this point, it can be determined that none of the first model's parameter sets are duplicates. Physical video memory can then be allocated to each hierarchy's model parameter set, and each hierarchy's model parameters, the corresponding parameter hash value, and the physical video memory address are stored in the video memory mapping table.
According to the embodiments of the present specification, a hash operation is performed on the model parameters of the intelligent learning model, and whether the model parameters of the model to be deployed duplicate already deployed model parameters is determined by comparing the hash values. If they do, no new physical video memory needs to be allocated: the duplicated model parameters are mapped to the existing physical video memory through virtual pointers, so identical model parameters need not be stored repeatedly. For non-duplicated model parameters, a new physical video memory address is allocated. In this way, data with identical content is shared and physical video memory space is greatly saved.
In some embodiments of the present disclosure, the video memory mapping table further includes the parameter sizes of the deployed model parameters, and the sequentially matching of the parameter hash values of each model parameter set against the video memory mapping table to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the table includes:
sequentially matching the parameter hash values of the model parameter sets against the video memory mapping table; if the parameter hash value of a model parameter set is the same as the parameter hash value of a target deployed model parameter in the table, comparing the parameter size of the model parameter set with the parameter size of the target deployed model parameter; and if the parameter sizes are also the same, determining that the model parameter set is the same as the target deployed model parameter.
In a specific implementation process, the video memory mapping table may also include the parameter sizes of the deployed model parameters, and when checking a model parameter set of the model to be deployed for duplicates, the parameter hash value of the set can be matched against the parameter hash values in the table and the parameter sizes can also be compared. If the parameter hash value of a certain model parameter set of the model to be deployed is the same as that of a target deployed model parameter in the table, the parameter sizes of the two are compared, and if the sizes are also the same, the model parameter set is determined to be the same as the target deployed model parameter. The target deployed model parameter can be understood as the deployed model parameter in the video memory mapping table whose parameter hash value is the same as that of a certain model parameter set of the model to be deployed.
In general, if the hash values are the same, the hashed contents are the same; however, in rare cases the hash values may be the same while the contents differ (a hash collision). By comparing both the hash value and the parameter size, and determining that the model parameter set is the same as the target deployed model parameter only when both match, the accuracy of judging whether model parameters are identical is improved, which in turn improves the accuracy of model deployment and avoids degrading system performance by wrongly sharing model parameters.
In some embodiments of the present description, the method further comprises:
after determining that the parameter size of the model parameter set is the same as the parameter size of the target deployed model parameter, comparing the model parameters in the model parameter set with the target deployed model parameter byte by byte, and if every byte is the same, determining that the model parameter set is the same as the target deployed model parameter.
In a specific implementation process, after determining that the parameter hash value and the parameter size of a certain model parameter set of the model to be deployed are the same as those of a target deployed model parameter in the video memory mapping table, the model parameters in the set and the target deployed model parameter can further be compared byte by byte; if every byte is the same, the model parameter set can be determined to be the same as the target deployed model parameter.
For example, suppose the model to be deployed, S, has 5 model parameter sets, where the parameter hash value of model parameter set 1 is a. After comparing a with the parameter hash values in the video memory mapping table, it is found that a deployed model parameter P in the table has the same parameter hash value; P can then be taken as the target deployed model parameter of model parameter set 1. The parameter size of model parameter set 1 is then compared with that of P; if the sizes are the same, each model parameter in set 1 is compared with P byte by byte, and if all bytes are the same, model parameter set 1 is determined to be the same as the deployed model parameter P.
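A sketch of this three-stage check (hash, then size, then byte-by-byte), where the layout of a table entry and the ability to read the deployed bytes back for comparison are assumptions for illustration:

```python
import hashlib

def is_same_parameter_set(candidate: bytes, entry: dict) -> bool:
    """Three-stage duplicate check against one video memory mapping table entry."""
    if hashlib.md5(candidate).hexdigest() != entry["hash"]:
        return False                       # stage 1: hashes differ, cheap rejection
    if len(candidate) != entry["size"]:
        return False                       # stage 2: guards against hash collisions
    return candidate == entry["content"]   # stage 3: byte-by-byte confirmation

deployed = b"weights" * 100
entry = {"hash": hashlib.md5(deployed).hexdigest(),
         "size": len(deployed),
         "content": deployed}
assert is_same_parameter_set(b"weights" * 100, entry)
assert not is_same_parameter_set(b"weights" * 99, entry)
```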
According to the embodiments of the present specification, after the hash operation is performed on the model parameters of the model to be deployed, the parameter hash value and the parameter size are compared against the data in the video memory mapping table, and only after both are consistent is the byte-by-byte comparison of the parameters performed. This improves the accuracy of the duplicate-query results for model parameters, further ensures the accuracy of model deployment, and improves system performance.
In some embodiments of the present disclosure, the video memory mapping table further includes the video memory numbers of the deployed model parameters, and the sequentially matching of the parameter hash values of each model parameter set against the video memory mapping table to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the table includes:
sequentially matching the parameter hash values of the model parameter sets against the video memory mapping table; if the parameter hash value of a model parameter set is the same as the parameter hash value of a target deployed model parameter in the table, comparing the video memory number corresponding to the model parameter set with the video memory number corresponding to the target deployed model parameter; and if the video memory numbers are the same, determining that the model parameter set is the same as the target deployed model parameter.
In a specific implementation process, the video memory mapping table may further include the video memory number of each deployed model parameter, that is, the GPU number; the video memory number indicates on which GPU the model corresponding to the model parameter is deployed. Table 1 shows the contents of a video memory mapping table in one scenario example of the present disclosure. As shown in Table 1, the table may include GPU# (i.e., the video memory number), the parameter size, the parameter hash value, and the physical video memory address. GPU# can be understood as the card number of the GPU; each node, i.e., each device, may have multiple cards, and each card holds only one instance of a given content. The parameter size can be understood as the size, in bytes, of the model parameters of the deployed model (per layer, per multiple layers, or for the whole model). The parameter hash value is the result of computing a hash over the deployed model parameters (per layer, per multiple layers, or for the whole model); for example, a typical MD5 hash produces a 128-bit digest, and SHA-1 produces a 160-bit digest. The first three items, namely GPU#, the parameter size, and the parameter hash value, can be used as the key, and the value is the physical address of the deployed model parameter on GPU#.
Table 1: Video memory mapping table

| GPU# | Parameter size | Parameter hash value | Physical video memory address |
|------|----------------|----------------------|-------------------------------|
| 0    | 50 MB          | 0x****               | 0xE0710000                    |
| 0    | 24 MB          | 0x****               | 0xA4353400                    |
| 1    | 12 MB          | 0x****               | 0x78E3B200                    |
| ...  | ...            | ...                  | ...                           |
When querying whether the model parameters of the model to be deployed duplicate existing deployed model parameters, in addition to comparing whether the parameter hash values are consistent, whether the video memory numbers are the same can also be compared. For example, if it is determined that the parameter hash value of model parameter set 1 of the model to be deployed is the same as that of the target deployed model parameter P in the video memory mapping table, the video memory numbers of the two can then be compared; if the video memory numbers are also the same, the models corresponding to the two model parameters can be considered deployed on the same GPU, and model parameter set 1 of the model to be deployed can be determined to be the same as the target deployed model parameter P. Of course, as described in the above embodiments, the parameter hash value, the parameter size, the video memory number, and the parameter bytes of the model parameter set of the model to be deployed may also be compared in turn with the data of the target deployed model parameter in the video memory mapping table, and the two model parameters are determined to be the same only after all of these match.
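A sketch of the lookup with the composite key suggested by Table 1, where the tuple (GPU#, parameter size, parameter hash value) maps to a physical video memory address (all values illustrative):

```python
# (GPU#, parameter size in bytes, parameter hash value) -> physical video memory address
memory_map = {
    (0, 50 * 2**20, "hash_a"): 0xE0710000,
    (0, 24 * 2**20, "hash_b"): 0xA4353400,
    (1, 12 * 2**20, "hash_c"): 0x78E3B200,
}

def lookup(gpu_id: int, size: int, param_hash: str):
    """Return the physical address of an identical set already deployed on the
    same GPU, or None (the NULL of the flow in fig. 3) if there is none."""
    return memory_map.get((gpu_id, size, param_hash))

assert lookup(0, 50 * 2**20, "hash_a") == 0xE0710000
assert lookup(1, 50 * 2**20, "hash_a") is None   # same content, different card: no sharing
```

Including GPU# in the key is what restricts sharing to parameters residing on the same card, as discussed next.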
In the embodiments of the present specification, by comparing the video memory numbers corresponding to the model parameters, video memory allocation can be managed for models deployed on the same video memory; that is, physical video memory is shared only among model parameters with the same content on the same video memory. This reduces the video memory consumption of multiple models on one graphics card with essentially no impact on performance, while avoiding the performance problems that sharing model parameters across different graphics cards would cause.
The following specifically describes the physical video memory allocation management process of the embodiments of the present disclosure with reference to fig. 3. As shown in fig. 3, the method of the embodiments of the present disclosure may include:
1. When the first model loads its model parameters, the model parameters of all models can be temporarily staged in CPU (Central Processing Unit) memory, and the hashes of the model parameters are then calculated at the configured granularity, such as layer by layer, over several consecutive layers, or even over the whole model. For details, refer to the foregoing embodiments, which are not repeated here.
2. Duplicate checking: the tuple <GPU#, parameter size, parameter hash value> is sent to the GWM. The GWM queries the video memory mapping table (as in Table 1 above) and considers the model parameters to already exist only when GPU#, the parameter size, and the parameter hash value are all the same (if necessary, the parameter size and parameter hash value can be compared first to judge whether the parameters are duplicated, followed by a byte-by-byte comparison). If no match is found, NULL is returned; otherwise, the physical video memory address is returned.
3. If no match is found, model 1 allocates physical video memory (for example, at address addr1), maps it to a virtual video memory pointer such as ptr1, transfers the model parameter content from the CPU to addr1 on the GPU, and reports the corresponding GPU#, parameter size, parameter hash value, and addr1 to the GWM, which updates the video memory mapping table to record this information. By using a VMM (virtual memory management) interface, the driver layer can automatically track the reference count of addr1.
4. The flow for model 2 is then similar: if the content is the same, the queried address (addr1) is returned, and model 2 does not need to allocate physical video memory again; instead, the physical video memory addr1 is mapped onto a virtual video memory pointer such as ptr2 for the corresponding model parameters of model 2, so the weights are shared, as illustrated in the sketch below.
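Putting the four steps together, the following is a minimal single-process sketch of the load-time flow; the GWM is reduced to an in-process object here, whereas in the described system it is a separate service reached over IPC, and all names and addresses are illustrative assumptions:

```python
import hashlib

class GlobalWeightHashManager:
    """Toy stand-in for the GWM: tracks (GPU#, size, hash) -> physical address."""
    def __init__(self):
        self.table = {}

    def check(self, gpu: int, size: int, h: str):
        return self.table.get((gpu, size, h))    # step 2: None plays the role of NULL

    def register(self, gpu: int, size: int, h: str, addr: int):
        self.table[(gpu, size, h)] = addr        # step 3: record the new allocation

gwm = GlobalWeightHashManager()
next_addr = 0x10000                              # illustrative physical allocator

def load_parameter_set(gpu: int, raw: bytes) -> int:
    """Steps 1-4 for one parameter set staged in CPU memory."""
    global next_addr
    h = hashlib.md5(raw).hexdigest()             # step 1: hash at configured granularity
    addr = gwm.check(gpu, len(raw), h)           # step 2: duplicate check
    if addr is None:
        addr = next_addr                         # step 3: allocate, copy to GPU
        next_addr += len(raw)                    # (not shown), report to the GWM
        gwm.register(gpu, len(raw), h, addr)
    return addr                                  # step 4: map addr onto this model's
                                                 # virtual video memory pointer

addr1 = load_parameter_set(0, b"shared-weights")
addr2 = load_parameter_set(0, b"shared-weights")
assert addr1 == addr2                            # model 2 shares model 1's physical memory
```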
According to the video memory allocation processing method provided by the embodiments of the present specification, video memory is shared based on the content values of the model parameters, saving runtime video memory; in one practical measurement, 45% of the video memory was saved with essentially unchanged performance. Moreover, sharing of the parameters of multiple models can be realized within the global scope of a node, including across multiple processes or multiple containers, and multiple GPUs are supported, which broadens the target usage scenarios. In addition, model parameters are identified and extracted at the hierarchy level, with finer granularity, so that partially identical layers in multiple models can also be shared, further expanding the target scenarios.
In the present specification, each embodiment of the method is described in a progressive manner, and the same and similar parts of each embodiment are referred to each other, and each embodiment mainly describes differences from other embodiments. Reference is made to the description of parts of the method embodiments where relevant.
Based on the video memory allocation processing method described above, one or more embodiments of the present disclosure further provide an apparatus for video memory allocation processing. The apparatus may include systems (including distributed systems), software (applications), modules, plug-ins, servers, clients, etc., that use the methods described in the embodiments of the present description, in combination with the necessary hardware for implementation. Based on the same innovative concept, the apparatus provided in one or more embodiments of the present description is as described in the following embodiments. Because the schemes by which the apparatus solves the problems are similar to those of the method, the implementation of the apparatus in the embodiments of the present disclosure may refer to the implementation of the foregoing method, and repetition is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Specifically, fig. 5 is a schematic block diagram of an embodiment of a video memory allocation processing device provided in the present specification, and as shown in fig. 5, the video memory allocation processing device provided in the present specification may include:
A parameter obtaining module 51, configured to obtain a model parameter set of a model to be deployed;
The hash operation module 52 is configured to perform hash operation on each obtained model parameter set to obtain a parameter hash value of each model parameter set;
A parameter duplicate-checking module 53, configured to sequentially match the parameter hash values of each model parameter set with a video memory mapping table, so as to determine whether each model parameter set of the model to be deployed is identical to a deployed model parameter in the video memory mapping table;
And a video memory allocation module 54, configured to, if it is determined that the model parameter set is the same as a deployed model parameter in the video memory mapping table, allocate a virtual video memory pointer to the model parameter set and map the physical video memory address of the identical deployed model parameter to the virtual video memory pointer.
In some embodiments of the present disclosure, the video memory allocation module is further configured to:
if the model parameter set is different from all deployed model parameters in the video memory mapping table, allocate a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmit the model parameters in the model parameter set to the corresponding physical video memory address, and map the physical video memory address allocated to the model parameter set onto the corresponding virtual video memory pointer.
According to the embodiments of the present specification, by performing a hash operation on the model parameters of the intelligent learning model and comparing the hash values, it is determined whether the model parameters of the model to be deployed duplicate already deployed model parameters. If they do, no new physical video memory needs to be allocated: the duplicated model parameters are mapped to the existing physical video memory through virtual pointers, so identical model parameters need not be stored repeatedly. For non-duplicated model parameters, a new physical video memory address is allocated. In this way, data with identical content is shared and physical video memory space is greatly saved.
It should be noted that, according to the descriptions of the corresponding method embodiments, the above-mentioned apparatus may also include other implementations. For the specific implementation manner, reference may be made to the descriptions of the corresponding method embodiments, which are not repeated here.
The embodiments of the present specification also provide a video memory allocation processing device, comprising at least one processor and a memory for storing processor-executable instructions, wherein the video memory allocation processing method of the above embodiments is implemented when the processor executes the instructions, the method comprising:
acquiring a model parameter set of a model to be deployed;
carrying out hash operation on each obtained model parameter set to obtain a parameter hash value of each model parameter set;
sequentially matching the parameter hash values of each model parameter set with a video memory mapping table to determine whether each model parameter set of the model to be deployed is identical to a deployed model parameter in the video memory mapping table;
if the model parameter set is determined to be the same as the deployed model parameter in the video memory mapping table, a virtual video memory pointer is allocated to the model parameter set, and a physical video memory address of the deployed model parameter which is the same as the model parameter set is mapped to the virtual video memory pointer.
It should be noted that, according to the descriptions of the method embodiments, the above apparatus or system may also include other implementations. For the specific implementation manner, reference may be made to the descriptions of the related method embodiments, which are not repeated here.
An embodiment of the present disclosure provides a video memory allocation processing system. Fig. 6 is a schematic diagram of the principle framework of the video memory allocation processing system in an embodiment of the present disclosure. As shown in fig. 6, the system includes a plurality of graphics processors (GPUs) and a global model parameter management module, i.e., a GWM (Global Weight Hash Manager); each graphics processor includes a plurality of processes or containers, and each process or container includes at least one model to be deployed, where:
each process or container is internally provided with an inter-process communication module (as shown in fig. 6, the small square at the bottom right corner of each process or container may represent the inter-process communication module), and the global parameter management module is configured to execute the method described in the foregoing embodiments and to query, through the inter-process communication modules, whether the model parameters of each model to be deployed are duplicated, so as to allocate video memory to the models to be deployed on the plurality of graphics processors.
Each node may deploy one global parameter management module (GWM) as an independent process or container, which is responsible for managing all model-parameter-related data on the node (hash values, physical video memory addresses, reference counts, etc.) and provides an IPC (Inter-Process Communication) service (e.g., over a UNIX socket). Fig. 7 is a schematic flow chart of physical video memory allocation management in another embodiment of the present disclosure. As shown in fig. 7, the GWM may first be configured and started, and then runs for a long period; after the GWM is started on a node, it provides the IPC service externally and waits for model loading in order to query whether the model parameters of each model are duplicated, as described in the above embodiments and not repeated here. For example: when an online model is deployed and its model parameters are loaded, before physical video memory is allocated for the model parameters, whether they are duplicated is first queried based on the content hash; if duplicated, the physical video memory is shared, and the virtual video memory pointer is mapped to the corresponding physical video memory address; if no duplicate is found, new video memory is allocated and the hash and related data are sent to the GWM.
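A minimal sketch of such an IPC service, assuming a line-delimited JSON protocol over a UNIX socket (the socket path, message fields, and protocol are illustrative assumptions, not the patented interface):

```python
import json
import os
import socketserver

SOCK_PATH = "/tmp/gwm.sock"   # illustrative UNIX socket path
table = {}                    # (GPU#, parameter size, parameter hash) -> physical address

class GWMHandler(socketserver.StreamRequestHandler):
    """Serve one line-delimited JSON request from a model-loading process."""
    def handle(self):
        req = json.loads(self.rfile.readline())
        key = (req["gpu"], req["size"], req["hash"])
        if req["op"] == "check":                       # duplicate query
            resp = {"addr": table.get(key)}            # null plays the role of NULL
        else:                                          # "register" a new allocation
            table[key] = req["addr"]
            resp = {"ok": True}
        self.wfile.write((json.dumps(resp) + "\n").encode())

if __name__ == "__main__":
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)                           # clean up a stale socket file
    with socketserver.UnixStreamServer(SOCK_PATH, GWMHandler) as server:
        server.serve_forever()                         # wait for model-loading queries
```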
By calculating and comparing hashes at the model loading stage (which adds some computational overhead on the CPU, but a model is often loaded once and then runs for months or years), which model parameters are identical is identified globally and at fine granularity, and identical content is then mapped to the same GPU video memory. This reduces video memory occupation during the long-term operation of the models and achieves the goal of deploying larger models or more instances; and by adopting a client-server mode and introducing the GWM, globally coordinated allocation management of physical video memory is realized.
The video memory allocation processing apparatus, device, and system provided in the present specification can also be applied to a variety of data analysis and processing systems. The system, server, terminal, or device may be a separate server, or may include a server cluster, a system (including a distributed system), software (applications), an actual operating device, a logic gate device, a quantum computer, etc., that uses one or more of the methods or one or more of the embodiments of the present description in combination with the necessary hardware for implementation. The video memory allocation processing system may comprise at least one processor and a memory storing computer-executable instructions, where the instructions, when executed by the processor, implement the steps of the method described in any one or more of the embodiments above.
The method embodiments provided in the embodiments of the present specification may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, fig. 8 is a block diagram of the hardware structure of a video memory allocation processing server in one embodiment of the present specification; the computer terminal may be the video memory allocation processing server or the video memory allocation processing device of the above embodiments. As shown in fig. 8, the server 10 may include one or more processors 100 (only one is shown in the figure; the processors 100 may include, but are not limited to, a microprocessor (MCU), a programmable logic device (FPGA), or other processing devices), a nonvolatile memory 200 for storing data, and a transmission module 300 for communication functions. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 8 is merely illustrative and does not limit the configuration of the above electronic device. For example, the server 10 may include more or fewer components than shown in fig. 8; for example, it may also include other processing hardware such as a database, a multi-level cache, or a GPU, or it may have a configuration different from that shown in fig. 8.
The nonvolatile memory 200 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the video memory allocation processing method in the embodiments of the present disclosure; the processor 100 executes the software programs and modules stored in the nonvolatile memory 200, thereby performing various functional applications and resource data updates. The nonvolatile memory 200 may include high-speed random access memory, and may also include nonvolatile memory such as one or more magnetic storage devices, flash memory, or other nonvolatile solid-state memory. In some examples, the nonvolatile memory 200 may further include memory located remotely from the processor 100, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 300 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a computer terminal. In one example, the transmission module 300 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission module 300 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The method or apparatus according to the above embodiments provided in the present specification may implement service logic by a computer program and be recorded on a storage medium, where the storage medium may be read and executed by a computer, to implement the effects of the schemes described in the embodiments of the present specification.
The storage medium may include a physical device for storing information, typically by digitizing the information and then storing it in a medium that uses electrical, magnetic, or optical means. The storage medium may include: devices that store information using electrical energy, such as various kinds of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB disks; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of readable storage media, such as quantum memory and graphene memory.
The above-mentioned video memory allocation processing method and apparatus provided in the embodiments of the present disclosure may be implemented by a processor executing corresponding program instructions in a computer, for example, implemented on a PC using the C++ language on a Windows operating system, implemented on a Linux system, or implemented on an intelligent terminal using, for example, Android or iOS programming languages, as well as implemented on processing logic based on a quantum computer.
The embodiments of the present description are not limited to situations that must comply with industry communication standards, standard computer resource data update and data storage rules, or the situations described in one or more embodiments of the present description. Implementations slightly modified on the basis of certain industry standards, or of the described embodiments using custom methods or examples, can also achieve the same, equivalent, similar, or predictable implementation effects as the above embodiments. Examples applying these modified or varied ways of data acquisition, storage, judgment, processing, and the like may still fall within the scope of alternative implementations of the embodiments of this specification.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the original code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained by merely slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller in purely computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
For convenience of description, the above platform and terminal are described by dividing their functions into various modules. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in one or more pieces of software and/or hardware, a module implementing one function may be implemented by a combination of multiple sub-modules or sub-units, and so on. The above-described apparatus embodiments are merely illustrative; for example, the division into units is merely a division by logical function, and there may be other divisions in actual implementation, for example, multiple units or plug-ins may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps are performed on the computer or other programmable apparatus to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are referred to each other, and each embodiment is mainly described in a different manner from other embodiments. In particular, for system embodiments, the description is relatively simple as it is substantially similar to method embodiments, and reference is made to the section of the method embodiments where relevant. In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
The foregoing is merely an example of one or more embodiments of the present specification and is not intended to limit the one or more embodiments of the present specification. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present specification, should be included in the scope of the claims.