CN114168316A - Video memory allocation processing method, device, equipment and system - Google Patents


Publication number
CN114168316A
CN114168316A
Authority
CN
China
Prior art keywords
model
video memory
deployed
model parameter
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111304911.XA
Other languages
Chinese (zh)
Inventor
赵军平
吕昕远
梅晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111304911.XA priority Critical patent/CN114168316A/en
Publication of CN114168316A publication Critical patent/CN114168316A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The specification provides a video memory allocation processing method, device, equipment and system. A hash operation is performed on the model parameters of an intelligent learning model, and the hash values are compared to determine whether the model parameters of a model to be deployed duplicate already-deployed model parameters. If they do, no new physical video memory is allocated; instead, the duplicated model parameters are mapped to the existing physical video memory through virtual pointers. The same content is thus shared rather than stored repeatedly, which greatly saves physical video memory space, allows more instances to be deployed in a limited video memory space, and improves system performance.

Description

Video memory allocation processing method, device, equipment and system
Technical Field
The present disclosure relates to computer technologies, and in particular, to a method, an apparatus, a device, and a system for allocating video memory.
Background
With the development of computer and internet technology, intelligent models are applied in more and more scenarios, and a trained intelligent model is typically deployed on a computer. As services grow, more intelligent models are needed and more nodes must deploy multiple intelligent models. Each model occupies part of the computer's physical video memory, whose capacity is limited, so the video memory of the computer must be allocated and managed reasonably.
Disclosure of Invention
Embodiments of the present disclosure provide a method, an apparatus, a device, and a system for allocating and processing a video memory, so as to reduce occupation of a video memory space and improve utilization rate of the video memory.
In one aspect, an embodiment of the present specification provides a video memory allocation processing method, where the method includes:
obtaining a model parameter set of a model to be deployed;
performing a hash operation on each acquired model parameter set to obtain a parameter hash value of each model parameter set;
matching the parameter hash value of each model parameter set with a video memory mapping table in sequence to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
if the model parameter set is determined to be the same as the deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer to the model parameter set, and mapping the physical video memory address of the deployed model parameters, which is the same as the model parameter set, to the virtual video memory pointer.
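The claimed flow above can be sketched as follows. This is a hypothetical Python illustration only, not part of the claims; all names (`deploy_params`, `allocate_physical`, `memory_map`) are invented for illustration, and integer addresses stand in for real GPU video memory.

```python
import hashlib

def deploy_params(param_sets, memory_map, allocate_physical):
    """Illustrative sketch of the claimed flow: for each parameter set of a
    model to be deployed, compare its content hash against a mapping table
    of already-deployed parameters; on a hit, reuse the existing physical
    address via a virtual pointer instead of allocating new memory."""
    virtual_pointers = {}
    for name, blob in param_sets.items():
        digest = hashlib.sha1(blob).hexdigest()   # parameter hash value
        if digest in memory_map:                  # duplicate content found
            phys_addr = memory_map[digest]        # share existing memory
        else:                                     # new content
            phys_addr = allocate_physical(blob)
            memory_map[digest] = phys_addr
        virtual_pointers[name] = phys_addr        # virtual -> physical mapping
    return virtual_pointers
```

Deploying a second model whose parameters partly match the first would then reuse the shared addresses and trigger no new allocation for the matching sets.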
In another aspect, the present specification provides a video memory allocation processing apparatus, including:
the parameter acquisition module is used for acquiring a model parameter set of the model to be deployed;
the hash operation module is used for performing a hash operation on each acquired model parameter set to obtain a parameter hash value of each model parameter set;
the parameter duplication checking module is used for sequentially matching the parameter hash value of each model parameter set with a video memory mapping table so as to determine whether each model parameter set of the model to be deployed is the same as the deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
and the video memory allocation module is used for allocating a virtual video memory pointer to the model parameter set and mapping the physical video memory address of the deployed model parameter which is the same as the model parameter set onto the virtual video memory pointer if the model parameter set is determined to be the same as the deployed model parameter in the video memory mapping table.
In another aspect, an embodiment of the present specification provides a video memory allocation processing apparatus, which includes at least one processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the video memory allocation processing method.
In another aspect, an embodiment of the present specification provides a video memory allocation processing system, the system including a plurality of graphics processors and a global model parameter management module, where each graphics processor runs a plurality of processes or containers and each process or container includes at least one model to be deployed;
an interprocess communication module is arranged in each process or container; the global model parameter management module is used for executing the video memory allocation processing method, and the interprocess communication module is used for querying whether the model parameters of each model to be deployed are duplicated, so as to allocate video memory to the models to be deployed in the plurality of graphics processors.
According to the video memory allocation processing method, device, equipment and system provided in this specification, a hash operation is performed on the model parameters of an intelligent learning model, and the hash values are compared to determine whether the model parameters of a model to be deployed duplicate already-deployed model parameters. If they do, no new physical video memory needs to be allocated; the duplicated model parameters are mapped to the existing physical video memory through virtual pointers. The same content is shared rather than stored repeatedly, which greatly saves physical video memory space and improves the utilization rate of the video memory, so that more instances can be deployed in a limited video memory space and system performance is improved.
Drawings
To more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present specification; for those skilled in the art, other drawings can be obtained from these drawings without any creative effort.
Fig. 1 is a schematic flowchart of an embodiment of a video memory allocation processing method provided in an embodiment of the present specification;
FIG. 2 is a schematic diagram of a model parameter hash calculation in one embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a process for allocating memory for model loading according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating the effect of video memory allocation management in a multi-process, multi-thread, multi-container model according to an embodiment of the present disclosure;
fig. 5 is a schematic block diagram of an embodiment of a video memory allocation processing apparatus provided in this specification;
FIG. 6 is a schematic block diagram of a video memory allocation processing system according to an embodiment of the present disclosure;
fig. 7 is a schematic flow chart of physical video memory allocation management in another embodiment of the present disclosure;
fig. 8 is a block diagram of a hardware configuration of the video memory allocation processing server in one embodiment of the present specification.
Detailed Description
To make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present specification, not all of them. All other embodiments obtained by a person skilled in the art based on these embodiments without any inventive step shall fall within the scope of protection of the present specification.
Nowadays, intelligent models are applied in more and more scenarios; for example, a deep learning model can be used for payment (face recognition), damage assessment (image recognition), interaction and customer service (speech recognition and content filtering), and so on. However, an intelligent model generally requires strong computing power, so most such tasks currently run on acceleration devices such as GPUs (graphics processing units). For higher throughput or resource utilization, a single node often has multiple GPUs, and each GPU deploys multiple instances of a model (as multiple processes, threads, or containers), each of which can independently provide service (scale-out, combined with upper-layer load balancing). For example, in some scenarios several identical intelligent learning models may need to be loaded on one or more GPUs to accelerate data processing, so the model parameters of all these instances must be loaded into GPU video memory, occupying a large amount of it. However, the limited hardware video memory capacity of a GPU (currently 16GB in production) directly restricts the number of model instances, or the deployment of larger models. Video memory can be understood as the high-speed memory on a GPU card: its bandwidth is high (700GB+/sec), but its capacity is small (typically 16/32GB), which constrains deep learning tasks with large models and large samples.
The embodiment of this specification provides a video memory allocation processing method that, at the model loading stage, globally and at fine granularity identifies which model parameters are identical by calculating and comparing hashes, then maps identical content to the same GPU video memory. Video memory space is thus allocated reasonably for intelligent model deployment, reducing video memory occupation during long-term model operation and making it possible to deploy larger models or more instances.
Fig. 1 is a schematic flowchart of an embodiment of a video memory allocation processing method provided in an embodiment of this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the described method or module structure is applied to a device, a server or an end product in practice, the method or module structure according to the embodiment or the figures may be executed sequentially or in parallel (for example, in a parallel processor or multi-thread processing environment, or even in an implementation environment including distributed processing and server clustering).
The video memory allocation processing method provided in the embodiments of this specification can be flexibly deployed in environments such as cloud-native containers and bare-metal machines. For example, it can be applied on a client or a server, such as a smartphone, tablet computer, smart wearable device, or vehicle-mounted terminal; the specific environment may be determined according to actual needs, and the embodiments of this specification are not specifically limited.
As shown in fig. 1, the method may include the steps of:
and 102, obtaining a model parameter set of the model to be deployed.
In a specific implementation process, the model to be deployed can be understood as an intelligent learning model that needs to be deployed on a GPU, that is, a model the GPU needs to load in order to execute a corresponding task (see the description of the above embodiment). After being deployed on the GPU, the model can be used to execute the corresponding task. It may be a deep learning model, a tree model, a random deep forest model, a regression model, or the like, determined according to actual needs; the embodiments of this specification are not specifically limited. In some embodiments, the models to be deployed may be located in different threads, different processes, or different containers. That is, the embodiments of this specification can implement allocation management of model video memory across multiple processes, multiple containers, and even multiple GPUs, achieving unified management of video memory allocation across processes and GPUs.
Generally, an intelligent learning model has many model parameters. The model parameters can also be understood as the weights of the model, that is, the features learned through training. During inference, the weights are usually all loaded onto the GPU of the inference equipment, and for performance (such as throughput) one node often loads multiple instances of the model, each of which can independently provide service; this occupies correspondingly more video memory. Therefore, the GPU's video memory space must be allocated reasonably for the model parameters, i.e. the weights. In the embodiment of this specification, a classification rule for model parameters may be preset, dividing the model parameters of the model to be deployed into a plurality of model parameter sets, each of which may include a plurality of model parameters. The classification rule may be determined according to actual needs: for example, the sets may be divided according to the position of the parameters in the model, the size of the parameters, and so on, or all the model parameters of the entire model may be used directly as one model parameter set; the embodiments of this specification are not specifically limited.
Step 104: perform a hash operation on each acquired model parameter set to obtain a parameter hash value of each model parameter set.
In a specific implementation process, after the model parameters of the model to be deployed are divided, hash operations may be performed on each model parameter set in sequence, specifically, hash operations may be performed on all model parameter values in the model parameter set, so as to obtain a parameter hash value of each model parameter set.
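The per-set hash described above can be illustrated with a short sketch. This is a hypothetical example (the function name and byte-encoding of parameter values are invented for illustration); as noted later in this description, any widely verified digest such as MD5 or SHA-1 could serve.

```python
import hashlib

def param_set_hash(param_values, algorithm="sha1"):
    """Digest all parameter values of one model parameter set.
    Identical contents always yield identical hashes, which is what
    allows duplicate sets to be detected later."""
    h = hashlib.new(algorithm)
    for value in param_values:   # raw bytes of each parameter value
        h.update(value)
    return h.hexdigest()
```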
In some embodiments of the present specification, the obtaining a set of model parameters of a model to be deployed includes:
sequentially obtaining model parameters of each level of the model to be deployed, and taking a set of the model parameters of each level as a model parameter set;
or sequentially acquiring model parameters of continuous preset levels of the model to be deployed, and taking a set of the model parameters of the continuous preset levels as a model parameter set.
In a specific implementation process, the model parameters of the model to be deployed may be divided layer by layer, with the set of all model parameters of each layer used as one model parameter set. This allows a hash operation on the parameters of each layer, achieving fine-grained, layer-by-layer operation and increasing the accuracy of data processing. Alternatively, in some embodiments, a continuous preset number of levels of the model to be deployed may be used; for example, the model parameters of 3 consecutive layers are used as one model parameter set. This avoids dividing the parameters too finely, which would increase the amount of hash calculation, and enables hash calculation over a specified range of model parameters, improving data processing speed while ensuring accuracy and laying an accurate data basis for the subsequent video memory allocation of the model parameters.
The hierarchy of a model can be understood as a range in the model structure; in particular, hierarchical models, network structure models, and tree models can be divided into layers according to their structure. For example, a tree structure model can treat all tree nodes at each depth as one level, and a network structure model can treat each layer of the network as one level.
In some embodiments of the present specification, the obtaining a set of model parameters of a model to be deployed includes:
calculating the parameter size of the model parameters in a specified range of the model to be deployed; if the parameter size is smaller than a preset threshold, adding the model parameters adjacent to the specified range into the specified range and calculating the parameter size of the model parameters in the new specified range, until the parameter size is larger than the preset threshold; and using the model parameters in the resulting specified range as one model parameter set.
In a specific implementation process, a threshold may be preset and the parameter size of the model parameters in a specified range calculated; the specified range may be determined according to actual needs, such as one level, or n adjacent consecutive parameters. If the parameter size in the specified range is smaller than the preset threshold, the parameters in the range are small and the range can be enlarged: the model parameters adjacent to the specified range are added to it, and the parameter size of the enlarged range is calculated. If it is still smaller than the preset threshold, the range continues to be enlarged until the parameter size exceeds the preset threshold, and the model parameters in the final range are used as one model parameter set. The subsequent model parameters are divided into sets by analogy. The number of model parameters added to the range each time can be determined according to actual needs; for example, if the parameter size in the range is smaller than the preset threshold, the model parameters of the adjacent level can be added to the original range.
For example, fig. 2 is a schematic diagram of the hash calculation of model parameters in an embodiment of this disclosure. The layers in fig. 2 represent the hierarchy of a model; the specified range is preset to one level, and if the parameter size of a level is smaller than 2MB, the parameters of several consecutive layers are calculated together (until the total exceeds 2MB). As shown in fig. 2, since the model parameters at level L1 are smaller than 2MB, the model parameters at level L2 are calculated together with those at L1; the combined size of L1 and L2 is larger than 2MB, so the parameters of L1 and L2 form one model parameter set. Continuing with level L3, its parameter size is found to be greater than 2MB, so the parameters of L3 alone form a model parameter set. By analogy, the parameters of levels L4-L6 form one set and the parameters of level L7 form another. In this way, for particularly small model parameters, the subsequent hash calculations, the number of queries, and the metadata cost can all be reduced.
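The threshold-based grouping described for fig. 2 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: layer sizes are given in MB, the 2MB threshold comes from the example above, and the function name and return format are invented.

```python
def group_layers(layer_sizes_mb, threshold_mb=2.0):
    """Group consecutive layers until the accumulated parameter size
    exceeds the threshold; each group becomes one model parameter set.
    Any trailing layers still below the threshold form a final group."""
    groups, current, total = [], [], 0.0
    for i, size in enumerate(layer_sizes_mb):
        current.append(i)
        total += size
        if total > threshold_mb:      # range is now large enough
            groups.append(current)
            current, total = [], 0.0  # start a new specified range
    if current:                       # leftover small layers at the end
        groups.append(current)
    return groups
```

With hypothetical sizes matching the fig. 2 narrative (L1 and L2 small, L3 large, L4-L6 accumulating past 2MB, L7 large), the grouping reproduces the sets described above.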
Of course, at the coarsest granularity, a single hash can be computed over the model parameters of all levels of the entire model. A hash digest algorithm widely verified in the security and storage fields, such as MD5 (producing 128 bits) or SHA-1 (producing 160 bits), may be employed.
The embodiment of the specification can configure the hash calculation method of the model parameters with different granularities according to the actual use requirement, and meet different data processing requirements.
Step 106: match the parameter hash value of each model parameter set against a video memory mapping table in sequence to determine whether each model parameter set of the model to be deployed is the same as deployed model parameters in the table; the video memory mapping table comprises the parameter hash values of a plurality of deployed model parameters and the physical video memory addresses corresponding to those deployed model parameters.
In a specific implementation process, a video memory mapping table may be constructed in advance, storing the parameter hash value of each deployed model parameter together with its physical video memory address. After the parameter hash values of the model parameter sets of the model to be deployed have been calculated, each can be matched against the video memory mapping table to query whether the parameters of the model to be deployed duplicate those of a deployed model. In general, if the content of model parameters (possibly as fine as a single layer of the model) is identical, then hash values calculated from that content (e.g., using MD5 or SHA-1) must also be identical, which identifies parameters with the same content. For example, the model to be deployed undergoes level-by-level hash operations; after the hash value of each level's parameters is calculated, it can be compared with the parameter hash values in the video memory mapping table. If a parameter hash value in the table equals the hash of a certain level of the model to be deployed, the parameters of that level can be considered identical to the corresponding deployed model parameters.
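The level-by-level lookup against the mapping table can be sketched as below. This is a hypothetical illustration: the table is modeled as a plain dictionary from content hash to physical address, and all names are invented.

```python
import hashlib

def find_duplicates(level_params, memory_map):
    """For each level's parameter bytes of the model to be deployed,
    compute the content hash and look it up in the video memory mapping
    table (hash -> physical address). A hit means the level's parameters
    are identical to deployed ones and the existing address can be shared;
    None means no duplicate is deployed yet."""
    result = {}
    for level, blob in level_params.items():
        digest = hashlib.sha1(blob).hexdigest()
        result[level] = memory_map.get(digest)
    return result
```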
In addition, it should be noted that the matching may be done either after the hash values of all model parameter sets of the model to be deployed have been calculated, or incrementally: as soon as the hash value of one model parameter set is calculated, it is matched against the parameter hashes of the deployed model parameters in the video memory mapping table before the hash of the next set is calculated. The specific process may be determined according to actual needs, and the embodiments of this specification are not specifically limited.
Step 108: if the model parameter set is determined to be the same as deployed model parameters in the video memory mapping table, allocate a virtual video memory pointer to the model parameter set, and map the physical video memory address of the identical deployed model parameters to the virtual video memory pointer.
In a specific implementation process, if a model parameter set of the model to be deployed is found to be identical to deployed model parameters in the video memory mapping table, a virtual video memory pointer may be returned; the physical video memory address of the identical deployed parameters is obtained from the table, and that address is mapped onto the virtual video memory pointer of the model parameter set. The virtual video memory pointer can be understood as the pointer visible to the application program; multiple virtual video memory pointers can point to the same physical video memory. Fig. 3 is a schematic diagram of the model-loading video memory allocation process in an embodiment of this specification. As shown in fig. 3, when the hash value of the model parameter set at level 1/N of model 2 duplicates a deployed model parameter in the video memory mapping table, the corresponding physical video memory address addr1 can be returned from the table and mapped to ptr2, the virtual video memory pointer of that set, so the parameter set at level 1/N of model 2 does not need its own separate copy in GPU video memory. GWM (Global Weight Hash Manager) in fig. 3 denotes the global model parameter management.
A model parameter set of the model to be deployed being identical to deployed model parameters in the video memory mapping table may mean that the model to be deployed is the same model as one already deployed, in which case all of their model parameters may be identical; or the two may be similar models in which only some of the parameters are identical.
In addition, the virtual video memory pointer may be returned after the parameter hash value of the model parameter set is calculated, or a virtual video memory pointer may be allocated to each model parameter set once the model parameters of the model to be deployed have been divided into sets, with each set receiving a different pointer; the method of generating the virtual video memory pointer is not specifically limited in this embodiment. In some embodiments, a sub-virtual video memory pointer may be generated for each model parameter in a set from the set's virtual video memory pointer, using an offset function: the sub-virtual video memory pointer of each model parameter is the set's virtual video memory pointer plus the parameter's offset. For example, if the virtual video memory pointer of a set is t and the set has 5 model parameters, the sub-virtual video memory pointer of parameter 1 can be expressed as t + offset1, and so on for each parameter. The sub-virtual video memory pointer of each model parameter and the virtual video memory pointer of its set point to the same physical video memory, so each model parameter has its own virtual video memory pointer through which the corresponding physical video memory address can be found, giving each model parameter a determined physical video memory location.
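The offset scheme just described can be sketched as follows. This is an illustrative sketch only: pointer arithmetic is modeled with plain integers, parameter sizes are in bytes, and the function name is invented.

```python
def sub_pointers(base_ptr, param_sizes):
    """Derive a per-parameter sub-virtual pointer from the set's virtual
    pointer by accumulating offsets (t, t + offset1, t + offset2, ...),
    so every parameter resolves into the same shared physical region."""
    pointers, offset = [], 0
    for size in param_sizes:
        pointers.append(base_ptr + offset)
        offset += size
    return pointers
```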
When the model parameters are required to be loaded, the corresponding physical video memory address can be inquired based on the virtual video memory pointer, and then the corresponding model parameters are loaded or referred.
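The pointer arithmetic described above can be sketched as follows; the function name, parameter sizes and base address are illustrative assumptions, not values taken from the specification:

```python
# Hypothetical sketch: deriving per-parameter sub-pointers from a set's
# base virtual video memory pointer via running byte offsets.

def sub_pointers(base_ptr, param_sizes):
    """Return one sub-pointer per model parameter: base pointer + offset.

    Every sub-pointer resolves into the same physical region that the
    set's base virtual pointer is mapped to.
    """
    pointers = []
    offset = 0
    for size in param_sizes:
        pointers.append(base_ptr + offset)
        offset += size
    return pointers

# A set with 5 model parameters of varying byte sizes, base pointer t:
t = 0x7F0000000000
ptrs = sub_pointers(t, [4096, 8192, 4096, 16384, 4096])
# ptrs[0] is t itself (offset 0), ptrs[1] is t + 4096, and so on.
```

The offsets only partition the set's virtual range; resolving any of the sub-pointers still lands in the single shared physical region.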
Fig. 4 illustrates the effect of video memory allocation management for multi-process, multi-thread and multi-container models in an embodiment of this specification. As shown in fig. 4, if models in different threads, processes or containers carry the same model parameter, i.e. the same content, only one copy of that model parameter needs to be stored in the physical video memory of the GPU; the physical video memory address holding it is mapped to a virtual video memory pointer for each consumer, so the same model parameter is shared across different processes, threads and containers.
In some embodiments of the present description, the method further comprises:
if it is determined that the model parameter set is different from all deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmitting the model parameters in the model parameter set to the corresponding physical video memory address, and mapping the physical video memory address allocated to the model parameter set to the corresponding virtual video memory pointer.
In a specific implementation, as shown in fig. 3, the parameter hash value of the 1st/Nth-level model parameter set of the queried model 1 differs from all deployed model parameters in the video memory mapping table, i.e. that set does not duplicate any deployed model parameter. A "not present" result is returned, a physical video memory address addr1 and a virtual video memory pointer ptr1 are allocated to the set, addr1 is mapped to ptr1, and the model parameters in the set are transmitted to addr1 in the GPU.
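The "not found" branch just described can be sketched as follows. All names and addresses are hypothetical, and a plain dictionary stands in for the video memory mapping table; in a real system the allocation would go through the GPU driver:

```python
# Illustrative sketch: a duplicate check that misses allocates a new
# physical region, registers it, and maps a fresh virtual pointer to it.

memory_map = {}            # parameter hash -> physical address (mapping table)
virtual_to_physical = {}   # virtual pointer -> physical address

next_phys = 0xE0710000
next_virt = 0x7F0000000000

def deploy_param_set(param_hash, param_bytes):
    global next_phys, next_virt
    phys = memory_map.get(param_hash)
    if phys is None:                    # duplicate check missed: allocate
        phys = next_phys
        next_phys += len(param_bytes)
        memory_map[param_hash] = phys   # register for later models
        # (here the bytes would be copied from host memory to `phys`)
    virt = next_virt                    # every deployment gets its own pointer
    next_virt += len(param_bytes)
    virtual_to_physical[virt] = phys    # map virtual pointer -> physical
    return virt, phys

v1, p1 = deploy_param_set("hash_a", b"\x00" * 1024)
v2, p2 = deploy_param_set("hash_a", b"\x00" * 1024)
# the second deployment shares the first physical region: p2 == p1, v2 != v1
```

The second call demonstrates the sharing path: the hash hits, no new physical memory is allocated, and only a new virtual pointer is mapped.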
In some embodiments of the present description, the method further comprises:
and adding the parameter hash value and the physical video memory address of the model parameter set into the video memory mapping table.
In a specific implementation, after a physical video memory address is newly allocated, the parameter hash value of the model parameter set that received the new video memory and the physical video memory address can be added to the video memory mapping table, keeping the table up to date and facilitating subsequent queries of the model parameters.
The deployed model parameters in the video memory mapping table may themselves be parameter sets of deployed models. Video memory allocation rules for model parameters during deployment can be preset, and each time a model is loaded for deployment the parameter hash values of the corresponding model parameter sets are calculated according to those rules, for example allocating physical video memory to the model parameters of each level of the model. When the first model is loaded and deployed, the parameter hash values of each of its level-wise model parameter sets are calculated in turn and matched against the video memory mapping table; since the table holds no data at this point, none of the first model's parameter sets duplicates an existing parameter. Physical video memory can then be allocated to each level's parameter set, and each level's model parameters, the corresponding parameter hash value and the physical video memory address are stored in the video memory mapping table.
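A minimal sketch of this first-model bootstrap, assuming MD5 as the parameter hash (the specification mentions MD5 and SHA-1 as examples) and inventing the level contents and base address:

```python
import hashlib

# Hedged sketch: on first deployment the mapping table is empty, so every
# level's parameter set is a miss; each gets a physical region and an entry.

def level_hash(level_params: bytes) -> str:
    return hashlib.md5(level_params).hexdigest()

mapping_table = {}          # parameter hash -> physical video memory address
base_addr = 0xA4353400      # illustrative starting address

levels = [b"layer0-weights", b"layer1-weights", b"layer2-weights"]
addr = base_addr
for params in levels:
    h = level_hash(params)
    if h not in mapping_table:   # empty table: every level misses
        mapping_table[h] = addr
        addr += len(params)
# all three level-wise sets are now registered for later models to match
```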
In the embodiments of this specification, a hash operation is performed on the model parameters of the intelligent learning model, and whether the model parameters of the model to be deployed duplicate deployed model parameters is determined by comparing their hash values. If they are duplicated, no new physical video memory needs to be allocated: the duplicated model parameters are mapped to the corresponding physical video memory via virtual pointers, the same content is shared, and identical model parameters are not stored repeatedly. New physical video memory addresses are allocated only to non-duplicated model parameters. Data with identical content is thereby shared and physical video memory space is saved substantially.
In some embodiments of this specification, the video memory mapping table further includes a parameter size of a deployed model parameter, and the sequentially matching the parameter hash value of each model parameter set with the video memory mapping table to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table includes:
and matching the parameter hash value of each model parameter set with a video memory mapping table in sequence, if the parameter hash value of the model parameter set is the same as the parameter hash value of the target deployed model parameter in the video memory mapping table, comparing the parameter size of the model parameter set with the parameter size of the target deployed model parameter, and if the parameter size of the model parameter set is the same as the parameter size of the target deployed model parameter, determining that the model parameter set is the same as the target deployed model parameter.
In a specific implementation process, the video memory mapping table may further include the parameter size of the deployed model parameter, and when the model parameter set of the model to be deployed is subjected to duplicate checking, not only the parameter hash value of the model parameter set may be matched with the parameter hash value in the video memory mapping table, but also the parameter size may be compared. If the parameter hash value of a certain model parameter set of the model to be deployed is the same as the parameter hash value of the target deployed model parameter in the video memory mapping table, then comparing the parameter sizes of the model parameter set and the target deployed model parameter, and if the parameter sizes are also the same, determining that the model parameter set is the same as the target deployed model parameter. The target deployed model parameter may be understood as a deployed model parameter whose parameter hash value in the video memory mapping table is the same as the parameter hash value of a certain model parameter set of the model to be deployed.
Generally, when hash values are the same the hashed contents are also the same, but in rare cases the contents differ while the hash values collide. By comparing both the parameter hash value and the parameter size, the model parameter set is determined to be the same as the target deployed model parameter only when both agree. This improves the accuracy of judging whether model parameters are identical, and in turn the accuracy of model deployment, avoiding misjudgments that would load the wrong model parameters and affect system performance.
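The two-stage check can be sketched as below; the record layout (dicts with `hash` and `size` keys) is an assumption for illustration:

```python
# Sketch of the two-stage duplicate check: hash first, then parameter size,
# guarding against the rare collision where hashes match but content differs.

def find_duplicate(candidate, table):
    """candidate and table entries are dicts with 'hash' and 'size' keys."""
    for deployed in table:
        if deployed["hash"] != candidate["hash"]:
            continue                      # hash mismatch: cannot be the same
        if deployed["size"] == candidate["size"]:
            return deployed               # hash and size agree: treat as same
    return None

table = [{"hash": "abc", "size": 50 * 2**20, "addr": 0xE0710000}]
hit = find_duplicate({"hash": "abc", "size": 50 * 2**20}, table)
miss = find_duplicate({"hash": "abc", "size": 24 * 2**20}, table)  # collision
```

The `miss` case is exactly the collision the text warns about: the hash matches but the size does not, so the set is not treated as a duplicate.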
In some embodiments of the present description, the method further comprises:
after the parameter size of the model parameter set is determined to be the same as the parameter size of the target deployed model parameter, byte-by-byte comparison is carried out on the model parameters in the model parameter set and the target deployed model parameter, and if the bytes of the model parameters in the model parameter set are the same as the bytes of the target deployed model parameter, the model parameter set is determined to be the same as the target deployed model parameter.
In a specific implementation process, after determining that both the parameter hash value and the parameter size of a certain model parameter set of the model to be deployed are the same as those of a target deployed model parameter in the video memory mapping table, the model parameters in the model parameter set of the model to be deployed and the target deployed model parameters may be compared byte by byte, and if both the model parameters in the model parameter set and the target deployed model parameters are the same, it may be determined that the model parameter set is the same as the target deployed model parameters.
For example: the model S to be deployed has 5 model parameter sets, wherein the parameter hash value of the model parameter set1 is a, and after the parameter hash value a of the model parameter set1 is compared with the parameter hash value in the video memory mapping table, if the parameter hash value of one deployed model parameter P in the video memory mapping table is the same as the parameter hash value of the model parameter set1, the deployed model parameter P can be used as the target deployed model parameter of the model parameter set 1. And comparing the parameter size of the model parameter set1 with the parameter size of the deployed model parameter P, if the parameter sizes of the model parameter set1 and the deployed model parameter P are the same, comparing the bytes of each model parameter in the model parameter set1 with the bytes of the deployed model parameter P byte by byte, and if the bytes of the model parameter set1 and the deployed model parameter P are the same, determining that the model parameter set1 is the same as the deployed model parameter P.
In the embodiment of the specification, after the hash operation is performed on the model parameter of the model to be deployed, the parameter hash value and the parameter size of the model parameter are compared with the data in the video memory mapping table, and after the parameter hash value and the parameter size are consistent with the data in the video memory mapping table, the parameter byte-by-byte comparison is performed, so that the accuracy of the repeated query result of the model parameter is improved, the accurate deployment of the model is further ensured, and the system performance is improved.
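The full three-stage check of the example above can be sketched as follows, with MD5 standing in for the parameter hash and host-side byte strings standing in for parameters already in video memory:

```python
import hashlib

# Illustrative three-stage comparison: parameter hash, then parameter size,
# then byte-by-byte content.

def same_parameters(candidate: bytes, deployed: bytes) -> bool:
    if hashlib.md5(candidate).hexdigest() != hashlib.md5(deployed).hexdigest():
        return False                  # stage 1: parameter hash value
    if len(candidate) != len(deployed):
        return False                  # stage 2: parameter size
    return candidate == deployed      # stage 3: byte-by-byte comparison

set1 = b"weights-of-model-S-set1"    # model parameter set 1 of model S
p = b"weights-of-model-S-set1"       # deployed model parameter P, same content
```

Only when all three stages agree is `set1` treated as identical to the deployed parameter `P` and its physical video memory shared.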
In some embodiments of this specification, the video memory mapping table further includes a video memory number of a deployed model parameter, and the matching of the parameter hash value of each model parameter set with the video memory mapping table sequentially determines whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table, including:
and matching the parameter hash value of each model parameter set with a video memory mapping table in sequence, if the parameter hash value of the model parameter set is the same as the parameter hash value of the target deployed model parameter in the video memory mapping table, comparing the video memory number corresponding to the model parameter set with the video memory number corresponding to the target deployed model parameter, and if the video memory numbers are the same, determining that the model parameter set is the same as the target deployed model parameter.
In a specific implementation process, the video memory mapping table may further include a video memory number, i.e., a GPU number, of the deployed model parameter, where the video memory number may represent on which GPU the model corresponding to the model parameter is loaded and deployed. Table 1 shows contents of a video memory mapping table in a scenario example of this specification, and as shown in table 1, the video memory mapping table may include a GPU # (i.e., a video memory number), a parameter size, a parameter hash value, and a physical video memory address. Here, GPU # may be understood as a card number of the GPU, and each node, i.e. each device, may have multiple cards, each card storing a unique instance of content. Parameter size may be understood as the size of the model parameters (layer-by-layer, multilayer or whole model) of the deployed model in bytes. A parametric hash value may be understood as the result of computing a hash over the deployed model parameters (layer-by-layer, multi-layer, or entire model). For example, a typical MD5hash produces a digest of 128b, or SHA1 produces 160 bit. The first three items, namely the GPU #, the parameter size and the parameter hash value, can be used as keys of the parameter hash value, and the value is the physical address of the deployed model parameter in the GPU #.
Table 1: video memory mapping table
GPU#    Parameter size    Parameter hash value    Physical video memory address
0       50MB              0x****                  0xE0710000
0       24MB              0x****                  0xA4353400
1       12MB              0x****                  0x78E3B200
......  ......            ......                  ......
When querying whether a model parameter of the model to be deployed duplicates an existing deployed model parameter, the parameter hash values can be compared for consistency, and the video memory numbers can also be compared for equality. For example, if the parameter hash value of model parameter set1 of the model to be deployed is found to be the same as that of the target deployed model parameter P in the video memory mapping table, the video memory numbers of set1 and P can then be compared; if they are also the same, the models corresponding to the two parameters are deployed on the same GPU, and it can be determined that model parameter set1 of the model to be deployed is the same as the target deployed model parameter P. Of course, referring to the above embodiments, the parameter hash value, parameter size, video memory number and parameter bytes of a model parameter set of the model to be deployed may be compared in turn with the data of the target deployed model parameter in the video memory mapping table, and the two model parameters are determined to be the same only after all of them match.
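Table 1's lookup can be sketched as a dictionary keyed on (GPU #, parameter size, parameter hash value); the hash strings are placeholders, mirroring the elided "0x****" entries in the table:

```python
# Sketch of Table 1 as a dictionary: the composite key combines GPU number,
# parameter size and parameter hash; the value is the physical address.

table = {
    (0, 50 * 2**20, "hash_a"): 0xE0710000,
    (0, 24 * 2**20, "hash_b"): 0xA4353400,
    (1, 12 * 2**20, "hash_c"): 0x78E3B200,
}

def lookup(gpu, size, param_hash):
    """Return the physical address only when GPU#, size and hash all match."""
    return table.get((gpu, size, param_hash))

addr = lookup(0, 50 * 2**20, "hash_a")        # same card: shared
other_card = lookup(1, 50 * 2**20, "hash_a")  # same content, other GPU: miss
```

The second lookup shows the point of including the GPU number in the key: identical content on a different card is deliberately treated as a miss, so sharing never crosses graphics cards.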
In the embodiments of this specification, comparing the video memory numbers associated with model parameters allows video memory allocation management to be restricted to models deployed on the same video memory: model parameters with identical content on the same card share physical video memory, reducing the video memory consumption of multiple models on one graphics card with essentially no impact on performance, while avoiding the performance problems that sharing model parameters across different graphics cards would cause.
Specifically, the following describes, with reference to fig. 3, a process of allocating and managing physical video memory in an embodiment of the present specification, where as shown in fig. 3, a method in the embodiment of the present specification may include:
1. When the first model loads its model parameters, the model parameters of all models can first be staged in the Central Processing Unit (CPU), and the hash of the model parameters can then be calculated layer by layer, over several consecutive layers, or even over the whole model, at a granularity chosen by configuration or otherwise. For details, refer to the descriptions of the above embodiments, which are not repeated here.
2. Duplicate check: send <GPU #, parameter size, parameter hash value> to the GWM. The GWM queries the video memory mapping table, i.e. table 1: only when the GPU #, the parameter size and the parameter hash value are all the same is the model parameter considered to already exist (if necessary, the parameter size and parameter hash value can be compared first to decide whether there is a duplicate, and when they match, the parameters can additionally be compared byte by byte). If no match exists, NULL is returned; otherwise, the physical video memory address is returned.
3. If model 1's parameters do not exist, a physical video memory region (for example at address addr1) is allocated and mapped to a virtual video memory pointer (for example ptr1), the content of the model parameters is transmitted from the CPU to addr1 on the GPU, and the GPU #, parameter size, parameter hash value and the corresponding addr1 information are reported to the GWM so that the video memory mapping table is updated to record them. By employing a VMM (Virtual Memory Management) interface, the driver layer can automatically track the reference count of addr1.
4. The flow for model 2 is similar: if the contents are the same, the queried address (addr1) is returned. Model 2 then does not need to allocate new physical video memory; instead, the physical video memory addr1 is mapped to the virtual video memory pointer of the corresponding model parameter in model 2, e.g. ptr2, so that the weights are shared.
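The end-to-end model 1 / model 2 flow above, including the reference count the driver layer tracks, can be sketched as below. The GWM state is a plain in-process dict here; in the specification the GWM is a separate process reached over IPC, and all addresses are invented:

```python
# Hedged sketch: model 1 misses and records its freshly allocated address;
# model 2 hits the same key, shares that address, and bumps the refcount.

gwm = {}   # (gpu, size, hash) -> [physical address, reference count]

def load(gpu, size, param_hash, alloc_addr):
    key = (gpu, size, param_hash)
    if key in gwm:
        gwm[key][1] += 1               # existing entry: share, bump refcount
    else:
        gwm[key] = [alloc_addr, 1]     # new entry: record the new allocation
    return gwm[key][0]                 # address the virtual pointer maps to

addr1 = load(0, 4096, "h1", 0xE0710000)  # model 1: miss, allocates addr1
addr2 = load(0, 4096, "h1", 0xDEADBEEF)  # model 2: hit, shares addr1
# addr2 == addr1, and the entry's reference count is now 2
```

Freeing would mirror this: decrement the count on unload and release the physical region only when it reaches zero.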
The video memory allocation processing method provided in the embodiments of this specification shares video memory by the content value of model parameters, saving video memory at run time; in one measured example about 45% was saved, with essentially unchanged performance. Multi-model parameters can be shared across the global scope of a node, including multiple processes or containers, and multiple GPUs are supported, broadening the target usage scenarios. In addition, model parameters are identified and extracted per level, at a finer granularity, so identical layers across multiple models can be shared, further extending the target scenarios.
Each method embodiment in this specification is described in a progressive manner; identical and similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. For the relevant points, refer to the partial descriptions of the method embodiments.
Based on the video memory allocation processing method, one or more embodiments of the present specification further provide a video memory allocation processing apparatus. The apparatus may include apparatus (including distributed systems), software (applications), modules, plug-ins, servers, clients, etc. that use the methods described in embodiments of the present specification in conjunction with hardware where necessary to implement the methods. Based on the same innovative conception, embodiments of the present specification provide an apparatus as described in the following embodiments. Since the implementation scheme of the apparatus for solving the problem is similar to that of the method, the specific apparatus implementation in the embodiment of the present specification may refer to the implementation of the foregoing method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Specifically, fig. 5 is a schematic block structure diagram of an embodiment of a video memory allocation processing apparatus provided in this specification, and as shown in fig. 5, the video memory allocation processing apparatus provided in this specification may include:
a parameter obtaining module 51, configured to obtain a model parameter set of a model to be deployed;
the hash operation module 52 is configured to perform hash operation on each obtained model parameter set to obtain a parameter hash value of each model parameter set;
the parameter duplication checking module 53 is configured to match the parameter hash values of the model parameter sets with a video memory mapping table in sequence, so as to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
a video memory allocation module 54, configured to allocate a virtual video memory pointer to the model parameter set if it is determined that the model parameter set is the same as the deployed model parameter in the video memory mapping table, and map a physical video memory address of the deployed model parameter that is the same as the model parameter set to the virtual video memory pointer.
In some embodiments of this specification, the video memory allocation module is further configured to:
if it is determined that the model parameter set is different from all deployed model parameters in the video memory mapping table, allocate a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmit the model parameters in the model parameter set to the corresponding physical video memory address, and map the physical video memory address allocated to the model parameter set to the corresponding virtual video memory pointer.
The embodiments of this specification perform a hash operation on the model parameters of the intelligent learning model and determine whether the model parameters of the model to be deployed duplicate deployed model parameters by comparing hash values. If duplicated, no new physical video memory needs to be allocated; the duplicated model parameters are mapped to the corresponding physical video memory via virtual pointers, so the same content is shared and identical model parameters are not stored repeatedly, while new physical video memory addresses are allocated only to non-duplicated model parameters. Data with identical content is thereby shared and physical video memory space is saved substantially.
It should be noted that the above-mentioned apparatus may also include other embodiments according to the description of the corresponding method embodiment. The specific implementation manner may refer to the description of the above corresponding method embodiment, and is not described in detail herein.
An embodiment of the present specification further provides a video memory allocation processing apparatus, including: at least one processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the video memory allocation processing method of the above embodiment, and the method includes:
obtaining a model parameter set of a model to be deployed;
carrying out Hash operation on each acquired model parameter set to obtain a parameter Hash value of each model parameter set;
matching the parameter hash value of each model parameter set with a video memory mapping table in sequence to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
if the model parameter set is determined to be the same as the deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer to the model parameter set, and mapping the physical video memory address of the deployed model parameters, which is the same as the model parameter set, to the virtual video memory pointer.
It should be noted that the above-described device or system may also include other embodiments according to the description of the method embodiments. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
An embodiment of this specification provides a video memory allocation processing system. Fig. 6 is a schematic framework diagram of the video memory allocation processing system in an embodiment of this specification. As shown in fig. 6, the system includes: a plurality of graphics processors (GPUs) and a global model parameter management module GWM (Global Weight Hash Manager), where each graphics processor hosts a plurality of processes or containers and each process or container includes at least one model to be deployed, wherein:
an inter-process communication module is arranged in each process or container, as shown in fig. 6, a small square at the bottom right corner of each process or container may represent the inter-process communication module, and the global parameter management module is configured to execute the method according to the above embodiment, and query whether model parameters of each model to be deployed are repeated through the inter-process communication module, so as to allocate a video memory to the model to be deployed in the multiple graphics processors.
Each node may deploy a global parameter management module GWM as an individual process or container, responsible for managing the data related to all model parameters on the node (hash values, physical video memory addresses, reference counts, etc.) and providing an IPC (Inter-Process Communication) service (e.g. over UNIX sockets). Fig. 7 is a schematic flow diagram of physical video memory allocation management in another embodiment of this specification. As shown in fig. 7, the GWM is first configured and started and then runs long-term; once started on a node it exposes the IPC service and waits for model loading, answering queries about whether each model's parameters are duplicated. For example: when an online model is deployed and its model parameters are loaded, before physical video memory is allocated a content-hash-based query checks whether the parameters duplicate existing ones; if already loaded, the physical video memory is shared and the virtual video memory pointer is mapped to the corresponding physical video memory address; if no duplicate content is found, new video memory is allocated and the hash and related data are sent to the GWM.
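The duplicate-check exchange between a loader and the GWM can be sketched as a request/response pair. The JSON framing is an assumption for illustration; the specification only names UNIX sockets as one possible IPC transport:

```python
import json

# Minimal sketch of the IPC query the text describes: the loader sends
# <GPU#, parameter size, parameter hash> and the GWM answers with a
# physical address, or null on a miss.

def encode_query(gpu, size, param_hash):
    return json.dumps({"gpu": gpu, "size": size, "hash": param_hash}).encode()

def gwm_handle(raw, table):
    q = json.loads(raw)
    addr = table.get((q["gpu"], q["size"], q["hash"]))
    return json.dumps({"addr": addr}).encode()   # addr is None on a miss

table = {(0, 50 * 2**20, "hash_a"): 0xE0710000}
reply = json.loads(gwm_handle(encode_query(0, 50 * 2**20, "hash_a"), table))
# reply["addr"] holds the shared physical address on a hit
```

In deployment the encoded bytes would travel over the UNIX socket; here the handler is called directly to keep the sketch self-contained.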
By calculating and comparing hashes during the model loading stage (which adds some CPU overhead, but model loading is typically a one-off step after which the model may run for months or years), identical model parameters are identified globally and at fine granularity, and identical content is mapped to the same GPU video memory. This reduces video memory occupation throughout the model's long-term operation, allowing larger models or more instances to be deployed. A client-server mode with the GWM at its center realizes globally coordinated allocation management of physical video memory.
The video memory allocation processing apparatus, device and system provided in this specification can also be applied in a variety of data analysis and processing systems. The system, server, terminal or device may be a single server, or may include a server cluster, a system (including a distributed system), software (applications), actual operating devices, logic gate devices, quantum computers, etc. using one or more of the methods or one or more embodiments described herein, combined with the necessary terminal devices implementing the hardware. The system may comprise at least one processor and a memory storing computer-executable instructions that, when executed by the processor, implement the steps of the method of any one or more of the embodiments described above.
The method embodiments provided in this specification can be executed in a mobile terminal, a computer terminal, a server or a similar computing device. Taking execution on a server as an example, fig. 8 is a block diagram of the hardware structure of a video memory allocation processing server in an embodiment of this specification; the computer terminal may be the video memory allocation processing server or apparatus of the above embodiments. As shown in fig. 8, the server 10 may include one or more processors 100 (only one is shown; the processor 100 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a non-volatile memory 200 for storing data, and a transmission module 300 for communication functions. Those skilled in the art will understand that the structure shown in fig. 8 is only illustrative and does not limit the structure of the electronic device. For example, the server 10 may include more or fewer components than shown in fig. 8, may include other processing hardware such as a database, a multi-level cache or a GPU, or may have a configuration different from that shown in fig. 8.
The non-volatile memory 200 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the video memory allocation processing method in the embodiment of the present specification, and the processor 100 executes various functional applications and resource data updates by running the software programs and modules stored in the non-volatile memory 200. Non-volatile memory 200 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the non-volatile memory 200 may further include memory located remotely from the processor 100, which may be connected to a computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 300 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission module 300 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission module 300 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The method or apparatus provided by the present specification and described in the foregoing embodiments may implement service logic through a computer program and record the service logic on a storage medium, where the storage medium may be read and executed by a computer, so as to implement the effect of the solution described in the embodiments of the present specification.
The storage medium may include a physical device for storing information; typically, the information is digitized and then stored using an electrical, magnetic, or optical medium. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, magnetic-bubble memories, and USB drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of readable storage media, such as quantum memory and graphene memory.
The video memory allocation processing method or apparatus provided in the embodiments of the present specification may be implemented by a processor executing corresponding program instructions, for example, implemented on a PC in C++ on a Windows operating system, implemented on a Linux system, implemented on an intelligent terminal using the Android or iOS system programming languages, or implemented as processing logic on a quantum computer.
The embodiments of the present specification are not required to conform strictly to industry communication standards, standard computer data-processing and data-storage rules, or the exact descriptions in one or more embodiments of the present specification. Implementations slightly modified from certain industry standards, or from the embodiments described here using custom modes or examples, may also achieve the same, equivalent, similar, or other contemplated effects of the above examples. Embodiments using such modified or transformed methods of data acquisition, storage, judgment, and processing still fall within the scope of alternative embodiments of the present specification.
In the 1990s, an improvement in a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology advances, many of today's method-flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into the hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be realized with hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is now mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must be written in a particular programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application-Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented entirely by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing various functions may also be considered structures within the hardware component. Or even the means for performing the functions may be regarded both as software modules for performing the method and as structures within the hardware component.
For convenience of description, the above platform and terminal are described as being divided into various modules by functions and described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or plug-ins may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus so as to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, and the relevant points can be referred to only part of the description of the method embodiments. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
The above description is merely exemplary of one or more embodiments of the present disclosure and is not intended to limit the scope of one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present specification should be included in the scope of the claims.

Claims (14)

1. A video memory allocation processing method comprises the following steps:
obtaining a model parameter set of a model to be deployed;
performing a hash operation on each acquired model parameter set to obtain a parameter hash value of each model parameter set;
matching the parameter hash value of each model parameter set with a video memory mapping table in sequence to determine whether each model parameter set of the model to be deployed is the same as a deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
if the model parameter set is determined to be the same as the deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer to the model parameter set, and mapping the physical video memory address of the deployed model parameters, which is the same as the model parameter set, to the virtual video memory pointer.
2. The method of claim 1, further comprising:
if the model parameter set is determined to be different from all deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmitting the model parameters in the model parameter set to the corresponding physical video memory address, and mapping the physical video memory address allocated to the model parameter set onto the corresponding virtual video memory pointer.
3. The method of claim 2, further comprising:
and adding the parameter hash value and the physical video memory address of the model parameter set into the video memory mapping table.
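The deduplication flow of claims 1 to 3 can be sketched as follows. This is an illustrative model only: the class and method names (`VideoMemoryMap`, `deploy`) are assumptions, physical video memory addresses are simulated with integers, and SHA-256 stands in for whatever hash the implementation actually uses.

```python
import hashlib

class VideoMemoryMap:
    """Maps parameter hash -> (physical address, size) of deployed parameters."""

    def __init__(self):
        self.table = {}
        self.next_addr = 0x1000   # simulated physical video memory allocator

    def deploy(self, param_bytes: bytes):
        """Return (virtual_ptr, physical_addr, reused) for one parameter set."""
        h = hashlib.sha256(param_bytes).hexdigest()
        entry = self.table.get(h)
        if entry is not None and entry[1] == len(param_bytes):
            # Hit: allocate only a new virtual pointer and map it onto the
            # already-deployed physical address (claim 1).
            return ("vptr", h), entry[0], True
        # Miss: allocate physical video memory, "transmit" the parameters
        # there, and register the new entry in the mapping table (claims 2-3).
        phys = self.next_addr
        self.next_addr += len(param_bytes)
        self.table[h] = (phys, len(param_bytes))
        return ("vptr", h), phys, False
```

Deploying the same parameter bytes twice yields two distinct virtual pointers backed by the same physical address, which is the memory saving the method targets.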
4. The method of claim 1, wherein the obtaining the set of model parameters of the model to be deployed comprises:
sequentially obtaining model parameters of each level of the model to be deployed, and taking a set of the model parameters of each level as a model parameter set;
or sequentially acquiring model parameters of continuous preset levels of the model to be deployed, and taking a set of the model parameters of the continuous preset levels as a model parameter set.
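The two grouping strategies of claim 4 can be sketched as below; `model` is assumed to be an ordered list of per-level parameter blobs, and both function names are illustrative.

```python
def per_level_sets(model):
    """Each level's parameters form one model parameter set."""
    return [[level] for level in model]

def fixed_window_sets(model, n):
    """Every n consecutive levels form one model parameter set."""
    return [model[i:i + n] for i in range(0, len(model), n)]
```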
5. The method of claim 1, wherein the obtaining the set of model parameters of the model to be deployed comprises:
calculating the parameter size of the model parameters within a specified range of the model to be deployed; if the parameter size is smaller than a preset threshold, adding the model parameters adjacent to the specified range into the specified range and calculating the parameter size of the model parameters within the new specified range, until the parameter size is larger than the preset threshold; and taking the model parameters within the corresponding specified range as a model parameter set.
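Claim 5's threshold-driven range growth can be sketched as follows. This is a hedged reading of the claim, not the patent's code: `params` is assumed to be an ordered list of `(name, size_in_bytes)` pairs, and a set is emitted once its accumulated size exceeds the threshold.

```python
def threshold_sets(params, threshold):
    """Group adjacent parameters into sets whose size exceeds `threshold`."""
    sets, current, size = [], [], 0
    for name, psize in params:
        current.append(name)        # extend the specified range
        size += psize
        if size > threshold:        # range is now large enough
            sets.append(current)    # emit it as one parameter set
            current, size = [], 0
    if current:                     # trailing remainder, if any
        sets.append(current)
    return sets
```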
6. The method according to claim 1, wherein the video memory mapping table further includes a parameter size of a deployed model parameter, and the sequentially matching the parameter hash value of each model parameter set with the video memory mapping table to determine whether each model parameter set of the model to be deployed is the same as the deployed model parameter in the video memory mapping table includes:
and matching the parameter hash value of each model parameter set with a video memory mapping table in sequence, if the parameter hash value of the model parameter set is the same as the parameter hash value of the target deployed model parameter in the video memory mapping table, comparing the parameter size of the model parameter set with the parameter size of the target deployed model parameter, and if the parameter size of the model parameter set is the same as the parameter size of the target deployed model parameter, determining that the model parameter set is the same as the target deployed model parameter.
7. The method of claim 6, further comprising:
after the parameter size of the model parameter set is determined to be the same as the parameter size of the target deployed model parameter, byte-by-byte comparison is carried out on the model parameters in the model parameter set and the target deployed model parameter, and if the bytes of the model parameters in the model parameter set are the same as the bytes of the target deployed model parameter, the model parameter set is determined to be the same as the target deployed model parameter.
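The collision guards of claims 6 and 7 — a hash match is confirmed first by size, then byte-by-byte — can be sketched as below. The function name and the `deployed` layout (hash mapped to the deployed set's raw bytes) are assumptions for illustration.

```python
import hashlib

def same_as_deployed(param_bytes: bytes, deployed: dict) -> bool:
    """True only if an already-deployed set matches by hash, size, and bytes."""
    h = hashlib.sha256(param_bytes).hexdigest()
    target = deployed.get(h)
    if target is None:
        return False                      # no hash match at all
    if len(target) != len(param_bytes):   # size check (claim 6)
        return False
    return target == param_bytes          # byte-by-byte check (claim 7)
```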
8. The method according to claim 1, wherein the video memory mapping table further includes a video memory number of a deployed model parameter, and the sequentially matching the parameter hash value of each model parameter set with the video memory mapping table to determine whether each model parameter set of the model to be deployed is the same as the deployed model parameter in the video memory mapping table includes:
and matching the parameter hash value of each model parameter set with a video memory mapping table in sequence, if the parameter hash value of the model parameter set is the same as the parameter hash value of the target deployed model parameter in the video memory mapping table, comparing the video memory number corresponding to the model parameter set with the video memory number corresponding to the target deployed model parameter, and if the video memory numbers are the same, determining that the model parameter set is the same as the target deployed model parameter.
9. The method of claim 1, wherein the models to be deployed are located in different threads or different processes or different containers.
10. The method of claim 1, further comprising:
and according to the virtual video memory pointers corresponding to the model parameter set, determining the sub-virtual video memory pointers of each model parameter in the model parameter set by using a reference function, and mapping the physical video memory addresses corresponding to the model parameter set to the sub-virtual video memory pointers of each model parameter.
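One plausible reading of claim 10 is that each parameter's sub-virtual pointer is an offset into the set's shared physical allocation. The sketch below uses simulated integer addresses; the function name and address model are assumptions.

```python
def sub_pointers(base_addr, param_sizes):
    """Derive one sub-pointer per parameter as an offset into the shared block."""
    ptrs, offset = [], 0
    for size in param_sizes:
        ptrs.append(base_addr + offset)   # reference into the shared allocation
        offset += size
    return ptrs
```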
11. A video memory allocation processing apparatus, the apparatus comprising:
the parameter acquisition module is used for acquiring a model parameter set of the model to be deployed;
the hash operation module is used for performing a hash operation on each acquired model parameter set to obtain a parameter hash value of each model parameter set;
the parameter duplication checking module is used for sequentially matching the parameter hash value of each model parameter set with a video memory mapping table so as to determine whether each model parameter set of the model to be deployed is the same as the deployed model parameter in the video memory mapping table; the video memory mapping table comprises parameter hash values of a plurality of deployed model parameters and physical video memory addresses corresponding to the deployed model parameters;
and the video memory allocation module is used for allocating a virtual video memory pointer to the model parameter set and mapping the physical video memory address of the deployed model parameter which is the same as the model parameter set onto the virtual video memory pointer if the model parameter set is determined to be the same as the deployed model parameter in the video memory mapping table.
12. The apparatus of claim 11, the video memory allocation module further configured to:
if the model parameter set is determined to be different from all deployed model parameters in the video memory mapping table, allocating a virtual video memory pointer and a physical video memory address to the model parameter set;
and transmitting the model parameters in the model parameter set to the corresponding physical video memory address, and mapping the physical video memory address allocated to the model parameter set onto the corresponding virtual video memory pointer.
13. A video memory allocation processing device comprising: at least one processor and a memory for storing processor-executable instructions, the processor implementing the method of any one of claims 1-10 when executing the instructions.
14. A video memory allocation processing system, comprising: a plurality of graphics processors and a global model parameter management module, wherein each graphics processor runs a plurality of processes or containers, and each process or container comprises at least one model to be deployed;
an inter-process communication module is arranged in each process or container; the global model parameter management module is used for executing the method of any one of claims 1 to 10, and the inter-process communication module is used for querying whether the model parameters of each model to be deployed are duplicated, so as to allocate video memory to the models to be deployed in the plurality of graphics processors.
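The topology of claim 14 — per-process IPC modules querying one global parameter manager — can be sketched as below. The class names are assumptions, and an in-process dictionary stands in for real inter-process communication and real GPU addresses.

```python
class GlobalParamManager:
    """Single global registry of deployed parameter sets across GPUs."""

    def __init__(self):
        self.table = {}   # parameter hash -> (gpu_id, physical_addr)

    def query(self, h):
        return self.table.get(h)

    def register(self, h, gpu_id, phys):
        self.table[h] = (gpu_id, phys)

class IPCModule:
    """Per-process/per-container stub that proxies duplicate queries."""

    def __init__(self, manager):
        self.manager = manager

    def is_duplicate(self, h):
        return self.manager.query(h) is not None
```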
CN202111304911.XA 2021-11-05 2021-11-05 Video memory allocation processing method, device, equipment and system Pending CN114168316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111304911.XA CN114168316A (en) 2021-11-05 2021-11-05 Video memory allocation processing method, device, equipment and system

Publications (1)

Publication Number Publication Date
CN114168316A true CN114168316A (en) 2022-03-11

Family

ID=80478238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111304911.XA Pending CN114168316A (en) 2021-11-05 2021-11-05 Video memory allocation processing method, device, equipment and system

Country Status (1)

Country Link
CN (1) CN114168316A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100131734A1 (en) * 2008-11-21 2010-05-27 Clegg Roger T J System and method for optimal dynamic resource allocation in a storage system
CN108009025A (en) * 2017-12-13 2018-05-08 北京小米移动软件有限公司 Date storage method and device
US20180300944A1 (en) * 2017-04-17 2018-10-18 Intel Corporation Anti-aliasing adaptive shader with pixel tile coverage raster rule system, apparatus and method
US20200065254A1 (en) * 2018-08-21 2020-02-27 International Business Machines Corporation Storage management for deployment of virtual machine
CN112015470A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Model deployment method, device, equipment and storage medium
US20210011751A1 (en) * 2019-07-12 2021-01-14 Vmware, Inc. Memory-aware placement for virtual gpu enabled systems
US20210133918A1 (en) * 2019-11-01 2021-05-06 EMC IP Holding Company LLC Method, electronic device and computer program product for expanding memory of gpu
CN113377545A (en) * 2021-07-08 2021-09-10 支付宝(杭州)信息技术有限公司 Method and device for distributing GPU physical memory


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination