CN116776940A - Neural network model generation method and device and electronic equipment

Info

Publication number
CN116776940A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202310716795.5A
Other languages
Chinese (zh)
Inventor
刘嘉炜
卓正兴
杨青
Current Assignee
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Du Xiaoman Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Du Xiaoman Technology Beijing Co Ltd
Priority to CN202310716795.5A
Publication of CN116776940A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The embodiment of the application provides a neural network model generation method and device and an electronic device. The neural network model generation method comprises: determining different first target sub-network models based on the structural parameters and weights of an original neural network model, and adaptively generating, according to the deployment requirements of a target hardware device and in combination with the performance parameters of the first target sub-network models, a second target sub-network model meeting the deployment conditions of the target hardware device. In this way, a small-scale, high-precision second target sub-network model can be deployed to the target hardware device in place of the original neural network model. Because the second target sub-network model requires fewer hardware resources, the application range of the original neural network model is expanded; at the same time, the second target sub-network model is the one with the highest precision in the whole set of sub-network models adapted to the target hardware device, so optimal model deployment for the target hardware device is achieved.

Description

Neural network model generation method and device and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network model generating method and apparatus, and an electronic device.
Background
With the development of artificial intelligence technology, neural network models are increasingly deployed on different hardware devices to realize corresponding functions in specific application scenarios. In general, the scale of a neural network model is positively correlated with the accuracy of the results it can output; that is, the larger the scale of the neural network model, the more accurate its output.
However, neural network models are deployed on computing devices with different hardware resources. For a computing device with limited hardware resources, those resources cannot support running a large-scale neural network model, so the functions implemented by the large-scale model cannot be realized on such a device, which in turn limits the application range of the neural network model.
In the related art, in order to solve the problem that a large-scale neural network model cannot be deployed on a computing device with limited hardware resources, approaches such as pruning, compression, or knowledge distillation have been proposed to obtain a small-scale neural network model with the same function. However, these approaches all rely on manually setting the parameters of the small-scale neural network model from experience, so the resulting small-scale model is often not the optimal small-scale neural network model for the target computing device.
Disclosure of Invention
In view of this, embodiments of the present application provide a neural network model generation method and device and an electronic device, so as to provide an adaptive neural network model for different hardware resource environments, expand the application range of the neural network model, and automatically determine the optimal small-scale neural network model based on the hardware resources of the computing device on which it is deployed.
In a first aspect, an embodiment of the present application provides a neural network model generating method, where the method includes:
determining different first target sub-network models based on the structural parameters and weights of an original neural network model, wherein the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model;
and generating, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device, wherein the performance parameters of a target sub-network model are positively correlated with its structural parameters and weights.
With reference to the first aspect, in a second possible embodiment, the determining different first target sub-network models based on the structural parameters and weights of the original neural network model includes:
determining the structural parameters of each first target sub-network model according to the preset value range of each structural parameter, wherein the structural parameters of a first target sub-network model are obtained by randomly combining values of the structural parameters within their corresponding preset value ranges;
and sampling from the original neural network model based on the structural parameters of each first target sub-network model to obtain a candidate sub-network model set, wherein the candidate sub-network model set comprises a plurality of first target sub-network models.
With reference to the second possible embodiment of the first aspect, in a third possible embodiment, the method further includes:
sampling from the original neural network model according to the maximum value of each structural parameter in a corresponding preset value range to generate a maximum first target sub-network model;
based on knowledge distillation training, taking the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, and training the weights of the other first target sub-network models in the candidate sub-network model set so as to update the first target sub-network models in the candidate sub-network model set.
With reference to the second or third possible embodiment of the first aspect, in a fourth possible embodiment, the generating, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device includes:
determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
and determining, from the plurality of first target sub-network models whose performance parameters all meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model.
With reference to the first aspect, in a fifth possible embodiment, the deployment conditions of the target hardware device include: the parameter count of the model, the floating point operations of the model, and the predicted running duration of the model on the target hardware device.
With reference to the fifth possible embodiment of the first aspect, in a sixth possible embodiment, the predicted running duration of the model on the target hardware device is obtained in advance by:
inputting the structural parameters of each first target sub-network model into a deployment environment running duration prediction model to obtain the predicted running duration of each first target sub-network model on the target hardware device;
wherein the deployment environment running duration prediction model is obtained by training in advance according to the historical running duration of each third target sub-network model in a preset deployment environment, and the structural parameters of the third target sub-network models are the same as those of the first target sub-network models.
With reference to the first aspect, in a seventh possible embodiment, the original neural network model is a Transformer neural network model or a variant neural network model based on the Transformer neural network model.
In a second aspect, an embodiment of the present application provides a neural network model generating device, where the device includes:
the first determining module is configured to determine different first target sub-network models based on the structural parameters and weights of the original neural network model, wherein the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model;
and the first model generation module is configured to generate, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device, wherein the performance parameters of a target sub-network model are positively correlated with its structural parameters.
With reference to the second aspect, in a second possible embodiment, the first determining module is specifically configured to:
determining the structural parameters of each first target sub-network model according to the preset value range of each structural parameter; the structural parameters of the first target sub-network model are obtained by randomly combining values of the structural parameters in corresponding preset value ranges;
sampling from the original neural network model based on the structural parameters of each first target sub-network model to obtain a candidate sub-network model set, wherein the candidate sub-network model set comprises a plurality of first target sub-network models;
the first model generation module is specifically configured to generate a maximum first target sub-network model by sampling in the original neural network model according to a maximum value of each structural parameter in a corresponding preset value range;
The apparatus further comprises:
the first model training module is configured to, based on knowledge distillation training, take the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, and train the weights of the other first target sub-network models in the candidate sub-network model set so as to update the first target sub-network models in the candidate sub-network model set;
the first model generating module is specifically configured to:
determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
determining, from the plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model;
the deployment conditions of the target hardware device include: the parameter count of the model, the floating point operations of the model, and the predicted running duration of the model on the target hardware device;
the apparatus further comprises:
the running duration obtaining module is configured to input the structural parameters of each first target sub-network model into a deployment environment running duration prediction model to obtain the predicted running duration of each first target sub-network model on the target hardware device;
wherein the deployment environment running duration prediction model is obtained by training in advance according to the historical running duration of each third target sub-network model in a preset deployment environment, and the structural parameters of the third target sub-network models are the same as those of the first target sub-network models.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the neural network model generation method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the neural network model generating method of the first aspect.
The application has the beneficial effects that:
The embodiment of the application provides a neural network model generation method and device and an electronic device. The neural network model generation method comprises: determining different first target sub-network models based on the structural parameters and weights of an original neural network model, and adaptively generating, according to the deployment requirements of a target hardware device and in combination with the performance parameters of the first target sub-network models, a second target sub-network model meeting the deployment conditions of the target hardware device.
In the embodiment of the application, the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters and weights of the original neural network model, and the structural parameters are positively correlated with the performance parameters. Therefore, the scale of each first target sub-network model is smaller than that of the original neural network model, and each first target sub-network model has the same functions as the original neural network model but different precision.
According to the embodiment of the application, in combination with the deployment conditions of the target hardware device, a second target sub-network model matching those deployment conditions is adaptively generated based on the performance parameters of each first target sub-network model; that is, the second target sub-network model with the best matching precision is determined from the first target sub-network models. In this way, a small-scale, high-precision second target sub-network model can be deployed to the target hardware device in place of the original neural network model. Because the second target sub-network model requires fewer hardware resources, the application range of the original neural network model can be expanded while the functions of the original neural network model are still realized with high precision.
Moreover, the second target sub-network model is the one with the highest precision in the whole set of sub-network models adapted to the target hardware device, so optimal model deployment for the target hardware device is achieved.
According to one or more technical solutions provided by the embodiments of the application, an adaptive neural network model can be provided for different hardware resources, thereby achieving the technical effect of expanding the application range of the neural network model.
Drawings
Further details, features and advantages of the application are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
fig. 1 is a schematic flow chart of a neural network model generating method according to an embodiment of the present application;
fig. 2 is another possible flow chart of a neural network model generating method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of one possible composition of a candidate sub-network model set according to an embodiment of the present application;
FIG. 4 is a schematic diagram of one possible flow of updating a candidate sub-network model set according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another possible flow chart of a neural network model generating method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of one possible training procedure for the operation duration of the deployment environment according to the embodiment of the present application;
fig. 7 is a schematic flow chart of a possible practical application scenario of the neural network model generating method according to the embodiment of the present application;
fig. 8 is a schematic diagram of a possible logic structure of a neural network model generating device according to an embodiment of the present application;
fig. 9 is a schematic diagram of a possible logic structure of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the application are shown in the drawings, it should be understood that the application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the application. It should be understood that the drawings and embodiments of the application are for illustration purposes only and are not intended to limit the scope of the present application.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the application is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that "one or more" is intended to be construed as "one or more" unless the context clearly indicates otherwise.
Neural network models are mathematical methods for simulating actual human neural networks and are often applied in technical fields such as pattern recognition, intelligent control, and data analysis. Since the processing accuracy achieved by a neural network model is positively correlated with its scale, in the design stage the scale of a neural network model is preferably made very large in order to maximize its functions and obtain very high processing accuracy. Against this background, for computing devices with limited hardware resources, hardware conditions such as memory space and processor capability cannot support running a large-scale neural network model on the computing device.
In a natural language processing application scenario, in order to accurately recognize natural language generated by humans, parameters such as the number of layers and the connection weights between layers of the original neural network model are all maximized in the initial design stage, so as to obtain an original neural network model with accurate output. However, some mobile terminal devices cannot host the original neural network model under their available hardware conditions, so the function of recognizing human-generated natural language based on the original neural network model cannot be implemented on a mobile terminal with limited hardware, and users of such devices cannot use the natural language recognition function the original neural network model provides.
In the related art, a target small-scale neural network model with the same functions as a large-scale neural network model can be obtained through model quantization, model pruning, or knowledge distillation. Among these, knowledge distillation is a common way of obtaining a small-scale neural network model, and specifically comprises: designing a small-scale neural network model based on the large-scale neural network model, and then performing knowledge distillation training with the large-scale model as the teacher model and the small-scale model as the student model, thereby obtaining a target small-scale neural network model with the same functions as the large-scale model. However, in this approach the structural parameters of the small-scale neural network model are set manually by an operator based on personal experience, so the resulting target small-scale model is largely not the optimal small-scale neural network model adapted to a computing device with limited hardware resources, and its processing accuracy is not optimal.
In view of the foregoing, in a first aspect, an embodiment of the present application provides a neural network model generation method, which aims to solve the problem that the functions achievable by a large-scale neural network model cannot be realized on a computing device with limited hardware resources, and to determine the optimal small-scale neural network model based on the hardware resources of the computing device on which it is deployed.
The neural network model generation method can be applied to any electronic device capable of running a neural network model, including but not limited to mobile terminals, personal computers, servers, and the like. As shown in fig. 1, the neural network model generation method provided by the embodiment of the application may include the following two steps:
S11, determining different first target sub-network models based on the structural parameters and weights of an original neural network model;
wherein the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model.
S12, generating, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device;
wherein the performance parameters of a target sub-network model are positively correlated with its structural parameters and weights.
In the embodiment of the application, since the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model, and the structural parameters are positively correlated with the performance parameters, the scale of each first target sub-network model is smaller than that of the original neural network model, and each first target sub-network model can realize the same functions as the original neural network model, but with different precision.
According to the embodiment of the application, in combination with the deployment conditions of the target hardware device, a second target sub-network model matching those deployment conditions is adaptively generated based on the performance parameters of each first target sub-network model; that is, the second target sub-network model with the highest matching precision is determined from the first target sub-network models. In this way, a small-scale, high-precision second target sub-network model can replace the original neural network model and be deployed to the target hardware device; because the second target sub-network model needs fewer hardware resources, the application range of the original neural network model is conveniently expanded while the original functions of the original neural network model are realized with high precision.
Taking an original neural network model for natural language processing as an example, the embodiment of the application randomly combines the structural parameters of the original neural network model to obtain different structural parameter combinations, and then generates a plurality of target sub-network models with different structural parameters. The structural parameters of each target sub-network model are obtained by randomly combining the structural parameters of the original neural network model and are smaller than those of the original neural network model, yet natural language processing can still be realized. By matching the performance parameters of the target sub-network models against the deployment hardware conditions of the target mobile terminal, a target sub-network model whose performance parameters fit the mobile terminal can be determined. This ensures that the natural language processing function based on the neural network model can be used by users of different hardware devices, expands the application range of the function, and safeguards the user experience.
The above steps S11, S12 will be described in detail below:
In step S11, the scale of the original neural network model is generally large, and it generally cannot be directly deployed on a device with limited hardware resources. The original neural network model may be a neural network model that has already been designed and deployed on large-scale hardware, or a large-scale neural network model designed for the target application scenario but not yet deployed.
In one possible embodiment, the original neural network model may be a large language model in a natural language processing application scenario, capable of recognizing the parts of speech of words in human-generated natural language, recognizing the emotion in human-generated natural language, and so on. In the embodiment of the application, the original neural network model may be a Transformer neural network model, or a neural network model obtained as a variant of the Transformer neural network model, such as the well-known ALBert, Longformer, and Flash.
Based on the above, the neural network model generation method provided by the embodiment of the application supports not only the basic Transformer neural network model but also other variant neural network models derived from the Transformer neural network model, and can therefore adaptively generate a small-scale target sub-network model matching the deployment conditions of the target hardware device from different original neural network models.
In one possible embodiment, the original neural network model may be a large-scale neural network model designed based on the actual application scenario requirements; when step S11 is performed, the structural parameters of the original neural network model at the initial design stage may be obtained and randomly combined to obtain the structural parameters of the different first target sub-network models. In another possible embodiment, the original neural network model may be a super network model of a pre-trained network model designed based on the actual application scenario requirements; when step S11 is performed, the structural parameters of the super network model may be obtained and randomly combined to obtain the structural parameters of the different first target sub-network models.
In the embodiment of the present application, the structural parameters of a neural network model are parameter values describing its constituent structure. Given the complex structure of a neural network, there are several kinds of structural parameters, including but not limited to the number of layers, the number of attention heads, the embedding dimension, and the FFN (Feed-Forward Network) layer expansion rate. The larger the structural parameter values, the larger the scale of the corresponding neural network model, the higher its processing precision, and the higher the required hardware conditions.
In view of this, in the embodiment of the present application, the structural parameters of the original neural network model are treated as n different elements; a specified number of elements are determined from the n different elements and randomly combined, yielding the structural parameters of different first target sub-network models. For example, for the Bert neural network model, the structural parameters may specifically be: an embedding dimension of 768, 12 layers, 12 attention heads, and an FFN layer expansion rate of 4.
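To make this concrete, the following minimal Python sketch shows how such a configuration might be represented in code; the class and field names are illustrative assumptions rather than anything specified in the application:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubnetConfig:
    """Structural parameters describing one candidate sub-network.

    Field names are assumed for illustration; per-layer parameters such as
    the number of attention heads could also be stored as per-layer lists.
    """
    num_layers: int    # number of layers
    embed_dim: int     # embedding dimension
    num_heads: int     # number of attention heads
    ffn_ratio: float   # FFN layer expansion rate

# The full-size Bert configuration described above.
BERT_FULL = SubnetConfig(num_layers=12, embed_dim=768, num_heads=12, ffn_ratio=4.0)
```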
Based on this, determining the structural parameters of the different first target sub-network models is essentially a random combination of the structural parameters and weights of the original neural network model. As a simple illustration, suppose the original neural network model is the Bert neural network model (a variant of the Transformer neural network model) and its structural parameters include only two items: the number of layers and the number of attention heads, both of which are 12 in the designed Bert model. Taking the number of layers and the number of attention heads as the elements to be randomly combined, the following can be generated in total:
if the model has only one layer, there are 12 possibilities for that layer's attention head count;
if the model has two layers, there are 12 possibilities for each layer's attention head count, i.e. 12^2 combinations; and so on, until, if the model has 12 layers, there are 12^12 combinations. Adding all of these together gives all possible combinations of structural parameters. In fact, with the number of layers restricted to the seven values 6, 7, 8, 9, 10, 11, 12 and the number of attention heads restricted to the three values 6, 8, 12, the possible combinations of structural parameters number 3^6 + 3^7 + … + 3^12.
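As a quick sanity check on this combinatorial argument, both counts can be computed directly:

```python
# Toy setting: 1..12 layers, 12 attention-head choices per layer.
simple_total = sum(12 ** n for n in range(1, 13))     # 9,726,655,034,460

# Restricted setting: 6..12 layers, 3 head choices ({6, 8, 12}) per layer.
restricted_total = sum(3 ** n for n in range(6, 13))  # 796,797

print(simple_total, restricted_total)
```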
The inventor found in practical application that if the structural parameters of a neural network model are too small, the scale of the neural network model is too small to realize the original function of the original neural network model. For example, an original neural network model with 12 layers and 12 attention heads can accurately identify whether the emotion of human-input natural language is positive or negative, while a target sub-network model with only 1 layer and only 1 attention head cannot process the input natural language at all, let alone recognize the emotion it contains. Based on this, in one possible embodiment, first target sub-network models too small to realize the original function of the original neural network model need to be screened out; as shown in fig. 2, when step S11 is performed, first target sub-network models of suitable scale may be determined through the following steps:
S111, determining the structural parameters of each first target sub-network model according to the preset value ranges of the structural parameters of the original neural network model;
S112, sampling from the original neural network model based on the structural parameters of each first target sub-network model to obtain a candidate sub-network model set.
In step S111, the preset value range of each structural parameter is a subinterval of the original value range of that structural parameter, specifically a subinterval covering the larger values of the original range. Illustratively, taking the original neural network model to be the Bert neural network model, the number of layers originally ranges from 1 to 12, i.e., the number of layers L = [1, 12]. To ensure that the target sub-network model deployed on the user's target hardware device can realize the function of the original neural network model, the preset value range of the layer-count structural parameter can be an upper subinterval of the original range, i.e., L = [x, 12].
The larger the value of x, the larger the number of layers of the corresponding target sub-network model, and the larger the corresponding model scale. The value of x can be set from practical experience; for example, the number of layers of the smallest-scale neural network model able to realize the function of the original neural network model can be determined as x from historical data. If historical data show that the smallest-scale model realizing the function of the original Bert neural network model has 6 layers, the preset value range of the layer-count structural parameter of the original neural network model is L = [6, 12].
Similarly, the preset value ranges of the other structural parameters of the original neural network model can be set following the same principle as the preset value range of the number of layers, which is not repeated here. In particular, in one possible embodiment, where the original neural network model is the Bert neural network model, the structural parameters of each first target sub-network model may be determined using the value ranges listed in the following table:
Table 1: Possible value ranges of the structural parameters

Structural parameter of the Bert neural network model | Value range
Number of layers                                      | [6, 7, 8, 9, 10, 11, 12]
Embedding dimension                                   | [72, 96, …, 24k, …, 768] (multiples of 24; 30 values)
Number of attention heads                             | [6, 8, 12]
FFN layer expansion rate                              | [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]
If the original Bert neural network model is sampled within the value ranges in Table 1, the number of first target sub-network models that may be generated is 1.143×10^18. Each of the 1.143×10^18 first target sub-network models in this model space can realize the function of the original neural network model, but with different precision. In the embodiment of the application, the candidate sub-network model set consists of sub-network models of the original neural network model, each belonging to a subset of the original neural network model. The count of 1.143×10^18 first target sub-network models is calculated as follows:
For a structurally determined model, the embedding dimension is the same for every layer, while the number of attention heads and the FFN layer expansion rate can differ from layer to layer. For any one layer, the number of potential combinations of attention head count and FFN layer expansion rate is 3×8 = 24. With the number of layers ranging from 6 to 12, the number of potential per-layer combinations is 24^6 + 24^7 + … + 24^12 = 3.81×10^16. The number of embedding dimension choices is 30, so the number of first target sub-network models is 30 × 3.81×10^16 = 1.143×10^18.
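The same arithmetic, together with the random combination itself, can be sketched in Python; the function name and the per-layer dict encoding are assumptions for illustration:

```python
import random

# Preset value ranges from Table 1 (Bert example).
LAYER_CHOICES = [6, 7, 8, 9, 10, 11, 12]
EMBED_CHOICES = [24 * k for k in range(3, 33)]   # 72, 96, ..., 768 (30 values)
HEAD_CHOICES = [6, 8, 12]
FFN_CHOICES = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]

# Heads and FFN ratio are chosen per layer; the embedding dimension once.
per_layer = len(HEAD_CHOICES) * len(FFN_CHOICES)            # 24
space = len(EMBED_CHOICES) * sum(per_layer ** n for n in LAYER_CHOICES)
print(f"{space:.4g}")                                       # ~1.143e+18

def sample_config(rng: random.Random) -> dict:
    """Randomly combine structural parameters into one sub-network config."""
    n = rng.choice(LAYER_CHOICES)
    return {
        "embed_dim": rng.choice(EMBED_CHOICES),
        "heads": [rng.choice(HEAD_CHOICES) for _ in range(n)],
        "ffn_ratio": [rng.choice(FFN_CHOICES) for _ in range(n)],
    }
```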
Based on this, when step S112 is executed, a corresponding number of first target sub-network models are sampled from the original neural network model according to the structural parameters obtained by random combination in step S111, and the plurality of first target sub-network models form an abstract candidate sub-network model set. For example, randomly combining the structural parameters within the preset value ranges of the original Bert neural network model yields 1.143×10^18 possible structural parameter combinations; based on these, 1.143×10^18 corresponding first target sub-network models can be sampled from the original Bert neural network model, and these first target sub-network models constitute the candidate sub-network model set. If the original neural network model is a pre-trained neural network model obtained through unsupervised learning, a super network model can be constructed from the pre-trained neural network model: the structure of the constructed super network model is the same as the structural parameters of the pre-trained neural network model, and the weights of the super network model are initialized with the weights of the pre-trained neural network model. Unlike the pre-trained neural network model obtained through unsupervised learning, the constructed super network model supports dynamic sampling.
Based on this, when step S112 is executed and the first target sub-network models are obtained by sampling from the original neural network model, if the original neural network model is a pre-trained neural network model, the sampling can be performed by dynamically sampling the constructed super network model, since each neural network layer in the super network has multiple optional operations. In the process of dynamically sampling the super network, corresponding sub-network models are sampled from the 1.143×10^18 sub-networks of the super network, and the sampled sub-network models are then trained.
Because the weights among the sub-networks are entangled, only a small number of sub-networks need to be sampled from the astronomically many sub-networks for training in order for all sub-networks to reach considerable accuracy; in this way the whole super network can be trained. Specifically, referring to fig. 3, different sub-network models can be sampled from the super network according to different structural parameters, and each sampled sub-network model can be trained.
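The weight entanglement can be illustrated with a small PyTorch sketch (PyTorch and the `ElasticLinear` name are assumptions here): a sub-network is realized by slicing the shared weight tensor rather than copying it, so gradients from every sampled sub-network update the same underlying parameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticLinear(nn.Module):
    """A super-network projection whose active width is chosen per forward
    pass. Sampled sub-networks use slices (views) of one shared weight
    tensor, which is the weight entanglement described above."""

    def __init__(self, max_in: int, max_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(max_out, max_in))
        self.bias = nn.Parameter(torch.zeros(max_out))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x: torch.Tensor, out_dim: int) -> torch.Tensor:
        in_dim = x.shape[-1]
        return F.linear(x, self.weight[:out_dim, :in_dim], self.bias[:out_dim])

layer = ElasticLinear(max_in=768, max_out=768)
x = torch.randn(4, 384)        # a sub-network with embedding dimension 384
y = layer(x, out_dim=384)      # reuses a slice of the shared 768x768 weights
```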
In one possible embodiment, in the process of generating the candidate sub-network model set, as shown in fig. 4, each first target sub-network model in the candidate sub-network model set may be updated through the following steps:
S41, sampling from the original neural network model according to the maximum value of each structural parameter in its corresponding preset value range to generate a maximum first target sub-network model;
S42, based on knowledge distillation training, taking the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, and training the weights of the other first target sub-network models in the candidate sub-network model set.
When step S41 is executed, the description of super network model sampling above may be referred to: the maximum value of each structural parameter is taken as the sampling reference, so as to obtain the maximum first target sub-network model from the original neural network model. Specifically, the maximum target sub-network is sampled according to the maximum value of each structural parameter, a loss function is then formed from the output of the maximum first target sub-network model and the correct labels, and back propagation is performed to update the super network.
It can be appreciated that the maximum first target sub-network model has the same scale as the original neural network model, or as the super network model constructed from it, so the maximum first target sub-network model corresponds to the super network itself. Accordingly, when step S42 is executed, as shown in fig. 5, step S41 may be executed first to preferentially train the maximum first target sub-network model, and then, based on the knowledge distillation technique, the maximum first target sub-network model is used as the teacher model and the remaining first target sub-network models as student models. Even if only a few first target sub-network models randomly sampled from the rest serve as student models, because the weights among the sub-network models are correlated and entangled, all sub-networks can acquire the same functions as the original model although the sampled sub-networks account for only a small part of all sub-networks.
Then, based on the original target task data, the output of the maximum first target sub-network model is used as additional training data; for example, its output is taken as soft labels and combined with the output of each of the other first target sub-network models to form a loss function, and back-propagation training is performed on each of the other randomly sampled first target sub-network models. The knowledge learned by the original neural network model is thereby transferred into each small-scale first target sub-network model, the randomly sampled first target sub-network models are updated, and the candidate sub-network model set is updated in turn, so that each first target sub-network model has the same functions as the original neural network model.
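A hedged sketch of one such distillation update follows; the `supernet(batch, config)` calling convention and the `max_config` / `student_configs` inputs are assumptions. The maximal sub-network is trained on the hard labels, the sampled students fit its detached soft labels, and a single backward pass updates the shared weights:

```python
import torch
import torch.nn.functional as F

def distillation_step(supernet, optimizer, batch, labels,
                      max_config, student_configs, rng, k=2):
    """One update of steps S41/S42: the maximum sub-network (the super
    network itself) is the teacher, k randomly sampled smaller sub-networks
    are students trained against its soft labels."""
    optimizer.zero_grad()

    teacher_logits = supernet(batch, max_config)
    # Train the maximum first target sub-network on the correct labels first.
    loss = F.cross_entropy(teacher_logits, labels)

    # The teacher's detached outputs act as soft labels for the students.
    soft = teacher_logits.detach().softmax(dim=-1)
    for cfg in rng.sample(student_configs, k):
        student_logits = supernet(batch, cfg)
        loss = loss + F.kl_div(student_logits.log_softmax(dim=-1), soft,
                               reduction="batchmean")

    loss.backward()      # one backward pass updates the shared weights
    optimizer.step()
```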
It can be understood that the candidate sub-network model set obtained through step S42 includes a plurality of first target sub-network models capable of realizing the same functions as the original neural network model, but with different scales. In order to determine, from the candidate sub-network model set, a second target sub-network model matching the hardware conditions of the target hardware device, in one possible embodiment, when step S12 is performed, the second target sub-network model that best matches the target hardware device may be obtained through the following steps:
S51, determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
S52, determining, from the plurality of first target sub-network models whose performance parameters all meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model.
In the embodiment of the application, the performance parameters of a first target sub-network model are parameters describing its performance. The performance parameters are positively correlated with the structural parameters; that is, the larger the structural parameters of a first target sub-network model, the better its performance. For a first target sub-network model, the performance parameters include but are not limited to: the parameter count of the model, the floating point operations of the model, and the predicted running duration of the model on the target hardware. In one possible embodiment, the deployment conditions of the target hardware device correspond to the performance parameters of the network model, namely: the parameter count of the model, the floating point operations of the model, and the predicted running duration of the model on the target hardware device. In one possible embodiment, the parameter count of a model may be measured by the amount of memory space the model requires, and the floating point operations of a model refer to how many multiply operations the model must perform for one inference, representing its computational load.
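For the first two measures, a rough closed-form estimate follows directly from the structural parameters. The sketch below counts only the dominant Transformer weight matrices and multiplies, so it is an order-of-magnitude approximation rather than the application's exact accounting:

```python
def estimate_params(num_layers: int, embed_dim: int, ffn_ratio: float) -> int:
    """Rough weight count of a Transformer encoder: per layer, four
    attention projections (Q, K, V, output) plus two FFN projections.
    Embeddings, biases and LayerNorms are ignored; the attention head
    count does not change the total, since head_dim = embed_dim / heads."""
    attn = 4 * embed_dim * embed_dim
    ffn = 2 * embed_dim * int(embed_dim * ffn_ratio)
    return num_layers * (attn + ffn)

def estimate_mults(num_layers: int, embed_dim: int, ffn_ratio: float,
                   seq_len: int) -> int:
    """Approximate multiply count for one forward pass: one multiply per
    weight per token, plus the attention score/value products."""
    weight_mults = seq_len * estimate_params(num_layers, embed_dim, ffn_ratio)
    attn_mults = num_layers * 2 * seq_len * seq_len * embed_dim
    return weight_mults + attn_mults

print(estimate_params(12, 768, 4.0))   # 84,934,656 -- near Bert-base's encoder size
```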
When step S51 is executed, the performance parameters of each first target sub-network model are obtained and matched against the deployment conditions of the target hardware device, and a batch of first target sub-network models whose matching rate meets a matching-rate threshold is determined. Then, when step S52 is executed, the first target sub-network model with the highest precision is determined from these first target sub-network models.
Specifically, the first target sub-network model with the highest precision can be determined through the following steps (a code sketch of this search loop is given after the next paragraph):
S61, initializing the number n of first target sub-network models to sample, and cyclically sampling corresponding first target sub-network models from the super network model of the original neural network model to obtain the candidate sub-network model set; computing the parameter count and floating point operations of each first target sub-network model in the candidate sub-network model set, and feeding its structural parameters into the deployment environment running duration prediction model to obtain its predicted running duration, until m first target sub-network models meeting the deployment requirements have been sampled and form a second target sub-network model set;
S62, combining, with a set probability p, the coding features of two random first target sub-network models in the set to form a new target sub-network model; computing the parameter count and floating point operations of the new target sub-network model and feeding it into the deployment environment running duration prediction model to obtain its predicted running duration; if the deployment requirements are met, putting the new target sub-network model into the second target sub-network model set, and if not, re-executing step S62;
S63, repeating steps S61 and S62 m times to obtain all target sub-network models meeting the deployment requirements, testing the precision of these target sub-network models, and selecting the target sub-network model with the highest precision as the second target sub-network model.
The precision of a target sub-network model can be determined by inputting a target task data set into the target sub-network model, obtaining its output, and measuring how often the output is correct. Specifically, the output accuracy of the target sub-network model can be computed as the number of samples the model classifies correctly divided by the total number of samples. In other possible embodiments, metrics other than accuracy may be used to evaluate the target sub-network model, such as the F1 score or recall; that is, the target sub-network may be tested according to other metrics, and the application does not strictly limit the chosen metric.
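Steps S61 to S63 amount to a small evolutionary search over the candidate set. The sketch below is one possible reading of those steps, with every problem-specific operation (sampling, crossover of coding features, the deployment-condition check, and accuracy evaluation) passed in as an assumed callback:

```python
import random
from typing import Callable

def evolutionary_search(sample_config: Callable, crossover: Callable,
                        meets_constraints: Callable, accuracy: Callable,
                        n: int = 1000, m: int = 50, p: float = 0.5,
                        seed: int = 0):
    """One reading of steps S61-S63 (all callbacks are assumed):
    sample_config(rng)      -> a random structural-parameter combination
    crossover(a, b)         -> recombine the coding features of two parents
    meets_constraints(cfg)  -> params / FLOPs / predicted-duration check
    accuracy(cfg)           -> task accuracy of the sampled sub-network
    """
    rng = random.Random(seed)
    population = []
    for _ in range(m):                     # S63: repeat the round m times
        # S61: sample until enough candidates satisfy the deployment
        # conditions (capped at n attempts per round).
        attempts = 0
        while len(population) < m and attempts < n:
            cfg = sample_config(rng)
            attempts += 1
            if meets_constraints(cfg):
                population.append(cfg)
        # S62: with probability p, recombine two random members; keep the
        # child only if it also satisfies the deployment conditions.
        if len(population) >= 2 and rng.random() < p:
            child = crossover(*rng.sample(population, 2))
            if meets_constraints(child):
                population.append(child)
    if not population:
        raise RuntimeError("no candidate met the deployment conditions")
    # Select the most accurate survivor as the second target sub-network model.
    return max(population, key=accuracy)
```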
In one possible embodiment, when step S12 is executed, the second target sub-network model is determined according to the deployment conditions of the target hardware device, and these deployment conditions are mainly reflected in the following two aspects:
In one aspect, the approximate scale of a target sub-network model meeting the given deployment environment requirements is determined, primarily based on the structural parameters of the target sub-network model.
In the other aspect, the metric for a target sub-network model meeting the given deployment environment requirements is determined, specifically choosing among the parameter count of the model, the floating point operations of the model, and the predicted running duration of the model on the target hardware device.
According to the requirements of these two aspects, the target search conditions are determined, and the candidate sub-network model set is searched for a second target sub-network model meeting the conditions. For example, if the target sub-network model is to be applied to automatic driving of an automobile, there may be a timing requirement that model inference complete within 10 ms; in this case, target sub-network models with a predicted running duration of less than 10 ms are searched out of the candidate sub-network model set according to the models' predicted running duration on the vehicle-mounted hardware device.
Or, for example, if the target sub-network model is to be applied in a mobile phone application, the model parameter count may be required not to exceed 200k, since otherwise the application easily becomes too bulky; in this case, target sub-network models occupying less than 200k of memory are searched out of the candidate sub-network model set according to the model parameter count.
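The two scenarios above reduce to a simple threshold check, which could serve as the `meets_constraints` callback in the search sketch earlier; the type and field names are illustrative:

```python
from typing import NamedTuple, Optional

class DeployConditions(NamedTuple):
    max_params: Optional[int] = None         # e.g. 200_000 for a phone app
    max_flops: Optional[int] = None
    max_duration_ms: Optional[float] = None  # e.g. 10.0 for in-vehicle use

def meets_conditions(params: int, flops: int, duration_ms: float,
                     cond: DeployConditions) -> bool:
    """True when every deployment condition that is specified is satisfied."""
    return ((cond.max_params is None or params <= cond.max_params) and
            (cond.max_flops is None or flops <= cond.max_flops) and
            (cond.max_duration_ms is None or duration_ms <= cond.max_duration_ms))

car = DeployConditions(max_duration_ms=10.0)   # inference must finish in 10 ms
phone = DeployConditions(max_params=200_000)   # parameter count under 200k
```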
In one possible embodiment, of the three standards for measuring the scale of a model, the parameter count and the floating point operations can be computed directly and are not described in detail here. The running duration, however, cannot be obtained through computation, so the following step is provided to obtain the running duration of a model in a specific deployment environment:
inputting the structural parameters of each first target sub-network model into the deployment environment running duration prediction model to obtain the predicted running duration of each first target sub-network model on the target hardware device.
The deployment environment running duration prediction model is obtained by training in advance according to the historical running duration of each third target sub-network model in a preset deployment environment, and the structural parameters of the third target sub-network models are the same as those of the first target sub-network models.
Illustratively, as shown in fig. 6, step S11 (or steps S111 and S112) is performed in advance to sample the original neural network model, or the super network model constructed from it, generating different third target sub-network models, and the actual running duration of each third target sub-network model is then measured in the preset hardware deployment environment. Characteristic data pairs of the form (sub-network model, running duration) are then constructed and used as task data to train a pre-designed initial deployment environment running duration prediction model; supervised training then yields the deployment environment running duration prediction model mentioned in step S62. Based on an input target sub-network model, this prediction model determines the predicted running duration of that model in the preset deployment hardware environment. The running duration refers to the time consumed from the moment the sub-network model receives a processing task to the moment it outputs the corresponding result.
In the embodiment of the application, by means of the deployment environment running duration prediction model, the structural parameters of a target sub-network model to be tested can be input into the prediction model to determine the predicted running duration of that model in the deployment environment running on the target hardware device, which further helps determine a matching second target sub-network model according to the deployment conditions of the target hardware device.
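A minimal sketch of such a predictor, assuming PyTorch and assuming each sub-network is encoded as a fixed-length vector of its structural parameters; the architecture and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

# Assumed encoding: (num_layers, embed_dim, num_heads, ffn_ratio) per model.
predictor = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

def train_predictor(features: torch.Tensor, durations_ms: torch.Tensor,
                    epochs: int = 200, lr: float = 1e-3) -> None:
    """Supervised regression on (structural parameters -> measured running
    duration) pairs collected by timing the third target sub-network models
    in the preset deployment environment."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        pred = predictor(features).squeeze(-1)
        loss_fn(pred, durations_ms).backward()
        opt.step()

# Predict the running duration of an unseen configuration (values illustrative).
with torch.no_grad():
    ms = predictor(torch.tensor([[8.0, 384.0, 8.0, 2.0]])).item()
```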
In order to more clearly illustrate the neural network model generation method provided by the embodiment of the application, the method is specifically described with reference to a practical application flowchart shown in fig. 7:
firstly, a pre-training model is introduced, the pre-training model is usually obtained by performing non-supervision training in a mask prediction mode by using a large amount of non-marked data, and then the pre-training model is subjected to fine adjustment by using target task data to obtain a fine adjustment pre-training model, and the fine adjustment pre-training model realizes the function of target tasks.
And then, constructing a super network model based on the structure parameters of the pre-trained model after fine adjustment, wherein the parameters of the constructed super network model are the same as those of the pre-trained network model, and the weights of the super network model are obtained by initializing the weights of the pre-trained network model. In the embodiment of the application, the original neural network model can be a super network model constructed based on a pre-training network model.
Further, the super network model is sampled by performing steps S111 to S112 to obtain the first target sub-network models. Then, combining the deployment conditions of the target hardware device, steps S51 and S52 are performed to determine, from the first target sub-network models, the target small model with the highest precision that satisfies the deployment conditions, i.e., the second target sub-network model satisfying the deployment conditions.
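The sampling-and-selection logic can be sketched as follows; evaluate_accuracy and predict_latency_ms are hypothetical placeholders standing in for the precision test and the run-duration prediction model, and the value ranges and latency budget are illustrative assumptions.

    import random

    DEPTHS = [4, 6, 8, 12]           # preset value range for each structural parameter
    WIDTHS = [256, 384, 512, 768]
    HEADS = [4, 6, 8, 12]

    def sample_candidates(n):
        """Randomly combine structural-parameter values (first target sub-network models)."""
        return [(random.choice(DEPTHS), random.choice(WIDTHS), random.choice(HEADS))
                for _ in range(n)]

    def evaluate_accuracy(cfg):       # placeholder precision test
        depth, width, _ = cfg
        return 0.5 + 0.3 * (depth / 12) + 0.2 * (width / 768)

    def predict_latency_ms(cfg):      # placeholder run-duration predictor
        depth, width, _ = cfg
        return depth * width / 250.0

    LATENCY_BUDGET_MS = 15.0          # deployment condition of the target hardware device

    candidates = sample_candidates(50)
    feasible = [c for c in candidates if predict_latency_ms(c) <= LATENCY_BUDGET_MS]
    second_target = max(feasible, key=evaluate_accuracy)  # highest-precision feasible model
    print(second_target)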
Then, by performing steps S41 and S42, knowledge distillation is applied to the second target sub-network model with the fine-tuned pre-trained model as the teacher model, so that the second target sub-network model learns the function of the super network model. The resulting deployment small model is then deployed to the target hardware device. In one possible embodiment, the original neural network model may be pre-deployed on the supply-side hardware device, and only the structural parameters, performance parameters and weights of the second target sub-network model (i.e. the deployment small model in fig. 7) determined through the above steps are actually deployed on the user-side hardware device; that is, the second target sub-network model is deployed directly on the user-side hardware device.
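A minimal sketch of one knowledge-distillation training step is given below, with the fine-tuned pre-trained model as teacher and the second target sub-network model as student; the temperature, loss weighting and placeholder linear models are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def distillation_step(teacher, student, x, labels, optimizer, T=2.0, alpha=0.5):
        teacher.eval()
        with torch.no_grad():
            teacher_logits = teacher(x)
        student_logits = student(x)
        # Soft-label loss: the student mimics the teacher's output distribution.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        # Hard-label loss on target-task data keeps the target-task function intact.
        hard = F.cross_entropy(student_logits, labels)
        loss = alpha * soft + (1 - alpha) * hard
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    teacher = torch.nn.Linear(16, 4)   # placeholder for the fine-tuned pre-trained model
    student = torch.nn.Linear(16, 4)   # placeholder for the second target sub-network model
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    x, labels = torch.randn(8, 16), torch.randint(0, 4, (8,))
    print(distillation_step(teacher, student, x, labels, opt))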
After the second target sub-network model is generated in step S12, in one possible embodiment, the weights of the second target sub-network model may be fine-tuned to further improve its precision.
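This optional fine-tuning amounts to a short pass of task-loss training at a small learning rate, sketched below; the epoch count, learning rate and data loader are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def finetune(subnet, loader, epochs=3, lr=1e-5):
        """Briefly fine-tune the second target sub-network model on target-task data."""
        optimizer = torch.optim.AdamW(subnet.parameters(), lr=lr)
        subnet.train()
        for _ in range(epochs):
            for x, labels in loader:
                loss = F.cross_entropy(subnet(x), labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return subnet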
In a second aspect, an embodiment of the present application provides a neural network model generating apparatus. As shown in fig. 8, the neural network model generating apparatus 800 includes at least the following parts:
the first determining module 801 is configured to determine different first target sub-network models based on the structural parameters and weights of the original neural network model, where the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model;
the first model generating module 802 is configured to generate, according to the performance parameters of each first target sub-network model, the second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device, where the performance parameters of a target sub-network model are positively correlated with its structural parameters.
In one possible embodiment, the first determining module 801 is specifically configured to:
determining the structural parameters of each first target sub-network model according to the preset value range of each structural parameter and weight; the structural parameters of the first target sub-network model are obtained by randomly combining values of the structural parameters within their corresponding preset value ranges;
Based on the structural parameters of each first target sub-network model, sampling from the original neural network model to obtain a candidate sub-network model set, wherein the candidate sub-network model set comprises a plurality of first target sub-network models.
In one possible embodiment, the first model generation module 802 is specifically configured to:
sampling from the original neural network model according to the maximum value of each structural parameter within its corresponding preset value range to generate a maximum first target sub-network model;
the neural network model generation device further includes:
the first model training module 803 is configured to train, based on knowledge distillation, the weights of the other first target sub-network models in the candidate sub-network model set, with the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, so as to update the first target sub-network models in the candidate sub-network model set.
In one possible embodiment, the first model generation module 802 is specifically configured to:
determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
and determining, from the plurality of first target sub-network models whose performance parameters all meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model.
In one possible embodiment, the deployment conditions of the target hardware device include: the parameter count of the model, the floating-point operations (FLOPs) of the model, and the predicted run duration of the model on the target hardware device.
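The three deployment conditions can be checked mechanically for each candidate. The sketch below uses parameter and FLOP formulas for a plain Transformer block as an assumed architecture; the real counts depend on the actual model structure, and all thresholds are illustrative.

    def transformer_params(depth, width):
        # ~4*d^2 (attention) + ~8*d^2 (feed-forward) weights per layer; biases ignored
        return depth * 12 * width * width

    def transformer_flops(depth, width, seq_len):
        # roughly 2 FLOPs per multiply-accumulate across all weight matrices
        return 2 * seq_len * transformer_params(depth, width)

    def meets_deployment_conditions(depth, width, seq_len, predicted_latency_ms,
                                    max_params, max_flops, max_latency_ms):
        return (transformer_params(depth, width) <= max_params
                and transformer_flops(depth, width, seq_len) <= max_flops
                and predicted_latency_ms <= max_latency_ms)

    print(meets_deployment_conditions(6, 384, 128, predicted_latency_ms=9.0,
                                      max_params=50_000_000,
                                      max_flops=5_000_000_000,
                                      max_latency_ms=15.0))  # True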
In one possible embodiment, the neural network model generating apparatus 800 further includes:
the run-duration obtaining module 804 is configured to input the structural parameters of each first target sub-network model into the deployment-environment run-duration prediction model to obtain the predicted run duration of each first target sub-network model on the target hardware device;
the deployment-environment run-duration prediction model is trained in advance on the historical run durations of the third target sub-network models in a preset deployment environment, where the structural parameters of the third target sub-network models are the same as those of the first target sub-network models.
The names of messages or information interacted between the devices in the embodiments of the present application are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In a third aspect, exemplary embodiments of the present application further provide an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor; when executed by the at least one processor, the computer program causes the electronic device to perform a method according to an embodiment of the application.
In a fourth aspect, the exemplary embodiments of the application also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the application.
In a fifth aspect, the exemplary embodiments of the application also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the application.
With reference to fig. 9, a block diagram of an electronic device 900, which may be a server or a client of the present application, will now be described; it is an example of a hardware device that may be applied to aspects of the present application. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906, an output unit 907, a storage unit 908, and a communication unit 909. The input unit 906 may be any type of device capable of inputting information to the electronic device 900; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 907 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 908 may include, but is not limited to, magnetic disks and optical disks. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above. For example, in some embodiments, the aforementioned neural network model generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. In some embodiments, the computing unit 901 may be configured to perform the aforementioned neural network model generation method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (11)

1. A method for generating a neural network model, the method comprising:
determining different first target sub-network models based on the structural parameters and weights of an original neural network model, wherein the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model;
and generating, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device, wherein the performance parameters of a target sub-network model are positively correlated with the structural parameters and weights of the target sub-network model.
2. The method of claim 1, wherein determining the different first target sub-network models based on the structural parameters and weights of the original neural network model comprises:
Determining the structural parameters of each first target sub-network model according to the preset value range of each structural parameter; the structural parameters of the first target sub-network model are obtained by randomly combining values of the structural parameters in corresponding preset value ranges;
and sampling from the original neural network model based on the structural parameters of each first target sub-network model to obtain a candidate sub-network model set, wherein the candidate sub-network model set comprises a plurality of first target sub-network models.
3. The method according to claim 2, wherein the method further comprises:
sampling from the original neural network model according to the maximum value of each structural parameter in a corresponding preset value range to generate a maximum first target sub-network model;
training, based on knowledge distillation, the weights of the other first target sub-network models in the candidate sub-network model set, with the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, so as to update the first target sub-network models in the candidate sub-network model set.
4. The method according to claim 2 or 3, wherein generating, according to the performance parameters of each of the first target sub-network models, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device comprises:
determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
and determining, from the plurality of first target sub-network models whose performance parameters all meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model.
5. The method of claim 1, wherein the deployment conditions of the target hardware device comprise: the parameter count of the model, the floating-point operations (FLOPs) of the model, and the predicted run duration of the model on the target hardware device.
6. The method of claim 5, wherein the predicted run duration of the model on the target hardware device is obtained in advance by:
inputting the structural parameters of each first target sub-network model into a deployment-environment run-duration prediction model to obtain the predicted run duration of each first target sub-network model on the target hardware device;
wherein the deployment-environment run-duration prediction model is trained in advance on the historical run durations of third target sub-network models in a preset deployment environment, and the structural parameters of the third target sub-network models are the same as the structural parameters of the first target sub-network models.
7. The method of claim 1, wherein the original neural network model is a Transformer neural network model or a variant neural network model based on a Transformer neural network model.
8. A neural network model generation apparatus, the apparatus comprising:
the first determining module is used for determining different first target sub-network models based on the structural parameters and weights of the original neural network model, wherein the structural parameters of each first target sub-network model are obtained by randomly combining the structural parameters of the original neural network model;
and the first model generation module is used for generating, according to the performance parameters of each first target sub-network model, a second target sub-network model with the highest precision whose performance parameters meet the deployment conditions of the target hardware device, wherein the performance parameters of a target sub-network model are positively correlated with the structural parameters of the target sub-network model.
9. The apparatus of claim 8, wherein the first determining module is specifically configured to:
determining the structural parameters of each first target sub-network model according to the preset value range of each structural parameter; the structural parameters of the first target sub-network model are obtained by randomly combining values of the structural parameters in corresponding preset value ranges;
sampling from the original neural network model based on the structural parameters of each first target sub-network model to obtain a candidate sub-network model set, wherein the candidate sub-network model set comprises a plurality of first target sub-network models;
the first model generation module is specifically configured to generate a maximum first target sub-network model by sampling from the original neural network model according to the maximum value of each structural parameter within its corresponding preset value range;
the apparatus further comprises:
the first model training module is used for training, based on knowledge distillation, the weights of the other first target sub-network models in the candidate sub-network model set, with the maximum first target sub-network model as the teacher model and the other first target sub-network models in the candidate sub-network model set as student models, so as to update the first target sub-network models in the candidate sub-network model set;
The first model generating module is specifically configured to:
determining, from the candidate sub-network model set, a plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device;
determining, from the plurality of first target sub-network models whose performance parameters meet the deployment conditions of the target hardware device, the first target sub-network model with the highest precision as the second target sub-network model;
the deployment conditions of the target hardware device include: the parameter count of the model, the floating-point operations (FLOPs) of the model, and the predicted run duration of the model on the target hardware device;
the apparatus further comprises:
the run-duration obtaining module is used for inputting the structural parameters of each first target sub-network model into a deployment-environment run-duration prediction model to obtain the predicted run duration of each first target sub-network model on the target hardware device;
the deployment-environment run-duration prediction model is trained in advance on the historical run durations of third target sub-network models in a preset deployment environment, and the structural parameters of the third target sub-network models are the same as the structural parameters of the first target sub-network models.
10. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-7.
11. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202310716795.5A 2023-06-16 2023-06-16 Neural network model generation method and device and electronic equipment Pending CN116776940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310716795.5A CN116776940A (en) 2023-06-16 2023-06-16 Neural network model generation method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN116776940A true CN116776940A (en) 2023-09-19



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination