CN113065641A - Neural network model training method and device, electronic equipment and storage medium


Publication number: CN113065641A (application CN202110304132.3A; granted publication CN113065641B)
Authority: CN (China)
Prior art keywords: sub, network model, output, network, loss
Legal status: Granted
Application number: CN202110304132.3A
Other languages: Chinese (zh)
Other versions: CN113065641B (en)
Inventors: 高志鹏, 苗东, 芮兰兰, 莫梓嘉, 赵晨, 林怡静, 谭清, 付伟
Current assignee: Beijing Quyun Technology Co ltd; Beijing University of Posts and Telecommunications
Original assignee: Beijing Quyun Technology Co ltd; Beijing University of Posts and Telecommunications
Application filed by Beijing Quyun Technology Co ltd and Beijing University of Posts and Telecommunications
Priority to CN202110304132.3A
Publication of CN113065641A; application granted; publication of CN113065641B
Legal status: Active

Classifications

    • G06N 3/045 - Combinations of networks
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06N 3/08 - Learning methods
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The neural network model training method and device, the electronic equipment and the storage medium are applied to the field of information technology. An image classification network model to be trained is divided into a plurality of groups of sub-network models according to a plurality of preset groups of division points; for each group of sub-network models, the corresponding losses are calculated through a preset loss function; each group of sub-network models is jointly trained according to the calculated losses to obtain a plurality of groups of sub-network models to be output; for each group, a plurality of performance parameters corresponding to each sub-network model to be output are calculated; according to the plurality of performance parameters corresponding to each group of sub-network models, the comprehensive performance score corresponding to each group is calculated through a preset entropy weight model; and the group with the highest comprehensive performance score among the groups of sub-network models is selected as the target sub-network model. The image classification network model can then be deployed according to the target sub-network model, which improves the convenience of neural network deployment.

Description

Neural network model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of model training technologies, and in particular, to a neural network model training method and apparatus, an electronic device, and a storage medium.
Background
At present, artificial intelligence is a technology by which machines can replace human beings in completing functions such as cognition, recognition, analysis and decision-making. Artificial intelligence enables image recognition, speech recognition, smart living, autonomous driving and the like, bringing great convenience to people's lives.
However, the structure of the neural networks applied in the field of artificial intelligence is often very large, with very high demands on computing and storage resources, so most current applications based on deep neural networks have to rely on cloud platforms with massive computing resources, which greatly limits the development of artificial intelligence and its related services.
Disclosure of Invention
An object of the embodiments of the present application is to provide a neural network model training method and apparatus, an electronic device, and a storage medium, so as to improve the convenience of neural network deployment. The specific technical scheme is as follows:
in a first aspect of the embodiments of the present application, a method for training a neural network model is provided, where the method includes:
dividing an image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of dividing points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
for each group of sub-network models, inputting a sample image into the first sub-network model, taking the output of the first sub-network model as the input of the second sub-network model, taking the output of the second sub-network model as the input of the third sub-network model, and generating an image classification result output by the third sub-network model;
calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model respectively through a preset loss function aiming at each group of sub-network models;
for each group of sub-network models, performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model through the corresponding first loss, second loss and third loss respectively to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output;
calculating a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model respectively aiming at each group of sub-network models;
respectively calculating the comprehensive performance scores corresponding to the sub-network models through a preset entropy weight model according to the multiple performance parameters corresponding to the sub-network models;
and selecting one group with the highest comprehensive performance score in the sub-network models as a target sub-network model.
In a second aspect of the embodiments of the present application, there is also provided a neural network model training apparatus, including:
the model segmentation module is used for segmenting the image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of segmentation points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
a result generation module, configured to, for each group of sub-network models, input a sample image into the first sub-network model, and generate an image classification result output by the third sub-network model with an output of the first sub-network model as an input of the second sub-network model and an output of the second sub-network model as an input of the third sub-network model;
the loss calculation module is used for calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model respectively through a preset loss function aiming at each group of sub-network models;
the joint training module is used for performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model respectively through the corresponding first loss, second loss and third loss aiming at each group of sub-network models to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output;
the parameter calculation module is used for calculating a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model respectively aiming at each group of sub-network models;
the score calculation module is used for calculating the comprehensive performance scores corresponding to the sub-network models through the preset entropy weight model according to the performance parameters corresponding to the sub-network models;
and the model selection module is used for selecting one group with the highest comprehensive performance score in the sub-network models as a target sub-network model.
The embodiment of the application also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the neural network model training methods when executing the program stored in the memory.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for training a neural network model is implemented.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the above neural network model training methods.
The embodiment of the application has the following beneficial effects:
according to the neural network model training method and device, the electronic equipment and the storage medium of the embodiments of the present application, an image classification network model to be trained is divided into a plurality of groups of sub-network models according to a plurality of preset groups of dividing points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model; for each group of sub-network models, a sample image is input into the first sub-network model, the output of the first sub-network model is taken as the input of the second sub-network model, the output of the second sub-network model is taken as the input of the third sub-network model, and an image classification result output by the third sub-network model is generated; for each group of sub-network models, a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model are respectively calculated through a preset loss function; for each group of sub-network models, the corresponding first, second and third sub-network models are jointly trained through the corresponding first, second and third losses to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output; for each group of sub-network models, a plurality of performance parameters corresponding to the first, second and third sub-network models to be output are respectively calculated; according to the plurality of performance parameters corresponding to each group of sub-network models, the comprehensive performance score corresponding to each group is calculated through a preset entropy weight model; and the group with the highest comprehensive performance score among the groups of sub-network models is selected as the target sub-network model. The image classification network model can be deployed through the sub-networks corresponding to the target sub-network model, and since each sub-network in the target sub-network model contains only one part of the image classification network model, different sub-networks can be deployed on different devices, each of which is responsible for only part of the computation, thereby improving the convenience of neural network deployment.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those of ordinary skill in the art from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a neural network model training method according to an embodiment of the present application;
fig. 2 is a candidate network structure selection algorithm according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating an embodiment of generating image classification results;
FIG. 4 is a diagram illustrating an example of a neural network model training method according to an embodiment of the present disclosure;
FIG. 5 is a comparison graph of results obtained by the entropy weight TOPSIS algorithm according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a neural network model training apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the description herein are intended to be within the scope of the present disclosure.
In a first aspect of an embodiment of the present application, a neural network model training method is provided, where the method includes:
dividing an image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of dividing points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
for each group of sub-network models, inputting a sample image into a first sub-network model, taking the output of the first sub-network model as the input of a second sub-network model, taking the output of the second sub-network model as the input of a third sub-network model, and generating an image classification result output by the third sub-network model;
respectively calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model by using a preset loss function aiming at each group of sub-network models;
aiming at each group of sub-network models, performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model respectively through the corresponding first loss, second loss and third loss to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output;
respectively calculating a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model aiming at each group of sub-network models;
respectively calculating the comprehensive performance scores corresponding to the sub-network models through a preset entropy weight model according to the multiple performance parameters corresponding to the sub-network models;
and selecting one group with the highest comprehensive performance score in the sub-network models as a target sub-network model.
Therefore, by the neural network model training method, the image classification network model can be deployed through the sub-network corresponding to the target sub-network model, and each sub-network in the target sub-network model only comprises one part of the image classification network model, so that different sub-networks can be deployed in different devices, and each device only needs to be responsible for one part of calculation tasks, and convenience in neural network deployment is improved.
At present, artificial intelligence has become a technology by which machines can replace human beings in completing functions such as cognition, recognition, analysis and decision-making. Artificial intelligence enables image recognition, speech recognition, smart living, autonomous driving and the like, bringing great convenience to people's lives. However, the structure of the neural networks applied in this field is often very large, with very high demands on computing and storage resources, so most current deep-neural-network-based applications have to rely on cloud platforms with massive computing resources, which greatly limits the development of artificial intelligence and its related services. On the other hand, networked terminal devices have grown explosively in recent years; massive networked devices bring massive data, and how to use these data safely and efficiently has become an urgent problem. Meanwhile, massive data drives the development of deep learning, but it also makes the technology very sensitive to computing and storage resources. This contradiction is particularly obvious for mobile terminal devices: lacking sufficient resources, a mobile terminal device has to offload its data entirely to the cloud for processing, which greatly increases the time consumed by communication, and the resulting high latency is unacceptable for applications with strict real-time requirements; moreover, the security and privacy of the data cannot be guaranteed during offloading. If the mobile terminal device instead chooses to process the data locally, it must use a much simpler network model, which greatly reduces task accuracy.
In order to solve the above problems, the embodiments of the present application change the centralized cloud processing mode into an "end-edge-cloud" cooperative processing mode and optimize the neural network model to better suit this new computing mode.
Specifically, referring to fig. 1, fig. 1 is a schematic flow chart of a neural network model training method according to an embodiment of the present application, including:
and step S11, dividing the image classification network model to be trained into a plurality of sub-network models according to the preset plurality of groups of dividing points.
Wherein each set of sub-network models comprises a first sub-network model, a second sub-network model, and a third sub-network model.
The image classification network model may be a pre-established network model used for classifying images, and the preset division points may be set according to the structural characteristics of the image classification network model and the devices to which the divided sub-network models are to be applied. For example, the first sub-network model obtained by division is applied at the client, such as a smart phone or a computer; the second sub-network model is applied at the edge, such as an edge server like a base station; and the third sub-network model is applied at the cloud, such as a cloud device. The image classification network model to be trained can thus be divided into a plurality of sub-network models according to the computing power of the different devices.
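As a rough illustration of such a division, the following is a minimal PyTorch sketch, assuming the backbone can be written as an nn.Sequential; the layer layout, the split_at helper and the cut indices are illustrative assumptions, not taken from this description.

```python
import torch.nn as nn

# toy stand-in for the image classification network model to be trained
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

def split_at(net: nn.Sequential, cut1: int, cut2: int):
    """Divide the backbone into first/second/third sub-network models."""
    layers = list(net.children())
    first = nn.Sequential(*layers[:cut1])        # deployed at the client
    second = nn.Sequential(*layers[cut1:cut2])   # deployed at the edge
    third = nn.Sequential(*layers[cut2:])        # deployed at the cloud
    return first, second, third

# one group of sub-network models per preset group of division points;
# the groups here share layer objects, so clone the backbone to train
# each group independently
groups = [split_at(backbone, c1, c2) for c1, c2 in [(2, 4), (4, 6)]]
```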
The neural network model training method is applied to the intelligent terminal, the network model can be trained through the intelligent terminal, and specifically, the intelligent terminal can be a computer or a server and the like.
Step S12 is to input the sample image into the first sub-network model for each group of sub-network models, and to generate an image classification result output by the third sub-network model by taking the output of the first sub-network model as the input of the second sub-network model and the output of the second sub-network model as the input of the third sub-network model.
For example, when there are two groups of division points, the first, second and third sub-network models obtained by dividing at the first group of division points are M1, M2 and M3, and those obtained by dividing at the second group of division points are M4, M5 and M6, where the output of M1 is the input of M2, the output of M2 is the input of M3, the output of M4 is the input of M5, and the output of M5 is the input of M6.
Step S13 is to calculate, for each group of sub-network models, a first loss corresponding to an output of the first sub-network model, a second loss corresponding to an output of the second sub-network model, and a third loss corresponding to an image classification result output by the third sub-network model, respectively, by a preset loss function.
The preset loss function may be any of various loss functions for loss calculation, such as a cross-entropy loss function or an absolute value loss function. A first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model are respectively calculated through the preset loss function. For example, for each group of sub-network models, the first, second and third losses corresponding to M1, M2 and M3 calculated through a cross-entropy loss function are L1, L2 and L3 respectively, and the first, second and third losses corresponding to M4, M5 and M6 are L4, L5 and L6 respectively.
Optionally, for each group of sub-network models, the calculating, through a preset loss function, of a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model includes:

for each group of sub-network models, through the preset loss function:

$$y_{output\_n} = y_{cut\_n}(x;\ \omega;\ b)$$

$$\hat{y}_n = \mathrm{SoftMax}(y_{output\_n})$$

$$loss_n = -\sum_{c \in label} y_{label,c}\,\log \hat{y}_{n,c}$$

where n denotes the nth division point, x is the input data, ω and b denote the network weights and biases from the input to the current division point respectively, SoftMax() is a normalization function, y_label is the one-hot ground-truth label vector of the data, y_output_n is the output of the network, y_cut_n maps the input of the network model to the output at the current division point, and label denotes the label set;

respectively calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model.
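A hedged sketch of this per-exit loss calculation in PyTorch follows; the exit heads head1 and head2 that map each division-point activation to class logits are an assumption (the description does not spell them out), and F.cross_entropy applies the SoftMax and logarithm internally, matching the formulas above.

```python
import torch
import torch.nn.functional as F

def exit_losses(first, second, third, head1, head2, x, y_label):
    out1 = first(x)                    # output at the first division point
    out2 = second(out1)                # output at the second division point
    logits3 = third(out2)              # image classification result
    loss1 = F.cross_entropy(head1(out1.flatten(1)), y_label)  # first loss
    loss2 = F.cross_entropy(head2(out2.flatten(1)), y_label)  # second loss
    loss3 = F.cross_entropy(logits3, y_label)                 # third loss
    return loss1, loss2, loss3
```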
And step S14, performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model according to the corresponding first loss, second loss and third loss respectively aiming at each group of sub-network models to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output.
The first, second and third sub-network models are jointly trained through the corresponding first, second and third losses: a comprehensive loss corresponding to the first, second and third losses can be calculated, and the first, second and third sub-network models are then trained according to the comprehensive loss. The comprehensive loss corresponding to the first, second and third losses can be calculated by weighted summation. For example, according to preset weights, L1, L2 and L3 are weighted and summed to obtain a comprehensive loss L7, and M1, M2 and M3 are trained simultaneously according to L7 to obtain the first, second and third sub-network models to be output, m1, m2 and m3; according to preset weights, L4, L5 and L6 are weighted and summed to obtain a comprehensive loss L8, and M4, M5 and M6 are trained simultaneously according to L8 to obtain the first, second and third sub-network models to be output, m4, m5 and m6.
Optionally, for each group of sub-network models, the joint training of the corresponding first, second and third sub-network models through the corresponding first, second and third losses to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output includes:

for each group of sub-network models, through the corresponding first loss, second loss and third loss and a preset joint loss function:

$$loss_{joint} = \sum_{n} \omega_n \cdot loss_n$$

calculating the joint loss, where ω_n is the weight of the loss function at the corresponding candidate division point;

and performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model according to the joint loss to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output.
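Reusing the illustrative names from the sketches above (first/second/third from step S11, exit_losses and the heads from step S13), one joint training step under this joint loss might look as follows; the weights ω_n, the optimizer settings, and the inputs x and y_label (a sample batch and its class indices) are assumptions.

```python
import itertools
import torch

w = (0.3, 0.3, 0.4)  # omega_n for the candidate division points (illustrative)
optimizer = torch.optim.SGD(
    itertools.chain(first.parameters(), second.parameters(), third.parameters(),
                    head1.parameters(), head2.parameters()),
    lr=0.01)

# x: a batch of sample images; y_label: their class indices (e.g. from CIFAR-10)
loss1, loss2, loss3 = exit_losses(first, second, third, head1, head2, x, y_label)
loss_joint = w[0] * loss1 + w[1] * loss2 + w[2] * loss3  # weighted sum
optimizer.zero_grad()
loss_joint.backward()   # trains all three sub-network models at once
optimizer.step()
```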
Step S15 is to calculate multiple performance parameters corresponding to the first to-be-exported sub-network model, the second to-be-exported sub-network model and the third to-be-exported sub-network model respectively for each group of sub-network models.
Optionally, for each group of sub-network models, respectively calculating multiple performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model, and the third to-be-output sub-network model, including: and respectively calculating model accuracy, end-to-end time delay and data drop-out rate corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model aiming at each group of sub-network models.
The model accuracy, end-to-end delay and data drop-out rate corresponding to the first, second and third sub-network models to be output can be calculated by existing schemes, and details are not repeated here.
And step S16, respectively calculating the comprehensive performance scores corresponding to the sub-network models through the preset entropy weight model according to the multiple performance parameters corresponding to the sub-network models.
According to the plurality of performance parameters corresponding to each group of sub-network models, the comprehensive performance score corresponding to each group is calculated through the preset entropy weight model; it can be calculated by weighted summation over the performance parameters corresponding to each group of sub-network models with preset weights.
Optionally, calculating the comprehensive performance score corresponding to each group of sub-network models may also be implemented by a preset algorithm; specifically, reference may be made to the candidate network structure selection algorithm of fig. 2. The algorithm in the figure is: triple-partition network selection. The inputs are acc_n (accuracy), cal_n (computation delay), comm_n (communication delay) and exit_n (exit rate); the output is the optimal triple-partition network. The network structure, accuracy, computation delay, communication delay and exit rate for network selection are set; the ideal score and the current candidate network structure score are initialized to 0; the n different candidate network model combinations are scored by the entropy TOPSIS algorithm; when the ideal score is smaller than the score of the current candidate network structure, the ideal score is set equal to the score of the current candidate network structure; otherwise, the score of the next candidate network structure continues to be calculated.
And step S17, selecting one of the sub-network models with the highest comprehensive performance score as a target sub-network model.
For example, the model accuracy, end-to-end delay and data drop-out rate corresponding to the first sub-network model to be output, m1, calculated according to existing methods, are E1, E2 and E3 respectively; those corresponding to the second sub-network model to be output, m2, are E4, E5 and E6; and those corresponding to the third sub-network model to be output, m3, are E7, E8 and E9. Weighted summation with preset weights gives the comprehensive performance score R1 of m1, m2 and m3; the comprehensive performance score R2 of m4, m5 and m6 is calculated in the same way. R1 and R2 are compared, and if R1 is greater than R2, then m1, m2 and m3 are selected as the target sub-network model.
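As a sketch of the entropy-weight TOPSIS scoring of steps S16 and S17: each row of the decision matrix is one candidate group and the columns are (accuracy, end-to-end delay, exit rate); the sample values and the choice of which columns count as benefits rather than costs are assumptions.

```python
import numpy as np

def entropy_weight_topsis(X, benefit):
    X = np.asarray(X, dtype=float)
    rng = X.max(0) - X.min(0) + 1e-12
    # min-max normalise so that larger always means better
    Xn = np.where(benefit, (X - X.min(0)) / rng, (X.max(0) - X) / rng)
    P = Xn / (Xn.sum(0) + 1e-12)
    E = -(P * np.log(P + 1e-12)).sum(0) / np.log(len(X))  # entropy per criterion
    w = (1.0 - E) / (1.0 - E).sum()                       # entropy weights
    V = Xn * w
    d_pos = np.linalg.norm(V - V.max(0), axis=1)          # distance to ideal
    d_neg = np.linalg.norm(V - V.min(0), axis=1)          # distance to anti-ideal
    return d_neg / (d_pos + d_neg + 1e-12)                # relative closeness

scores = entropy_weight_topsis(
    [[0.91, 120.0, 0.75],   # candidate group 1: accuracy, delay, exit rate
     [0.89, 95.0, 0.70]],   # candidate group 2
    benefit=np.array([True, False, True]))
target = int(np.argmax(scores))  # index of the target sub-network model group
```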
Therefore, by the neural network model training method, the image classification network model can be deployed through the sub-network corresponding to the target sub-network model, and each sub-network in the target sub-network model only comprises one part of the image classification network model, so that different sub-networks can be deployed in different devices, and each device only needs to be responsible for one part of calculation tasks, and convenience in neural network deployment is improved.
Optionally, referring to fig. 3, in step S12, for each group of sub-network models, the inputting of a sample image into the first sub-network model and the generating of an image classification result output by the third sub-network model, with the output of the first sub-network model as the input of the second sub-network model and the output of the second sub-network model as the input of the third sub-network model, includes:
step S121, for each group of sub-network models, inputs the sample image into the first sub-network model, and obtains an output of the first sub-network model.
And step S122, calculating corresponding first credibility through a preset entropy method according to the output of the first sub-network model.
According to the output of the first sub-network model, the corresponding first credibility is calculated by a preset entropy method. Specifically, the corresponding loss can be calculated through the preset loss function y_output_n = y_cut_n(x; ω; b), and the corresponding first credibility is then determined according to the calculated loss.
And step S123, when the first credibility is larger than the preset threshold, inputting the output of the first sub-network model into the second sub-network model to obtain the output of the second sub-network model.
Optionally, the preset threshold is determined by a value calculated through a preset formula:

$$entropy(y) = \sum_{c \in C} y_c \log y_c$$

where entropy() denotes the entropy method, y is the prediction probability vector over the labels corresponding to the output of each sub-model, and C is the set of all labels in the classification task.
And step S124, calculating corresponding second credibility through a preset entropy method according to the output of the second sub-network model.
And step S125, when the second reliability is greater than the preset threshold, inputting the output of the second sub-network model into a third sub-network model to obtain an image classification result output by the third sub-network model.
In this way, the neural network can be divided into three parts by model segmentation, and segmentation exits are set so that part of the data can exit the network in advance, alleviating the problems of delay and limited device resources.
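Putting steps S121 to S125 together, the following is a hedged sketch of the early-exit inference path: a sample whose entropy value at a division point exceeds the exit threshold is forwarded to the next sub-network, otherwise it exits with the local prediction. The exit heads, the single-sample batch and the usual Shannon sign convention (a minus sign the formula above omits) are assumptions.

```python
import torch
import torch.nn.functional as F

def entropy(probs: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of a prediction probability vector
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

@torch.no_grad()
def classify(first, second, third, head1, head2, x, th1, th2):
    out1 = first(x)                                # x: a single-sample batch
    p1 = F.softmax(head1(out1.flatten(1)), dim=-1)
    if entropy(p1).item() <= th1:                  # credible: exit at the device
        return p1.argmax(-1)
    out2 = second(out1)                            # otherwise upload to the edge
    p2 = F.softmax(head2(out2.flatten(1)), dim=-1)
    if entropy(p2).item() <= th2:                  # credible: exit at the edge
        return p2.argmax(-1)
    return third(out2).argmax(-1)                  # otherwise classify at the cloud
```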
Referring to fig. 4, fig. 4 is a diagram of an example of a neural network model training method according to an embodiment of the present application, including:
and step S41, establishing candidate segmentation points according to the structural characteristics of different neural networks used by the current task. The method comprises the steps of segmenting a neural network, deploying a plurality of sub-neural networks formed after segmentation in a distributed computing model, dividing the structure of the neural network into three parts by utilizing segmentation points, sequentially deploying the three parts at a mobile terminal, an edge terminal and a cloud terminal, wherein in the computing mode, the mobile terminal, the edge terminal and the cloud terminal respectively only undertake the inference task of part of the neural network and can quit the network at the segmentation points in advance under the condition of ensuring the credibility of data, the credibility of the inference result of the segmentation points is measured by using an entropy method, and the smaller the entropy value obtained by SoftMax output computing is, the higher the credibility of the segmentation points to the data is.
And step S42, performing joint weighted training on the loss functions corresponding to the candidate division points of the neural network to obtain a multi-output model. Because the network used in the present application is provided with a plurality of candidate division points, these must be trained jointly during training. The cross-entropy loss function commonly used in classification tasks is taken as the optimization objective, and a loss function is placed behind each candidate division point. Since the network is divided into the three "end-edge-cloud" parts, two candidate division points need to be selected for optimization in addition to the fixed cloud division point, so that the loss function at each division point can reach a relatively high accuracy at its network level.
And step S43, after training, selecting all combinations of two division points as the exits of the mobile end and the edge end to form candidate networks. All two-division-point combinations are selected from the candidate division points, combined with the fixed cloud division point, and the operation of step S42 is performed; the trained networks serve as candidate networks to be evaluated in the subsequent steps.
Step S44, testing the indexes (accuracy, end-to-end delay, data exit rate) that affect performance under the different candidate networks.
In order to guarantee the accuracy of the data exiting at the end division point and the edge division point, the accuracy of the data exiting at the mobile end or the edge end should be approximately the same as that of the cloud exit in the classification task.
The end-to-end delay can be determined by a preset formula:

$$C = \sum_{l \in upload} 4 \times (|label| + S(f_{output}))$$

which calculates the amount of data to be uploaded at each division point; the communication delay of the division point is then obtained according to the current network state. Here, upload denotes the set of data to be uploaded at the division point, |label| denotes the correct label of the data, S denotes the size of the data, f_output denotes the output parameters at the current layer, and the constant 4 indicates that a floating-point number occupies 4 bytes. The computation delay is the time required for model inference; under the same computing resources, the more complex the network model and the larger the input data, the larger the computation delay. In the embodiment of the present application, the total computation delay needs to be calculated under three different computing resources: the mobile end, the edge end and the cloud end.
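A direct transcription of the upload-cost formula, under the assumption that |label| counts the label entries of a sample and S(f_output) counts the elements of the division-point output; the sample sizes and the 25 MB/s rate (taken from the experiment described below) are illustrative.

```python
def upload_bytes(exits):
    """exits: iterable of (label_len, f_output_numel) pairs uploaded at a cut."""
    return sum(4 * (label_len + numel) for label_len, numel in exits)

bytes_up = upload_bytes([(1, 32 * 8 * 8)])  # one sample with a 32x8x8 feature map
delay_s = bytes_up / 25e6                   # communication delay at 25 MB/s
```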
The data exit rate refers to the proportion of the data exiting at each division point to the whole data. The training set of CIFAR-10 can be used for network model training and threshold determination, and the exit rate of the test-set data at a division point can be made to correspond to the entropy value of that division point's output, from which the threshold of the division point is determined. After the data exit rate is determined, the data can be sorted by entropy, and the critical data entropy corresponding to the exit rate is found; this critical entropy is the threshold of the division point.
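The threshold determination just described reduces to reading off an entropy quantile, as in this small sketch (the validation entropies here are random placeholders):

```python
import numpy as np

def exit_threshold(entropies, exit_rate):
    # sort the data by entropy and find the critical entropy for the exit rate
    return float(np.quantile(np.asarray(entropies), exit_rate))

th = exit_threshold(np.random.rand(10000), exit_rate=0.75)
```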
And step S45, inputting each index under different candidate networks into the entropy weight Topsis model for analysis to obtain the network with the best performance.
The entropy weight TOPSIS model can be used in the embodiment of the present application to evaluate the comprehensive performance of the distributed neural network model. Specifically, the candidate network structure selection algorithm can be seen in fig. 2.
In the present application, using the CIFAR-10 data set in a WiFi environment with an ideal network speed of 25 MB/s, experimental results show that the algorithm provided herein can select the optimal network division points; the resulting network lets about 75% of the data exit locally and compresses the overall inference time of the model by about 3x. The results show that on CIFAR-10, most data does not need to rely on the cloud for processing, which greatly reduces the delay and energy consumption of tasks. Specifically, within a certain stage the accuracy of the model does not decrease as the data exit rate increases, i.e. the accuracy has a stable period in which it is basically the same as when all data exits at the cloud; after the stable period, the accuracy drops sharply and enters a rapid-decline period, which shows that dividing the model and exiting early is meaningful. The position where the gradient changes fastest is selected as the turning point of the accuracy, i.e. the critical point between the stable period and the rapid-decline period. Referring to fig. 5, fig. 5 shows a comparison of the results obtained by the entropy weight TOPSIS algorithm, where position denotes the position of the exit point, Ex1/Ex3 denotes exiting at exit points 1 and 3, Ex2/Ex4 denotes exiting at exit points 2 and 4, and each line corresponds to a different data exit rate; Exit rate denotes the exit rate, relative closeness C denotes the relative closeness, Rank denotes the ranking, and (Device/Edge) Threshold denotes the thresholds at the device end and the edge end.
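The turning point "where the gradient changes fastest" can be located, for example, by the largest second difference of the accuracy curve; the curve below is purely illustrative.

```python
import numpy as np

acc = np.array([0.91, 0.91, 0.909, 0.905, 0.87, 0.80])  # accuracy vs. exit rate
turning = int(np.argmax(np.abs(np.diff(acc, n=2)))) + 1  # steepest curvature
```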
In a second aspect of the embodiments of the present application, there is further provided a neural network model training apparatus, referring to fig. 6, where fig. 6 is a schematic structural diagram of the neural network model training apparatus of the embodiments of the present application, and the apparatus includes:
a model segmentation module 601, configured to segment the image classification network model to be trained into multiple sets of sub-network models according to preset multiple sets of segmentation points, where each set of sub-network model includes a first sub-network model, a second sub-network model, and a third sub-network model;
a result generation module 602, configured to, for each group of sub-network models, input the sample image into a first sub-network model, and generate an image classification result output by a third sub-network model with an output of the first sub-network model as an input of a second sub-network model and an output of the second sub-network model as an input of the third sub-network model;
a loss calculating module 603, configured to calculate, for each group of sub-network models, a first loss corresponding to an output of the first sub-network model, a second loss corresponding to an output of the second sub-network model, and a third loss corresponding to an image classification result output by the third sub-network model through a preset loss function;
the joint training module 604 is configured to perform joint training on the corresponding first sub-network model, second sub-network model, and third sub-network model respectively through the corresponding first loss, second loss, and third loss for each group of sub-network models to obtain a first to-be-output sub-network model, a second to-be-output sub-network model, and a third to-be-output sub-network model;
a parameter calculating module 605, configured to calculate, for each group of sub-network models, multiple performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model, and the third to-be-output sub-network model, respectively;
a score calculating module 606, configured to calculate, according to multiple performance parameters corresponding to each group of sub-network models, a comprehensive performance score corresponding to each group of sub-network models through a preset entropy weight model;
and the model selection module 607 is configured to select one of the groups of subnetwork models with the highest comprehensive performance score as the target subnetwork model.
Optionally, the parameter calculating module 605 includes:
and the parameter calculation submodule is used for respectively calculating model accuracy, end-to-end time delay and data drop-out rate corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model aiming at each group of sub-network models.
Optionally, the parameter calculation sub-module is specifically configured to: inputting the sample image into the first sub-network model aiming at each group of sub-network models to obtain the output of the first sub-network model; calculating corresponding first credibility through a preset entropy method according to the output of the first sub-network model; when the first credibility is larger than a preset threshold value, inputting the output of the first sub-network model into a second sub-network model to obtain the output of the second sub-network model; calculating corresponding second credibility through a preset entropy method according to the output of the second sub-network model; and when the second reliability is larger than the preset threshold, inputting the output of the second sub-network model into a third sub-network model to obtain an image classification result output by the third sub-network model.
Optionally, the preset threshold is determined by a value calculated through a preset formula:

$$entropy(y) = \sum_{c \in C} y_c \log y_c$$

where entropy() denotes the entropy method, y is the prediction probability vector over the labels corresponding to the output of each sub-model, and C is the set of all labels in the classification task.
Optionally, the loss calculating module 603 is specifically configured to: for each group of sub-network models, through the preset loss function:

$$y_{output\_n} = y_{cut\_n}(x;\ \omega;\ b)$$

$$\hat{y}_n = \mathrm{SoftMax}(y_{output\_n})$$

$$loss_n = -\sum_{c \in label} y_{label,c}\,\log \hat{y}_{n,c}$$

where n denotes the nth division point, x is the input data, ω and b denote the network weights and biases from the input to the current division point respectively, SoftMax() is a normalization function, y_label is the one-hot ground-truth label vector of the data, y_output_n is the output of the network, y_cut_n maps the input of the network model to the output at the current division point, and label denotes the label set;

respectively calculate a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model.
Optionally, the joint training module 604 includes:

a joint loss calculation submodule, configured to, for each group of sub-network models, calculate the joint loss through the corresponding first loss, second loss and third loss and a preset joint loss function:

$$loss_{joint} = \sum_{n} \omega_n \cdot loss_n$$

where ω_n is the weight of the loss function at the corresponding candidate division point;

and a to-be-output model obtaining submodule, configured to perform joint training on the corresponding first sub-network model, second sub-network model and third sub-network model according to the joint loss to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output.
Therefore, through the neural network model training device in the embodiment of the application, the image classification network model can be deployed through the sub-network corresponding to the target sub-network model, and each sub-network in the target sub-network model only comprises one part of the image classification network model, so that different sub-networks can be deployed in different devices, and each device only needs to be responsible for one part of calculation tasks, thereby improving the convenience of neural network deployment.
The embodiment of the present application further provides an electronic device, as shown in fig. 7, which includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
dividing an image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of dividing points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
for each group of sub-network models, inputting a sample image into the first sub-network model, taking the output of the first sub-network model as the input of the second sub-network model, taking the output of the second sub-network model as the input of the third sub-network model, and generating an image classification result output by the third sub-network model;
calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model respectively through a preset loss function aiming at each group of sub-network models;
for each group of sub-network models, performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model through the corresponding first loss, second loss and third loss respectively to obtain a first sub-network model to be output, a second sub-network model to be output and a third sub-network model to be output;
calculating a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model respectively aiming at each group of sub-network models;
respectively calculating the comprehensive performance scores corresponding to the sub-network models through a preset entropy weight model according to the multiple performance parameters corresponding to the sub-network models;
and selecting one group with the highest comprehensive performance score in the sub-network models as a target sub-network model.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above neural network model training methods.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the neural network model training methods of the embodiments described above.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in an interrelated manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiment; for relevant details, reference may be made to the corresponding parts of the method embodiment description.
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in its protection scope.

Claims (10)

1. A neural network model training method, the method comprising:
dividing an image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of division points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
for each group of sub-network models, inputting a sample image into the first sub-network model, taking the output of the first sub-network model as the input of the second sub-network model and the output of the second sub-network model as the input of the third sub-network model, and generating an image classification result output by the third sub-network model;
for each group of sub-network models, calculating, through a preset loss function, a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model, respectively;
for each group of sub-network models, performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model through the corresponding first loss, second loss and third loss, respectively, to obtain a first to-be-output sub-network model, a second to-be-output sub-network model and a third to-be-output sub-network model;
for each group of sub-network models, calculating a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model, respectively;
calculating, through a preset entropy weight model, the comprehensive performance score corresponding to each group of sub-network models according to the plurality of performance parameters corresponding to that group;
and selecting the group of sub-network models with the highest comprehensive performance score as the target sub-network models.
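Read as an algorithm, claim 1 divides one classification network at each candidate group of division points and chains the three resulting parts. Below is a minimal PyTorch sketch of that division and chaining; `split_at`, the toy network, and the cut indices (3, 5) are all illustrative assumptions, not from the patent:

```python
import torch
import torch.nn as nn

def split_at(model: nn.Sequential, cut1: int, cut2: int):
    """Split a sequential image-classification network into three
    sub-network models at one group of two division points."""
    layers = list(model.children())
    sub1 = nn.Sequential(*layers[:cut1])
    sub2 = nn.Sequential(*layers[cut1:cut2])
    sub3 = nn.Sequential(*layers[cut2:])
    return sub1, sub2, sub3

# A toy classifier split at one candidate group of division points.
toy = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10),
)
sub1, sub2, sub3 = split_at(toy, 3, 5)

x = torch.randn(8, 1, 28, 28)   # a batch of sample images
out1 = sub1(x)                  # output of the first sub-network model
out2 = sub2(out1)               # fed as input to the second sub-network model
logits = sub3(out2)             # image classification result from the third
```

Repeating this over every preset group of division points yields the candidate groups of sub-network models that the later steps of claim 1 train and score.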
2. The method of claim 1, wherein calculating, for each group of sub-network models, the plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model, respectively, comprises:
for each group of sub-network models, calculating the model accuracy, the end-to-end delay and the data drop-out rate corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model, respectively.
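Claims 1-2 combine model accuracy, end-to-end delay and data drop-out rate into one comprehensive score through a "preset entropy weight model" whose formulas the claims do not give. The NumPy sketch below substitutes the textbook entropy weight method as an assumed stand-in; the helper name `entropy_weight_scores` and the indicator values are illustrative:

```python
import numpy as np

def entropy_weight_scores(perf, benefit):
    """Composite scores via the classical entropy weight method.
    perf: (groups, indicators) matrix, e.g. columns = [accuracy,
    end-to-end delay, data drop-out rate]; benefit flags columns
    where larger is better."""
    p = np.asarray(perf, dtype=float)
    lo, hi = p.min(axis=0), p.max(axis=0)
    rng = hi - lo + 1e-12
    # Min-max normalise; invert cost indicators (delay, drop-out rate).
    norm = np.where(benefit, (p - lo) / rng, (hi - p) / rng)
    prob = norm / (norm.sum(axis=0) + 1e-12)
    # Indicator entropy; low entropy (more discriminative) -> higher weight.
    ent = -(prob * np.log(prob + 1e-12)).sum(axis=0) / np.log(len(p))
    weights = (1.0 - ent) / (1.0 - ent).sum()
    return norm @ weights

# Three candidate groups; indicators: accuracy (up), delay (down), drop-out (down).
perf = [[0.91, 120.0, 0.02],
        [0.89,  80.0, 0.01],
        [0.93, 200.0, 0.05]]
scores = entropy_weight_scores(perf, benefit=np.array([True, False, False]))
best_group = int(np.argmax(scores))   # claim 1: highest comprehensive score wins
```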
3. The method of claim 2, wherein, for each group of sub-network models, taking the output of the first sub-network model as the input of the second sub-network model and the output of the second sub-network model as the input of the third sub-network model to generate the image classification result output by the third sub-network model comprises:
for each group of sub-network models, inputting a sample image into the first sub-network model to obtain the output of the first sub-network model;
calculating corresponding first credibility through a preset entropy method according to the output of the first sub-network model;
when the first credibility is larger than a preset threshold, inputting the output of the first sub-network model into the second sub-network model to obtain the output of the second sub-network model;
calculating corresponding second credibility through a preset entropy method according to the output of the second sub-network model;
and when the second credibility is greater than the preset threshold, inputting the output of the second sub-network model into the third sub-network model to obtain an image classification result output by the third sub-network model.
4. The method of claim 3, wherein the preset threshold is a value obtained by calculation through a preset formula:
entropy(y) = Σ_{c∈C} y_c log y_c,
where entropy() represents the entropy method, y is the prediction probability vector over the labels corresponding to the output of each sub-network model, and C is the set of all labels in the classification task.
5. The method of claim 1, wherein, for each group of sub-network models, calculating, through a preset loss function, the first loss corresponding to the output of the first sub-network model, the second loss corresponding to the output of the second sub-network model and the third loss corresponding to the image classification result output by the third sub-network model, respectively, comprises:
for each group of sub-network models, through a preset loss function:
y_output_n = y_cut_n(x; ω; b);
[formula image FDA0002987422610000021]
[formula image FDA0002987422610000022]
where n denotes the nth division point; x is the input data; ω and b denote the network weights and biases from the input to the current division point, respectively; SoftMax() is a normalization function; y_label represents the one-hot ground-truth label vector of the data; y_output_n is the output of the network; y_cut_n maps the input of the network model to the output at the current division point; and label represents the label set;
and respectively calculating a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model.
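The exact claim-5 formulas are embedded as figure images, so the following PyTorch sketch encodes only the standard reading of the surrounding definitions: a SoftMax over the output at the nth division point followed by cross-entropy against the one-hot ground-truth vector. The helper `division_point_loss` and the tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def division_point_loss(y_output_n, y_label):
    """Assumed claim-5 loss at the nth division point: SoftMax over the
    sub-network output, then cross-entropy against the one-hot label."""
    y_hat = F.softmax(y_output_n, dim=-1)                  # SoftMax normalisation
    return -(y_label * torch.log(y_hat + 1e-12)).sum(dim=-1).mean()

# y_output_n = y_cut_n(x; ω; b): network output up to the nth division point.
y_output_n = torch.randn(8, 10)                            # illustrative logits
y_label = F.one_hot(torch.randint(0, 10, (8,)), num_classes=10).float()
loss_n = division_point_loss(y_output_n, y_label)
```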
6. The method of claim 5, wherein, for each group of sub-network models, jointly training the corresponding first, second and third sub-network models through the corresponding first, second and third losses to obtain the first, second and third to-be-output sub-network models comprises:
for each group of sub-network models, calculating a joint loss from the corresponding first loss, second loss and third loss through a preset joint loss function:
[formula image FDA0002987422610000031]
where ω_n is the weight of the loss function at the corresponding candidate division point;
and performing joint training on the corresponding first sub-network model, second sub-network model and third sub-network model according to the joint loss to obtain the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model.
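The joint loss formula is likewise a figure image; under the assumption, suggested by the ω_n weights, that it is a weighted sum of the per-division-point losses, a minimal PyTorch sketch (the weight values are illustrative):

```python
import torch

def joint_loss(losses, omegas):
    """Assumed claim-6 joint loss: a weighted sum of the per-division-point
    losses, with omega_n the weight of the nth candidate division point."""
    return sum(w * l for w, l in zip(omegas, losses))

# Illustrative per-division-point losses (see the claim-5 sketch above).
loss_1, loss_2, loss_3 = (torch.rand((), requires_grad=True) for _ in range(3))
total = joint_loss([loss_1, loss_2, loss_3], omegas=[0.3, 0.3, 0.4])
total.backward()   # one backward pass trains all three sub-networks jointly
```

Because all three sub-network models sit on one differentiable path, a single backward pass through the joint loss updates them together, which is what distinguishes the joint training of claims 1 and 6 from training each sub-network in isolation.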
7. An apparatus for neural network model training, the apparatus comprising:
the model division module is used for dividing an image classification network model to be trained into a plurality of groups of sub-network models according to a plurality of preset groups of division points, wherein each group of sub-network models comprises a first sub-network model, a second sub-network model and a third sub-network model;
the result generation module is used for, for each group of sub-network models, inputting a sample image into the first sub-network model, taking the output of the first sub-network model as the input of the second sub-network model and the output of the second sub-network model as the input of the third sub-network model, and generating an image classification result output by the third sub-network model;
the loss calculation module is used for calculating, for each group of sub-network models through a preset loss function, a first loss corresponding to the output of the first sub-network model, a second loss corresponding to the output of the second sub-network model and a third loss corresponding to the image classification result output by the third sub-network model, respectively;
the joint training module is used for performing, for each group of sub-network models, joint training on the corresponding first sub-network model, second sub-network model and third sub-network model through the corresponding first loss, second loss and third loss, respectively, to obtain a first to-be-output sub-network model, a second to-be-output sub-network model and a third to-be-output sub-network model;
the parameter calculation module is used for calculating, for each group of sub-network models, a plurality of performance parameters corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model, respectively;
the score calculation module is used for calculating, through a preset entropy weight model, the comprehensive performance score corresponding to each group of sub-network models according to the plurality of performance parameters corresponding to that group;
and the model selection module is used for selecting the group of sub-network models with the highest comprehensive performance score as the target sub-network models.
8. The apparatus of claim 7, wherein the parameter calculation module comprises:
and the parameter calculation submodule is used for calculating, for each group of sub-network models, the model accuracy, the end-to-end delay and the data drop-out rate corresponding to the first to-be-output sub-network model, the second to-be-output sub-network model and the third to-be-output sub-network model, respectively.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN202110304132.3A 2021-03-22 2021-03-22 Neural network model training method and device, electronic equipment and storage medium Active CN113065641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304132.3A CN113065641B (en) 2021-03-22 2021-03-22 Neural network model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304132.3A CN113065641B (en) 2021-03-22 2021-03-22 Neural network model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113065641A true CN113065641A (en) 2021-07-02
CN113065641B CN113065641B (en) 2023-09-26

Family

ID=76562826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304132.3A Active CN113065641B (en) 2021-03-22 2021-03-22 Neural network model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113065641B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024014706A1 (en) * 2022-07-13 2024-01-18 Samsung Electronics Co., Ltd. (삼성전자주식회사) Electronic device for training neural network model performing image enhancement, and control method therefor

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717589A (en) * 2019-09-03 2020-01-21 北京旷视科技有限公司 Data processing method, device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gao Wenlong (高文龙): "Research on Face Recognition Based on the Fusion of Image and Depth Information" (基于图像与深度信息融合的人脸识别研究), China Master's Theses Collection (《中国硕士论文数据辑》) *

Also Published As

Publication number Publication date
CN113065641B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN111209977B (en) Classification model training and using method, device, equipment and medium
CN111309479A (en) Method, device, equipment and medium for realizing task parallel processing
CN111046286A (en) Object recommendation method and device and computer storage medium
US20210350234A1 (en) Techniques to detect fusible operators with machine learning
US20220180209A1 (en) Automatic machine learning system, method, and device
CN108416032A (en) A kind of file classification method, device and storage medium
CN110717023A (en) Method and device for classifying interview answer texts, electronic equipment and storage medium
JP2022078310A (en) Image classification model generation method, device, electronic apparatus, storage medium, computer program, roadside device and cloud control platform
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN112035626A (en) Rapid identification method and device for large-scale intentions and electronic equipment
CN114511083A (en) Model training method and device, storage medium and electronic device
CN113065641B (en) Neural network model training method and device, electronic equipment and storage medium
Khodaverdian et al. An energy aware resource allocation based on combination of CNN and GRU for virtual machine selection
CN112486467B (en) Interactive service recommendation method based on dual interaction relation and attention mechanism
CN113378067A (en) Message recommendation method, device, medium, and program product based on user mining
CN112766402A (en) Algorithm selection method and device and electronic equipment
CN113806501A (en) Method for training intention recognition model, intention recognition method and equipment
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
EP4339843A1 (en) Neural network optimization method and apparatus
Liyanage et al. Automating the classification of urban issue reports: an optimal stopping approach
CN115202879A (en) Multi-type intelligent model-based cloud edge collaborative scheduling method and application
CN112214675B (en) Method, device, equipment and computer storage medium for determining user purchasing machine
Rahnamayan et al. Image Thresholding Using Differential Evolution.
CN114565105A (en) Data processing method and deep learning model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant