CN116227556A - Method, device, computer equipment and storage medium for acquiring target network model - Google Patents
Method, device, computer equipment and storage medium for acquiring target network model
- Publication number
- CN116227556A (application number CN202310253048.2A)
- Authority
- CN
- China
- Prior art keywords
- network model
- target network
- convolution
- data set
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to the technical field of artificial intelligence algorithms, and in particular to a method, a device, computer equipment and a storage medium for acquiring a target network model, wherein the method comprises the following steps: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved without increasing model capacity.
Description
Technical Field
The present invention relates to the field of artificial intelligence algorithms, and in particular, to a method, an apparatus, a computer device, and a storage medium for acquiring a target network model.
Background
As deep learning is applied in more and more fields, the functional complexity of deep-learning network models keeps growing. Although recognition and classification accuracy improves as model structure and capacity grow, the matching hardware requirements also become more demanding, so such models cannot be applied to miniaturized products.
Therefore, how to reduce the complexity of a network model while ensuring its performance is a technical problem that currently needs to be solved.
Disclosure of Invention
The present invention has been made in view of the above problems, and provides a method, an apparatus, a computer device, and a storage medium for acquiring a target network model that overcome, or at least partially solve, the above problems.
In a first aspect, the present invention provides a method for obtaining a target network model, including:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between two initially adjacent convolution layers to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
Further, the residual structure includes:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
Further, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
Further, after adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, the method further includes:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
Further, the obtaining the local feature similarity difference between the target network model and the teacher network model includes:
acquiring a plurality of first characteristic outputs of the target network model and second characteristic outputs of the same position in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
Further, the determining the total network loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target includes:
determining a total network loss of the target network model based on local feature similarity differences between the target network model and the teacher network model, output differences, and differences of the target network model and a real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
Further, after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further includes:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
In a second aspect, the present invention further provides an apparatus for obtaining a target network model, including:
an acquisition module, configured to acquire a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In a third aspect, the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps described in the first aspect when executing the program.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps described in the first aspect.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a method for acquiring a target network model, comprising: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, with YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4M and achieves an accuracy of 67.35%.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also throughout the drawings, like reference numerals are used to designate like parts. In the drawings:
FIG. 1 is a flowchart illustrating steps of a method for acquiring a target network model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a student network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another student network model according to an embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a residual structure LFNR in an embodiment of the invention;
FIG. 5 is a schematic diagram showing a teacher network model, a student network model and an added residual structure used in obtaining a target network model in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for acquiring a target network model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device for implementing a method for acquiring a target network model in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
The embodiment of the invention provides a method for acquiring a target network model, as shown in fig. 1, comprising the following steps:
S101, acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
S102, adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In order to run high-performance recognition algorithms on small devices (such as cell phones, cameras, drones, etc.) to recognize objects, a low-power, high-performance small model is required.
The invention provides a method for acquiring such a target network model; compared with existing recognition models, the resulting target network model gains no parameter capacity while its performance is effectively improved.
The target network model in the invention is obtained by improving the knowledge distillation algorithm.
The conventional knowledge distillation algorithm inputs sample data to an untrained student network model and a trained teacher network model to obtain their respective outputs; determines a divergence difference between the two based on those outputs; compares the output of the student network model with the standard result to obtain a cross-entropy loss; and finally adjusts the internal parameters of the student network model based on the divergence difference and the cross-entropy loss.
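As a concrete illustration, a minimal PyTorch sketch of this conventional scheme is given below; the function name, temperature T, and mixing weight a are illustrative assumptions, not values from the invention.

```python
import torch.nn.functional as F

def conventional_kd_loss(student_logits, teacher_logits, labels, T=4.0, a=0.5):
    """Conventional knowledge distillation: a divergence term between the
    softened student/teacher outputs plus cross-entropy against the labels.
    T (temperature) and a (mixing weight) are illustrative assumptions."""
    # Divergence comparison between student and teacher output distributions.
    divergence = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Cross-entropy between the student output and the standard result.
    cross_entropy = F.cross_entropy(student_logits, labels)
    return a * divergence + (1 - a) * cross_entropy
```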
The method for acquiring the target network model in the invention comprises the following steps:
s101, acquiring a teacher network model and a student network model, wherein the student network model is a model with the simplified teacher network model.
The teacher network model is a trained model, the student network model is an untrained model, and the teacher network model comprises a plurality of functional blocks (blocks), and each functional Block comprises a plurality of structural layers (layers). The student network models have the same structure, but the parameter amounts of the student network models are smaller.
Next, the student network model is improved, specifically as follows:
s102, adding a residual structure between two initial adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through a first convolution layer in the target network model, part of the convolution data set is weighted into the residual convolution data set after the processing of the residual structure, so that the local characteristics of the convolution data set are enhanced, and then the local characteristics are input into a second convolution layer.
The residual structure is specifically added between the two convolution layers that are initially adjacent to the student network model. Specifically, taking two kinds of student network models as an example, as shown in fig. 2, the first kind of student network model is that an initial convolutional layer CONV of the student network model is followed by other structural layers (e.g., BN), and the residual structure LFNR is added between the first layer of convolutional layer CONV and a second layer of convolutional layer CONV that occurs subsequently. As shown in fig. 3, the initial convolutional layer CONV of the second student network model is still followed by the convolutional layer CONV, and thus the residual structure LFNR is increased between the initial two convolutional layers CONV.
The residual structure is described in detail below.
As shown in fig. 4, a schematic structural diagram of the residual structure LFNR is shown. The residual structure includes:
a first influence-elimination layer RELU, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer LFN, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer RELU, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
The residual processing layer LFN performs local feature normalization; it has few parameters and barely affects the overall parameter count of the network model.
Specifically, the residual processing layer LFN is configured to:
perform feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fuse any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.

The mean-square calculation result v^2 is obtained by performing the mean-square calculation on the vectors P = {p_1, p_2, …, p_n} at the same position in the training data set, where n is the number of vectors; the mean-square processing amplifies the numerical values of the local features. Normalizing the results to a unified interval then yields the vector set P', in which any vector p_i' is computed by the second equation above. Fusing each vector of P' with the data at the same position in the remaining convolution data set gives the convolution data with enhanced local features.
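For concreteness, a minimal PyTorch sketch of the LFN computation described above follows; the class name, the (N, C, H, W) tensor layout, and the per-channel shape of γ and β are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LFN(nn.Module):
    """Local feature normalization sketch: each spatial position's channel
    vector p is divided by the root of its mean square (v^2 = sum_i p_i^2 / c),
    then scaled and shifted by the trainable gamma and beta."""

    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps  # positive number preventing a zero denominator

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        # v^2: mean of squares over the channel dimension, per spatial position.
        v2 = p.pow(2).mean(dim=1, keepdim=True)
        # p' = gamma * p / sqrt(v^2 + eps) + beta
        return self.gamma * p / torch.sqrt(v2 + self.eps) + self.beta
```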
The influence-elimination layers RELU, i.e. the first and second influence-elimination layers, are arranged before and after the residual processing layer LFN. Their purpose is to eliminate abnormal data in the training data set, such as negative values, or excessively large or small values produced by the LFN processing, so as to ensure that the training data are all valid data.
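Putting the pieces together, here is a sketch of the LFNR residual structure (first RELU, LFN, second RELU around a residual connection) and its assumed placement between two adjacent convolution layers. It reuses the LFN module from the previous sketch; the unweighted addition and the channel sizes are assumptions based on figs. 2 to 4.

```python
import torch
import torch.nn as nn

class LFNR(nn.Module):
    """Residual structure LFNR (sketch): part of the convolution data passes
    through RELU -> LFN -> RELU and is added back into the remaining data."""

    def __init__(self, channels: int):
        super().__init__()
        self.pre_relu = nn.ReLU()   # first influence-elimination layer
        self.lfn = LFN(channels)    # residual processing layer (defined above)
        self.post_relu = nn.ReLU()  # second influence-elimination layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        enhanced = self.lfn(self.pre_relu(x))  # locally enhanced branch
        return self.post_relu(x + enhanced)    # weighted into the remaining set

# Assumed placement between two initially adjacent convolution layers (fig. 3):
stem = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),   # first convolution layer
    LFNR(32),                         # added residual structure
    nn.Conv2d(32, 64, 3, padding=1),  # second convolution layer
)
```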
This feature-enhancement processing improves the performance of the student network model, from which the target network model is obtained.
Of course, the target network model can be further optimized by adjusting its parameters.
Specifically, after obtaining the target network model, the method further includes:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model, and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
The method for obtaining the local feature similarity difference between the target network model and the teacher network model comprises the following steps:
acquiring a plurality of first feature outputs of the target network model and the second feature outputs at the same positions in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
As shown in fig. 5, if the teacher network model includes three functional blocks (blocks), the corresponding student network model also has three corresponding functional blocks (blocks). Taking one Block as an example:

F̂_t = MAXPOOL(LFN(F_t))

wherein F_t is the training data set input to the teacher network model, LFN is the local feature normalization process, MAXPOOL is the max-pooling operation, and F̂_t is the second feature output of the functional block in the teacher network model.

F̂_s = MAXPOOL(LFN(F_s))

wherein F_s is the training data set input to the student network model, and F̂_s is the first feature output of the functional block at the same position in the target network model.
After the first feature output and the second feature output are obtained, the local feature similarity difference between the target network model and the teacher network model is computed from them; since the network is divided into three functional blocks (blocks), the computation is repeated three times:

L_SLFN = Σ_{k=1}^{3} MSE(F̂_s^(k), F̂_t^(k))

wherein L_SLFN is the local feature similarity difference between the target network model and the teacher network model, and MSE is the mean-square-error calculation.
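Under the same assumptions, a sketch of this local feature similarity loss follows; the three corresponding blocks, the LFN modules from the earlier sketch, the 2x2 max pool, and matching feature shapes between the two models are all assumptions.

```python
import torch.nn.functional as F

def slfn_loss(student_feats, teacher_feats, lfn_s, lfn_t):
    """L_SLFN sketch: MSE between MAXPOOL(LFN(F_s)) and MAXPOOL(LFN(F_t)),
    summed over corresponding functional blocks. Assumes the paired block
    outputs have matching shapes (or have been projected to match)."""
    loss = 0.0
    for f_s, f_t, norm_s, norm_t in zip(student_feats, teacher_feats, lfn_s, lfn_t):
        hat_s = F.max_pool2d(norm_s(f_s), kernel_size=2)
        hat_t = F.max_pool2d(norm_t(f_t), kernel_size=2)
        loss = loss + F.mse_loss(hat_s, hat_t)
    return loss
```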
The output difference L_KD is the difference between the final output of the target network model and the final output of the teacher network model, with the final output of the teacher model denoted p_t and the final output of the student model denoted p_s:

L_KD = KL(p_s, p_t)

The difference L_CE between the target network model and the real target is the difference between the final output of the target network model and the real target, whose label is y:

L_CE = CE(p_s, y)
Thus, the total network loss of the target network model is obtained according to the following formula:

L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein α and β are hyper-parameters for tuning.
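A sketch of the total network loss combining the three terms; the α and β values are illustrative assumptions, and slfn is the output of the slfn_loss sketch above.

```python
import torch.nn.functional as F

def lfne_loss(p_s, p_t, y, slfn, alpha=0.7, beta=0.1):
    """L_LFNE = alpha * L_KD + (1 - alpha) * L_CE + beta * L_SLFN.
    alpha and beta are tuning hyper-parameters (values here are assumptions)."""
    l_kd = F.kl_div(F.log_softmax(p_s, dim=-1),
                    F.softmax(p_t, dim=-1),
                    reduction="batchmean")  # L_KD = KL(p_s, p_t)
    l_ce = F.cross_entropy(p_s, y)          # L_CE = CE(p_s, y)
    return alpha * l_kd + (1 - alpha) * l_ce + beta * slfn
```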
After determining the total network loss of the target network model, parameters of the target network model are adjusted based on the total network loss of the target network model to obtain an optimized target network model.
Specifically, the total network loss is back-propagated using the gradient-descent principle to train the parameters of the target network model; this cycle is repeated a set number of times to obtain the optimized target network model.
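A minimal training-loop sketch of this back-propagation cycle, building on the loss sketches above; the optimizer choice, learning rate, epoch count, and the assumption that both models return (logits, block features) are illustrative.

```python
import torch

def train(student, teacher, loader, lfn_s, lfn_t, epochs=100, lr=1e-3):
    """Back-propagate the total network loss for a set number of cycles
    (gradient descent); optimizer and schedule are illustrative assumptions."""
    teacher.eval()  # the teacher network model is already trained
    params = list(student.parameters())
    for m in lfn_s:
        params += list(m.parameters())  # student-side LFN layers train too (assumption)
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):  # repeat the cycle a set number of times
        for x, y in loader:
            with torch.no_grad():
                p_t, t_feats = teacher(x)  # assumed to return logits + block features
            p_s, s_feats = student(x)
            slfn = slfn_loss(s_feats, t_feats, lfn_s, lfn_t)
            loss = lfne_loss(p_s, p_t, y, slfn)
            opt.zero_grad()
            loss.backward()  # back-propagate the total network loss
            opt.step()       # gradient-descent parameter update
```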
Finally, the target network model may be validated.
The teacher network model, the student network model and the added residual structure used in obtaining the target network model are shown in fig. 5.
By adding the residual structure to the student network model, the performance of the network can be greatly improved without affecting its overall parameter capacity. The lightweight network model can thus meet the performance requirements of hardware deployment, reducing hardware deployment and computation costs. By computing the total network loss, local characterization features can be extracted from the teacher network and passed to the student network to guide its learning. The performance of the student network can therefore even exceed that of the teacher model, effectively reducing the cost of the deployed hardware and the power consumption of the product, and solving a practical problem in applying deep-learning models. The method suits future development demands and can be rapidly and efficiently applied to models of ever-increasing complexity.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a method for acquiring a target network model, comprising: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, with YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4M and achieves an accuracy of 67.35%.
Example 2
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for acquiring a target network model, as shown in fig. 6, including:
an obtaining module 601, configured to obtain a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module 602, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In an alternative embodiment, the residual structure comprises:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
In an alternative embodiment, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
In an alternative embodiment, the apparatus further comprises a network total loss determination module, which includes:
the acquisition unit is used for acquiring local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and the determining unit is used for determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
In an alternative embodiment, the determining unit is configured to determine the total network loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
In an alternative embodiment, the network total loss determination module further includes:
and the adjusting unit is used for adjusting the parameters of the target network model based on the total loss of the target network model so as to obtain an optimized target network model.
Example 3
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in fig. 7, including a memory 704, a processor 702, and a computer program stored on the memory 704 and executable on the processor 702, where the processor 702 implements the steps of the method for acquiring a target network model described above when executing the program.
In fig. 7, a bus architecture is represented by bus 700, which may comprise any number of interconnected buses and bridges linking together various circuits, including one or more processors represented by processor 702 and memory represented by memory 704. Bus 700 may also link various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. Bus interface 706 provides an interface between bus 700 and the receiver 701 and transmitter 703. The receiver 701 and the transmitter 703 may be the same element, i.e. a transceiver, providing a unit for communicating with various other apparatus over a transmission medium. The processor 702 is responsible for managing the bus 700 and general processing, while the memory 704 may be used to store data used by the processor 702 in performing operations.
Example 4
Based on the same inventive concept, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method of obtaining a target network model.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the means for obtaining a model of a target network, computer device, according to embodiments of the invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
Claims (10)
1. A method for obtaining a target network model, comprising:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between two initially adjacent convolution layers to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
2. The method of claim 1, wherein the residual structure comprises:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
3. The method according to claim 1, wherein the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
4. The method of claim 1, wherein after adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, the method further comprises:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
5. The method of claim 4, wherein the obtaining the local feature similarity differences between the target network model and the teacher network model comprises:
acquiring a plurality of first characteristic outputs of the target network model and second characteristic outputs of the same position in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
6. The method of claim 4, wherein the determining the total network loss for the target network model based on the local feature similarity differences between the target network model and the teacher network model, the output differences, and the differences between the target network model and the real target comprises:
determining a total network loss of the target network model based on local feature similarity differences between the target network model and the teacher network model, output differences, and differences of the target network model and a real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
7. The method of claim 4, wherein after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further comprises:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
8. An apparatus for obtaining a target network model, comprising:
an acquisition module, configured to acquire a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps of any of claims 1 to 7 when the program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps of any of claims 1-7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310253048.2A | 2023-03-16 | 2023-03-16 | Method, device, computer equipment and storage medium for acquiring target network model
Publications (1)

Publication Number | Publication Date
---|---
CN116227556A | 2023-06-06
Family

- ID=86576858

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310253048.2A (pending) | Method, device, computer equipment and storage medium for acquiring target network model | 2023-03-16 | 2023-03-16

Country Status (1)

Country | Link
---|---
CN | CN116227556A
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116993962A | 2023-07-20 | 2023-11-03 | 广东南方智媒科技有限公司 | Two-dimensional code detection method, device, equipment and readable storage medium
CN116993962B | 2023-07-20 | 2024-04-26 | 广东南方智媒科技有限公司 | Two-dimensional code detection method, device, equipment and readable storage medium
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |