CN116227556A - Method, device, computer equipment and storage medium for acquiring target network model - Google Patents
Method, device, computer equipment and storage medium for acquiring target network model
- Publication number
- CN116227556A (application number CN202310253048.2A)
- Authority
- CN
- China
- Prior art keywords
- network model
- target network
- convolution
- data set
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to the technical field of artificial intelligence algorithms, and in particular to a method, a device, computer equipment and a storage medium for acquiring a target network model, wherein the method comprises the following steps: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved without increasing model capacity.
Description
Technical Field
The present invention relates to the field of artificial intelligence algorithms, and in particular, to a method, an apparatus, a computer device, and a storage medium for acquiring a target network model.
Background
As deep learning is applied in more and more fields, the functional complexity of deep-learning network models keeps growing. Although recognition and classification accuracy improves as model structure and capacity grow, the matching hardware requirements also become more demanding, so such models cannot be applied to miniaturized products.
Therefore, how to reduce the complexity of a network model while ensuring its performance is a technical problem that currently needs to be solved.
Disclosure of Invention
The present invention has been made in view of the above problems, and provides a method, an apparatus, a computer device, and a storage medium for acquiring a target network model that overcome, or at least partially solve, the above problems.
In a first aspect, the present invention provides a method for obtaining a target network model, including:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between two initially adjacent convolution layers to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
Further, the residual structure includes:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
Further, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
Further, after adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, the method further includes:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
Further, the obtaining the local feature similarity difference between the target network model and the teacher network model includes:
acquiring a plurality of first characteristic outputs of the target network model and second characteristic outputs of the same position in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
Further, the determining the total network loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target includes:
determining a total network loss of the target network model based on local feature similarity differences between the target network model and the teacher network model, output differences, and differences of the target network model and a real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
Further, after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further includes:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
In a second aspect, the present invention further provides an apparatus for obtaining a target network model, including:
an acquisition module, configured to acquire a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In a third aspect, the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps described in the first aspect when executing the program.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps described in the first aspect.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a method for acquiring a target network model, comprising: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, with YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4M and achieves an accuracy of 67.35%.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also throughout the drawings, like reference numerals are used to designate like parts. In the drawings:
FIG. 1 is a flowchart illustrating steps of a method for acquiring a target network model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a student network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another student network model according to an embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a residual structure LFNR in an embodiment of the invention;
FIG. 5 is a schematic diagram showing a teacher network model, a student network model and an added residual structure used in obtaining a target network model in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for acquiring a target network model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device for implementing a method for acquiring a target network model in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
The embodiment of the invention provides a method for acquiring a target network model, as shown in fig. 1, comprising the following steps:
S101, acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
S102, adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In order to run high-performance recognition algorithms on small devices (such as cell phones, cameras, drones, etc.) to recognize objects, a low-power, high-performance small model is required.
The invention provides a method for acquiring such a target network model; compared with existing recognition models, the resulting target network model gains no parameter capacity while its performance is effectively improved.
The target network model in the invention is obtained by improving the knowledge distillation algorithm.
The conventional knowledge distillation algorithm inputs sample data to an untrained student network model and a trained teacher network model to obtain their respective outputs; determines a divergence difference between the two based on those outputs; compares the output of the student network model with the standard result to obtain a cross-entropy loss; and finally adjusts the internal parameters of the student network model based on the divergence difference and the cross-entropy loss.
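As a concrete illustration, a minimal PyTorch sketch of this conventional scheme is given below; the function name, temperature T, and mixing weight a are illustrative assumptions, not values from the invention.

```python
import torch.nn.functional as F

def conventional_kd_loss(student_logits, teacher_logits, labels, T=4.0, a=0.5):
    """Conventional knowledge distillation: a divergence term between the
    softened student/teacher outputs plus cross-entropy against the labels.
    T (temperature) and a (mixing weight) are illustrative assumptions."""
    # Divergence comparison between student and teacher output distributions.
    divergence = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Cross-entropy between the student output and the standard result.
    cross_entropy = F.cross_entropy(student_logits, labels)
    return a * divergence + (1 - a) * cross_entropy
```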
The method for acquiring the target network model in the invention comprises the following steps:
s101, acquiring a teacher network model and a student network model, wherein the student network model is a model with the simplified teacher network model.
The teacher network model is a trained model, the student network model is an untrained model, and the teacher network model comprises a plurality of functional blocks (blocks), and each functional Block comprises a plurality of structural layers (layers). The student network models have the same structure, but the parameter amounts of the student network models are smaller.
Next, the student network model is improved, specifically as follows:
s102, adding a residual structure between two initial adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through a first convolution layer in the target network model, part of the convolution data set is weighted into the residual convolution data set after the processing of the residual structure, so that the local characteristics of the convolution data set are enhanced, and then the local characteristics are input into a second convolution layer.
The residual structure is specifically added between the two convolution layers that are initially adjacent to the student network model. Specifically, taking two kinds of student network models as an example, as shown in fig. 2, the first kind of student network model is that an initial convolutional layer CONV of the student network model is followed by other structural layers (e.g., BN), and the residual structure LFNR is added between the first layer of convolutional layer CONV and a second layer of convolutional layer CONV that occurs subsequently. As shown in fig. 3, the initial convolutional layer CONV of the second student network model is still followed by the convolutional layer CONV, and thus the residual structure LFNR is increased between the initial two convolutional layers CONV.
The residual structure is described in detail below.
As shown in fig. 4, a schematic structural diagram of the residual structure LFNR is shown. The residual structure includes:
a first influence-elimination layer RELU, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer LFN, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer RELU, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
The residual processing layer LFN performs local feature normalization; it has few parameters and barely affects the overall parameter count of the network model.
Specifically, the residual processing layer LFN is configured to:
perform feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fuse any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.

The mean-square calculation result v^2 is obtained by performing the mean-square calculation on the vectors P = {p_1, p_2, …, p_n} at the same position in the training data set, where n is the number of vectors; the mean-square processing amplifies the numerical values of the local features. Normalizing the results to a unified interval then yields the vector set P', in which any vector p_i' is computed by the second equation above. Fusing each vector of P' with the data at the same position in the remaining convolution data set gives the convolution data with enhanced local features.
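For concreteness, a minimal PyTorch sketch of the LFN computation described above follows; the class name, the (N, C, H, W) tensor layout, and the per-channel shape of γ and β are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LFN(nn.Module):
    """Local feature normalization sketch: each spatial position's channel
    vector p is divided by the root of its mean square (v^2 = sum_i p_i^2 / c),
    then scaled and shifted by the trainable gamma and beta."""

    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps  # positive number preventing a zero denominator

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        # v^2: mean of squares over the channel dimension, per spatial position.
        v2 = p.pow(2).mean(dim=1, keepdim=True)
        # p' = gamma * p / sqrt(v^2 + eps) + beta
        return self.gamma * p / torch.sqrt(v2 + self.eps) + self.beta
```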
The influence-elimination layers RELU, i.e. the first and second influence-elimination layers, are arranged before and after the residual processing layer LFN. Their purpose is to eliminate abnormal data in the training data set, such as negative values, or excessively large or small values produced by the LFN processing, so as to ensure that the training data are all valid data.
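Putting the pieces together, here is a sketch of the LFNR residual structure (first RELU, LFN, second RELU around a residual connection) and its assumed placement between two adjacent convolution layers. It reuses the LFN module from the previous sketch; the unweighted addition and the channel sizes are assumptions based on figs. 2 to 4.

```python
import torch
import torch.nn as nn

class LFNR(nn.Module):
    """Residual structure LFNR (sketch): part of the convolution data passes
    through RELU -> LFN -> RELU and is added back into the remaining data."""

    def __init__(self, channels: int):
        super().__init__()
        self.pre_relu = nn.ReLU()   # first influence-elimination layer
        self.lfn = LFN(channels)    # residual processing layer (defined above)
        self.post_relu = nn.ReLU()  # second influence-elimination layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        enhanced = self.lfn(self.pre_relu(x))  # locally enhanced branch
        return self.post_relu(x + enhanced)    # weighted into the remaining set

# Assumed placement between two initially adjacent convolution layers (fig. 3):
stem = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),   # first convolution layer
    LFNR(32),                         # added residual structure
    nn.Conv2d(32, 64, 3, padding=1),  # second convolution layer
)
```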
This feature-enhancement processing improves the performance of the student network model, from which the target network model is obtained.
Of course, the target network model can be further optimized by adjusting its parameters.
Specifically, after obtaining the target network model, the method further includes:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model, and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
The method for obtaining the local feature similarity difference between the target network model and the teacher network model comprises the following steps:
acquiring a plurality of first feature outputs of the target network model and the second feature outputs at the same positions in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
As shown in fig. 5, if the teacher network model includes three functional blocks (blocks), the corresponding student network model also has three corresponding functional blocks (blocks). Taking one Block as an example:

F̂_t = MAXPOOL(LFN(F_t))

wherein F_t is the training data set input to the teacher network model, LFN is the local feature normalization process, MAXPOOL is the max-pooling operation, and F̂_t is the second feature output of the functional block in the teacher network model.

F̂_s = MAXPOOL(LFN(F_s))

wherein F_s is the training data set input to the student network model, and F̂_s is the first feature output of the functional block at the same position in the target network model.
After the first feature output and the second feature output are obtained, the local feature similarity difference between the target network model and the teacher network model is computed from them; since the network is divided into three functional blocks (blocks), the computation is repeated three times:

L_SLFN = Σ_{k=1}^{3} MSE(F̂_s^(k), F̂_t^(k))

wherein L_SLFN is the local feature similarity difference between the target network model and the teacher network model, and MSE is the mean-square-error calculation.
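Under the same assumptions, a sketch of this local feature similarity loss follows; the three corresponding blocks, the LFN modules from the earlier sketch, the 2x2 max pool, and matching feature shapes between the two models are all assumptions.

```python
import torch.nn.functional as F

def slfn_loss(student_feats, teacher_feats, lfn_s, lfn_t):
    """L_SLFN sketch: MSE between MAXPOOL(LFN(F_s)) and MAXPOOL(LFN(F_t)),
    summed over corresponding functional blocks. Assumes the paired block
    outputs have matching shapes (or have been projected to match)."""
    loss = 0.0
    for f_s, f_t, norm_s, norm_t in zip(student_feats, teacher_feats, lfn_s, lfn_t):
        hat_s = F.max_pool2d(norm_s(f_s), kernel_size=2)
        hat_t = F.max_pool2d(norm_t(f_t), kernel_size=2)
        loss = loss + F.mse_loss(hat_s, hat_t)
    return loss
```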
The output difference L_KD is the difference between the final output of the target network model and the final output of the teacher network model, with the final output of the teacher model denoted p_t and the final output of the student model denoted p_s:

L_KD = KL(p_s, p_t)

The difference L_CE between the target network model and the real target is the difference between the final output of the target network model and the real target, whose label is y:

L_CE = CE(p_s, y)
Thus, the total network loss of the target network model is obtained according to the following formula:

L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein α and β are hyper-parameters for tuning.
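A sketch of the total network loss combining the three terms; the α and β values are illustrative assumptions, and slfn is the output of the slfn_loss sketch above.

```python
import torch.nn.functional as F

def lfne_loss(p_s, p_t, y, slfn, alpha=0.7, beta=0.1):
    """L_LFNE = alpha * L_KD + (1 - alpha) * L_CE + beta * L_SLFN.
    alpha and beta are tuning hyper-parameters (values here are assumptions)."""
    l_kd = F.kl_div(F.log_softmax(p_s, dim=-1),
                    F.softmax(p_t, dim=-1),
                    reduction="batchmean")  # L_KD = KL(p_s, p_t)
    l_ce = F.cross_entropy(p_s, y)          # L_CE = CE(p_s, y)
    return alpha * l_kd + (1 - alpha) * l_ce + beta * slfn
```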
After determining the total network loss of the target network model, parameters of the target network model are adjusted based on the total network loss of the target network model to obtain an optimized target network model.
Specifically, the total network loss is back-propagated using the gradient-descent principle to train the parameters of the target network model; this cycle is repeated a set number of times to obtain the optimized target network model.
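A minimal training-loop sketch of this back-propagation cycle, building on the loss sketches above; the optimizer choice, learning rate, epoch count, and the assumption that both models return (logits, block features) are illustrative.

```python
import torch

def train(student, teacher, loader, lfn_s, lfn_t, epochs=100, lr=1e-3):
    """Back-propagate the total network loss for a set number of cycles
    (gradient descent); optimizer and schedule are illustrative assumptions."""
    teacher.eval()  # the teacher network model is already trained
    params = list(student.parameters())
    for m in lfn_s:
        params += list(m.parameters())  # student-side LFN layers train too (assumption)
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):  # repeat the cycle a set number of times
        for x, y in loader:
            with torch.no_grad():
                p_t, t_feats = teacher(x)  # assumed to return logits + block features
            p_s, s_feats = student(x)
            slfn = slfn_loss(s_feats, t_feats, lfn_s, lfn_t)
            loss = lfne_loss(p_s, p_t, y, slfn)
            opt.zero_grad()
            loss.backward()  # back-propagate the total network loss
            opt.step()       # gradient-descent parameter update
```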
Finally, the target network model may be validated.
The teacher network model, the student network model and the added residual structure used in obtaining the target network model are shown in fig. 5.
By adding the residual structure to the student network model, the performance of the network can be greatly improved without affecting its overall parameter capacity. The lightweight network model can thus meet the performance requirements of hardware deployment, reducing hardware deployment and computation costs. By computing the total network loss, local characterization features can be extracted from the teacher network and passed to the student network to guide its learning. The performance of the student network can therefore even exceed that of the teacher model, effectively reducing the cost of the deployed hardware and the power consumption of the product, and solving a practical problem in applying deep-learning models. The method suits future development demands and can be rapidly and efficiently applied to models of ever-increasing complexity.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a method for acquiring a target network model, comprising: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and adding, in the student network model, a residual structure between two initially adjacent convolution layers to obtain the target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer. The added residual structure barely affects the overall parameter count of the network model, so the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, with YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4M and achieves an accuracy of 67.35%.
Example 2
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for acquiring a target network model, as shown in fig. 6, including:
an obtaining module 601, configured to obtain a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module 602, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
In an alternative embodiment, the residual structure comprises:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
In an alternative embodiment, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
In an alternative embodiment, the apparatus further comprises a network total loss determination module, which includes:
the acquisition unit is used for acquiring local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and the determining unit is used for determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
In an alternative embodiment, the determining unit is configured to determine the total network loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
In an alternative embodiment, the network total loss determination module further includes:
and the adjusting unit is used for adjusting the parameters of the target network model based on the total loss of the target network model so as to obtain an optimized target network model.
Example 3
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in fig. 7, including a memory 704, a processor 702, and a computer program stored on the memory 704 and executable on the processor 702, where the processor 702 implements the steps of the method for acquiring a target network model described above when executing the program.
In fig. 7, a bus architecture is represented by bus 700, which may comprise any number of interconnected buses and bridges linking together various circuits, including one or more processors represented by processor 702 and memory represented by memory 704. Bus 700 may also link various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. Bus interface 706 provides an interface between bus 700 and the receiver 701 and transmitter 703. The receiver 701 and the transmitter 703 may be the same element, i.e. a transceiver, providing a unit for communicating with various other apparatus over a transmission medium. The processor 702 is responsible for managing the bus 700 and general processing, while the memory 704 may be used to store data used by the processor 702 in performing operations.
Example 4
Based on the same inventive concept, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method of obtaining a target network model.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the means for obtaining a model of a target network, computer device, according to embodiments of the invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
Claims (10)
1. A method for obtaining a target network model, comprising:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between two initially adjacent convolution layers to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
2. The method of claim 1, wherein the residual structure comprises:
a first influence-elimination layer, configured to eliminate first abnormal data in the partial convolution data set;
a residual processing layer, configured to perform data enhancement on the partial convolution data after the abnormal data are eliminated, and weight the partial convolution data into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, configured to eliminate second abnormal data in the convolution data set with enhanced local features.
3. The method according to claim 1, wherein the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = Σ_i p_i^2 / c

p_i' = γ · p_i / √(v^2 + ε) + β

wherein p_i is any datum in any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square calculation result of any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square calculation result, γ and β are trainable parameters, and ε is a small positive number preventing the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
4. The method of claim 1, wherein after adding a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, the method further comprises:
obtaining local feature similarity differences and output differences between the target network model and the teacher network model and differences between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
5. The method of claim 4, wherein the obtaining the local feature similarity differences between the target network model and the teacher network model comprises:
acquiring a plurality of first characteristic outputs of the target network model and second characteristic outputs of the same position in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
6. The method of claim 4, wherein the determining the total network loss for the target network model based on the local feature similarity differences between the target network model and the teacher network model, the output differences, and the differences between the target network model and the real target comprises:
determining a total network loss of the target network model based on local feature similarity differences between the target network model and the teacher network model, output differences, and differences of the target network model and a real target in the following manner:
L_LFNE = α·L_KD + (1-α)·L_CE + β·L_SLFN

wherein L_LFNE is the total network loss of the target network model, L_SLFN is the local feature similarity difference between the target network model and the teacher network model, L_KD is the output difference between the target network model and the teacher network model, L_CE is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
7. The method of claim 4, wherein after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further comprises:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
8. An apparatus for obtaining a target network model, comprising:
an acquisition module, configured to acquire a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, configured to add a residual structure between two initially adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data set is processed by the residual structure and weighted into the remaining convolution data set, and the local features of the convolution data set are enhanced before being input into the second convolution layer.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps of any of claims 1 to 7 when the program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps of any of claims 1-7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310253048.2A | 2023-03-16 | 2023-03-16 | Method, device, computer equipment and storage medium for acquiring target network model
Publications (1)

Publication Number | Publication Date
---|---
CN116227556A | 2023-06-06
Family

- ID=86576858

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310253048.2A (pending) | Method, device, computer equipment and storage medium for acquiring target network model | 2023-03-16 | 2023-03-16

Country Status (1)

Country | Link
---|---
CN | CN116227556A
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116993962A | 2023-07-20 | 2023-11-03 | 广东南方智媒科技有限公司 | Two-dimensional code detection method, device, equipment and readable storage medium
CN116993962B | 2023-07-20 | 2024-04-26 | 广东南方智媒科技有限公司 | Two-dimensional code detection method, device, equipment and readable storage medium
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |