CN112836799A - Rapid incremental reconstruction method and system for lightweight model


Info

Publication number
CN112836799A
Authority
CN
China
Prior art keywords
task
type
module
semantic features
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110129469.5A
Other languages
Chinese (zh)
Inventor
张志�
李娜
王晶
路晓男
谭礼晋
王日冬
杨丽萍
余博
朱宇涛
姚梁希
时鸽
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pla 93114
Original Assignee
Pla 93114
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pla 93114
Priority to CN202110129469.5A
Publication of CN112836799A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856 Task life-cycle, e.g. stopping, restarting, resuming execution, resumption being on a different machine, e.g. task migration, virtual machine migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and system for rapid incremental reconstruction of a lightweight model. The method comprises: establishing a first neural network model for processing a first type of task; performing incremental reconstruction on the first neural network model to obtain a second neural network model; and training the second neural network model with second-type tasks, where semantic-feature migration between the first and second types of task is achieved through adversarial training and transfer learning. The trained second neural network model then processes both types of task: when the input is a second-type task, second-type semantic features are extracted and the task is executed according to them; when the input is a first-type task, second-type semantic features are extracted, feature migration yields first-type semantic features, and the task is executed according to those. The invention enables rapid reconstruction of a model on edge devices that both retains old tasks and adapts to new ones.

Description

Rapid incremental reconstruction method and system for lightweight model
Technical Field
The invention relates to the technical field of deep learning, and in particular to a method and system for rapid incremental reconstruction of a lightweight model.
Background
The structure of a deep neural network, once determined, is difficult to adjust during training. The network structure directly determines the capacity of the learned model, so a fixed structure means limited capacity; with limited capacity, the network must erase old knowledge in order to learn a new task. This is so-called catastrophic forgetting: after learning new knowledge, the network almost completely forgets previously learned content. It leaves artificial intelligence without the ability of living organisms to adapt to the environment and learn incrementally (continually).
Dropout can effectively reduce catastrophic forgetting. With dropout, information is distributed more evenly across neurons, so after fine-tuning on new data the performance of the whole model does not drop sharply because a particular neuron has changed.
The idea of knowledge distillation can help a network resist catastrophic forgetting: the old model serves as a teacher network, and the softened softmax output of the teacher constrains the samples of the new task. Even if the class definitions of the new and old tasks differ, this constraint allows training on new samples while maintaining performance on the old task.
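As a concrete illustration, a minimal distillation-style loss in PyTorch might look as follows; the helper name and the temperature T with its default value are illustrative assumptions, not details from this patent.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        """Softened-softmax (knowledge distillation) loss sketch.

        The old model's logits act as soft targets; a temperature T > 1
        softens both distributions so the teacher's inter-class structure
        is preserved in the constraint on new-task samples.
        """
        soft_teacher = F.softmax(teacher_logits / T, dim=1)
        log_student = F.log_softmax(student_logits / T, dim=1)
        # KL divergence between softened distributions, scaled by T^2 as is
        # conventional so gradient magnitudes stay comparable across T.
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T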
Elastic weight consolidation borrows from the memory-consolidation mechanism of living organisms: useful memories are protected from being deleted. It regards the consolidated part of the network as the weights that contribute most to the network's correct predictions, so those weights are kept as unchanged as possible while learning new samples, and the network adapts to the new sample distribution by updating the remaining weights.
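A minimal sketch of such a consolidation penalty, assuming a diagonal Fisher-information estimate per parameter; all names and the weighting factor lam are illustrative assumptions.

    import torch

    def ewc_penalty(model, fisher, old_params, lam=1000.0):
        """Elastic-weight-consolidation penalty (a sketch).

        `fisher` maps parameter names to diagonal Fisher-information
        estimates, and `old_params` maps them to consolidated values
        from the old task. Weights important to the old task (large
        Fisher value) are pulled back toward their consolidated values,
        while unimportant weights stay free to adapt to the new task.
        """
        loss = torch.zeros(())
        for name, p in model.named_parameters():
            if name in fisher:
                loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
        return lam / 2.0 * loss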
These methods, however, either need to store old-task data or expand a larger model for the new task, and therefore cannot be adapted to edge devices with limited storage and computing capacity.
Disclosure of Invention
In view of the foregoing analysis, the present invention aims to provide a method and system for rapid incremental reconstruction of a lightweight model that enable rapid reconstruction of a model on an edge device, both retaining old tasks and adapting to new ones.
The technical scheme provided by the invention is as follows:
A rapid incremental reconstruction method for a lightweight model comprises the following steps:
step S1, establishing a first neural network model for processing a first type of task, wherein the first neural network model extracts first-type semantic features from an input first-type task and executes the task according to those features;
step S2, performing incremental reconstruction on the first neural network model to obtain a second neural network model for processing the first and second types of task;
step S3, training the second neural network model with second-type tasks, wherein semantic-feature migration between the first and second types of task is achieved through adversarial training and transfer learning during training;
step S4, processing the first and second types of task with the trained second neural network model: when the input is a second-type task, second-type semantic features are extracted and the task is executed according to them; when the input is a first-type task, second-type semantic features are extracted, feature migration yields first-type semantic features, and the task is executed according to those.
Further, the first neural network model comprises a task sharing module and a first task module.
The first neural network model is trained with first-type tasks. During training,
the task sharing module learns to extract the first-type semantic features of input first-type tasks, yielding first-type task sharing parameters;
the first task module learns to execute the first-type task according to the first-type semantic features, yielding first task execution parameters.
Further, the task sharing module comprises a residual structure and a stacked feature-layer structure; it applies operations including multilayer convolution, pooling, batch normalization and a nonlinear function to the input task to obtain convolution features, which are used to output the task's semantic features.
Further, the task sharing module is a residual network ResNet18 with the last fully connected layer removed; it comprises 17 convolutional layers, uses ReLU as the nonlinear function, and finally extracts a feature vector through global pooling as the module's output vector.
Further, on the basis of retaining the first neural network model, the incrementally reconstructed second neural network model adds a first migration module, a first task discrimination module and a second task module.
During training of the second neural network model,
the task sharing module learns to extract the second-type semantic features of input second-type tasks, yielding second-type task sharing parameters; meanwhile, the first-type task sharing parameters from before reconstruction are retained and used to extract first-type semantic features from the input second-type tasks;
the second task module learns to execute the second-type task according to the second-type semantic features, yielding second task execution parameters;
the first migration module learns to migrate second-type semantic features to first-type semantic features, yielding migrated first-type semantic features;
the first task discrimination module distinguishes between the first-type semantic features of second-type tasks and the migrated first-type semantic features.
Further, the adversarial training within the second neural network model's training comprises:
the first task discrimination module learns to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature; the task sharing module and the first migration module learn to confuse the discrimination module, so that it cannot tell whether an input feature was actually produced by the first-type or the second-type task sharing parameters;
through this adversarial training, the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
Further, during adversarial training the first task discrimination module trains by minimizing the task classification loss, while the task sharing module and the first migration module train by maximizing it.
Further, the first migration module adopts a fully connected structure comprising a three-layer fully connected network, with a batch normalization operation and a ReLU nonlinear activation following each fully connected layer;
the first task discrimination module adopts a multilayer perceptron structure comprising 3 perceptron layers, and classifies which task the features extracted by the task sharing module and the first migration module respectively belong to.
Further, after step S4 the method further comprises:
performing incremental reconstruction on the second neural network model to obtain a third neural network model;
training the third neural network model with third-type tasks, with semantic-feature migration between the second and third types of task achieved through adversarial training and transfer learning during training;
processing the second and third types of task with the trained third neural network model: when the input is a third-type task, third-type semantic features are extracted and the task is executed according to them; when the input is a second-type task, third-type semantic features are extracted, feature migration yields second-type semantic features, and the task is executed according to those.
The invention also discloses an incremental reconstruction system based on the above rapid incremental reconstruction method for a lightweight model. Before reconstruction, the system comprises a task sharing module and a first task module:
the task sharing module extracts the first-type semantic features of input first-type tasks;
the first task module executes the first-type task according to those features.
After reconstruction, the system comprises a task sharing module, a first task module, a first migration module, a first task discrimination module and a second task module:
the task sharing module extracts second-type semantic features from the input first-type or second-type task;
the second task module executes the second-type task according to the second-type semantic features of a second-type task;
the first migration module migrates the second-type semantic features of a first-type task to first-type semantic features;
the first task module executes the first-type task according to the first-type semantic features output by the first migration module;
the first task discrimination module undergoes adversarial training with the task sharing module and the first migration module when the reconstructed system is trained;
in this adversarial training, the first task discrimination module learns to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature, while the task sharing module and the first migration module learn to confuse it, so that it cannot tell whether the feature was produced by the first-type or second-type task sharing parameters; through this training, the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
The invention can realize at least one of the following beneficial effects:
With the rapid incremental reconstruction method and system for a lightweight model, model reconstruction and adversarial training let the model efficiently learn a new task under limited resources while retaining its existing functions, achieving rapid model reconstruction on edge devices that adapts to new tasks.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flowchart of a method for rapid incremental reconstruction of a lightweight model according to this embodiment;
fig. 2 is a schematic diagram of a network structure of the ResNet18 in this embodiment;
fig. 3 is a connection diagram of the components of the post-reconstruction lightweight model fast incremental reconstruction system according to this embodiment.
Detailed Description
The preferred embodiments of the present invention are described below in detail with reference to the accompanying drawings, which form a part hereof and, together with the embodiments, serve to explain the principles of the invention.
This embodiment discloses a rapid incremental reconstruction method for a lightweight model. It can be applied to edge devices so that, after rapid model reconstruction, the device retains the processing functions of old tasks while adapting to new tasks.
As shown in fig. 1, the specific method comprises the following steps:
Step S1, establishing a first neural network model for processing a first type of task.
The first neural network model extracts the first-type semantic features of an input first-type task and executes the task according to those features.
Specifically, the first neural network model comprises a task sharing module and a first task module.
The first neural network model is trained with first-type tasks. During training,
the task sharing module learns to extract the first-type semantic features of input first-type tasks, yielding the first-type task sharing parameters;
the first task module learns to execute the first-type task according to the first-type semantic features, yielding the first task execution parameters.
The task sharing module comprises a residual structure and a stacked feature-layer structure; it applies operations including multilayer convolution, pooling, batch normalization and nonlinear functions to the input task to obtain convolution features, which are used to output the task's high-level semantic information. The higher the feature level, the higher the level of semantic information it contains, and the more the high-level semantic information obtained by the task sharing module benefits completion of the related task functions.
Preferably, the task sharing module in this embodiment is the residual network ResNet18 with the last fully connected layer removed; it comprises 17 convolutional layers, uses ReLU as the nonlinear function, and finally extracts a 512-dimensional feature vector through global pooling.
The residual and stacked structure of the ResNet18 network is shown in fig. 2: the arcs above the figure represent the residual connections, and the 17 convolutions arranged in sequence form the stacked structure.
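A minimal sketch of such a task sharing module, assuming the standard torchvision ResNet18 (the patent does not name a framework):

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class TaskSharingModule(nn.Module):
        """ResNet18 with the final fully connected layer removed, as
        described above: the convolutional stages with ReLU, followed by
        global pooling that yields a 512-dimensional feature vector."""

        def __init__(self):
            super().__init__()
            backbone = resnet18(weights=None)
            # Drop the last fc layer; keep conv stages + global avg pool.
            self.features = nn.Sequential(*list(backbone.children())[:-1])

        def forward(self, x):
            f = self.features(x)          # (N, 512, 1, 1)
            return torch.flatten(f, 1)    # (N, 512) semantic feature vector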
The first task module is a lightweight network that executes the first-type task according to the 512-dimensional feature vector.
The first task module is a network of fully connected layers; the specific number of layers is set according to task requirements. Its input dimension equals the output dimension of the task sharing module, and its output dimension is the dimension of the first-type task's result; for example, if the first-type task is classification, the output dimension is the number of classes.
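A minimal sketch of such a task head; the two-layer depth and hidden width are illustrative assumptions, since the patent leaves the layer count to task requirements:

    import torch.nn as nn

    class TaskHead(nn.Module):
        """Lightweight task module: a small fully connected network whose
        input dimension matches the 512-d output of the task sharing
        module and whose output dimension is the task's result dimension
        (e.g. the number of classes for a classification task)."""

        def __init__(self, num_outputs, in_dim=512, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden),
                nn.ReLU(inplace=True),
                nn.Linear(hidden, num_outputs),
            )

        def forward(self, feat):
            return self.net(feat)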
Step S2, performing incremental reconstruction on the first neural network model to obtain a second neural network model capable of processing both the first-type and second-type tasks.
The first neural network model of step S1 can only process first-type tasks. When a second, different type of task must be learned, this embodiment performs incremental reconstruction on the first model to obtain the second model, avoiding the catastrophic forgetting that comes from erasing old knowledge. On the basis of retaining the first neural network model, the incrementally reconstructed second model adds a first migration module, a first task discrimination module and a second task module.
Specifically, in the second neural network model obtained after reconstruction,
the task sharing module extracts the second-type semantic features of the input first-type or second-type task;
the second task module executes the second-type task according to the second-type semantic features of a second-type task;
like the first task module, the second task module is a network of fully connected layers whose layer count is set according to task requirements, whose input dimension equals the output dimension of the task sharing module, and whose output dimension is the dimension of the second-type task's result;
the first migration module migrates the second-type semantic features of a first-type task to the task's first-type semantic features.
Specifically, the first migration module adopts a fully connected structure comprising a three-layer fully connected network, with a batch normalization operation and a ReLU nonlinear activation following each fully connected layer.
More specifically, the weight matrices of the three fully connected layers are 512×128, 128×128 and 128×512. The first layer takes the task sharing module's output as input, so its 512-dimensional side matches that module's output feature vector; the second layer is a transition; and the third layer must give the migration module an output of the same dimension as its input, so its output is likewise 512-dimensional. This simple fully connected structure realizes the migration of the second-type semantic features output by the task sharing module to first-type semantic features.
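A minimal sketch of the migration module under those stated dimensions:

    import torch.nn as nn

    class MigrationModule(nn.Module):
        """First migration module as described: three fully connected
        layers with weight shapes 512x128, 128x128 and 128x512, each
        followed by batch normalization and ReLU, so the migrated
        first-type feature has the same 512-d shape as its input."""

        def __init__(self, dim=512, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, dim), nn.BatchNorm1d(dim), nn.ReLU(inplace=True),
            )

        def forward(self, feat):
            return self.net(feat)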
The first task module executes the first-type task according to the first-type semantic features output by the first migration module.
The first task discrimination module undergoes adversarial training with the task sharing module and the first migration module when the reconstructed system is trained.
Specifically, the first task discrimination module adopts a simple multilayer perceptron structure to classify which task the features extracted by the task sharing module and the first migration module respectively belong to.
More specifically, it is a 3-layer perceptron whose layers have 512, 128 and 2 nodes respectively; the two outputs judge whether a feature is a first-type or a second-type semantic feature.
The perceptron uses a fully connected strategy; its input dimension need only match the output dimension of the task sharing module and the first migration module, and its output dimension is the number of task categories involved. The category with the maximum value in the output vector is the predicted category for the input.
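A minimal sketch of the discrimination module; reading the 512/128/2 node counts as input, hidden and output sizes is one plausible interpretation:

    import torch.nn as nn

    class TaskDiscriminator(nn.Module):
        """First task discrimination module: a small perceptron over the
        512-d features. Its two outputs classify whether an input feature
        is a first-type or a second-type semantic feature; the class with
        the larger output value is the prediction."""

        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(512, 128),
                nn.ReLU(inplace=True),
                nn.Linear(128, 2),
            )

        def forward(self, feat):
            return self.net(feat)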
Because the second type of task is new relative to the first type and must be learned and processed, a training set composed of second-type tasks is used to train the second neural network model.
To reconstruct the model quickly on an edge device and give it recognition capability for the new task classes, a task data set of sufficient and complete volume for the extended classes, with ground-truth labels, is required as training samples. The data set samples should exhibit characteristics such as multiple scales, multiple orientations, random positions, random quantities and even density distribution.
Step S3, training the second neural network model with second-type tasks; during training, semantic-feature migration between the first and second types of task is achieved through adversarial training and transfer learning.
During training of the second neural network model,
the task sharing module learns to extract the second-type semantic features of the input second-type tasks, yielding second-type task sharing parameters; meanwhile, the first-type task sharing parameters from before reconstruction are retained as common output parameters of the task sharing module, and during training they are used to extract the first-type semantic features of the input second-type tasks.
preferably, the distributed dropout is used to order the model to randomly inactivate a part of neurons according to rules during the training process, so that the neurons do not participate in a certain round of training, and activate the neurons before the next round of training begins. The aim is to distribute the information learned by the network evenly in each layer of neurons, and after fine adjustment on new data, the performance of the whole model cannot be reduced sharply due to the fact that a specific neuron is changed.
The second task module learns to execute the second-type task according to the second-type semantic features, yielding second task execution parameters; it executes the second-type task according to the second-type semantic features of a second-type task.
The first migration module learns to migrate second-type semantic features to first-type semantic features, yielding migrated first-type semantic features.
The first task discrimination module distinguishes between the first-type semantic features of second-type tasks and the migrated first-type semantic features.
To better migrate the second-type semantic features output by the task sharing module to first-type semantic features, the adversarial training within the second neural network model's training proceeds as follows:
the first task discrimination module learns to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature; the task sharing module and the first migration module learn to confuse the discrimination module, so that it cannot tell whether an input feature was actually produced by the first-type or the second-type task sharing parameters;
through this adversarial training, the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
More specifically, during adversarial training the first task discrimination module trains by minimizing the task classification loss.
Preferably, the first task discrimination module learns with a cross-entropy loss; it is essentially a classification decider whose input is the features output by the task sharing module and the first migration module, and whose output is the task type the features belong to. Its training criterion is minimizing the cross-entropy classification loss.
The task sharing module and the first migration module train by maximizing the task classification loss:
they maximize the decider's classification loss, i.e. they learn to output features that make the decider misclassify which task the features belong to, forming an adversarial game with the decider.
Specifically, the minimized cross-entropy classification loss is:

$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\log(p_{ic})$

The maximized discriminator classification loss is its negative:

$L_{adv} = -L_{cls} = \frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\log(p_{ic})$

In the formulas, i indexes the training samples, of which the training set contains N in total; c indexes the task-sharing-parameter category used in training, and C is the total number of task-sharing-parameter categories; $y_{ic}$ is the ground truth (1 or 0) of whether the i-th training sample was actually processed by the c-th task sharing parameters; and $p_{ic}$ is the probability, as judged by the discriminator, that the feature of the i-th training sample was processed by the c-th task sharing parameters. In this embodiment C = 2, i.e. the categories are the first-type and the second-type task sharing parameters.
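To make the alternating objectives concrete, here is a hypothetical single training step combining the modules sketched above; module and optimizer names, and the 0/1 label convention for first-type versus migrated features, are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def train_step(shared, shared_old, migrate, head2, disc,
                   x2, y2, opt_d, opt_g):
        """One adversarial training step on a second-type batch (x2, y2).

        Assumptions: `shared` is the task sharing module being trained;
        `shared_old` is a frozen copy holding the retained first-type task
        sharing parameters; `migrate` is the first migration module;
        `head2` is the second task module; `disc` is the first task
        discrimination module. `opt_d` updates disc; `opt_g` updates
        shared, migrate and head2.
        """
        feat2 = shared(x2)                  # second-type semantic features
        with torch.no_grad():
            feat1 = shared_old(x2)          # first-type features via retained parameters
        migrated = migrate(feat2)           # migrated first-type features

        # Discriminator: minimize the cross-entropy task-classification loss.
        logits = disc(torch.cat([feat1, migrated.detach()]))
        labels = torch.cat([torch.zeros(len(x2)), torch.ones(len(x2))]).long()
        loss_d = F.cross_entropy(logits, labels)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Sharing + migration (and the new task head): learn the second-type
        # task while maximizing the discriminator's loss, i.e. making the
        # migrated features indistinguishable from first-type features.
        loss_task = F.cross_entropy(head2(feat2), y2)
        loss_adv = -F.cross_entropy(disc(migrated), torch.ones(len(x2)).long())
        loss_g = loss_task + loss_adv
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()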
When, after training, the first task discrimination module can no longer tell whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature, the adversarial training is considered complete, and the first migration module can migrate the second-type semantic features output by the task sharing module to first-type semantic features.
After training is finished, the first-type task sharing parameters from before reconstruction are retained in the task sharing module, and the first task discrimination module is retained, to reduce the computing pressure on the edge device.
Step S4, processing the first-type and second-type tasks with the trained second neural network model: when the input is a second-type task, second-type semantic features are extracted and the task is executed according to them; when the input is a first-type task, second-type semantic features are extracted, feature migration yields first-type semantic features, and the task is executed according to those.
The second neural network model obtained by incrementally reconstructing the first neural network model can thus be realized on edge devices: through rapid model reconstruction it retains the old task's processing function while adapting to the new task.
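A minimal sketch of this inference routing, reusing the hypothetical modules from the earlier sketches:

    import torch

    @torch.no_grad()
    def infer(x, task_type, shared, migrate, head1, head2):
        """Route an input through the reconstructed model.

        Every input passes through the shared module, which after
        reconstruction extracts second-type semantic features. A
        second-type task goes straight to its head; a first-type task is
        first migrated back to first-type semantic features and then
        handled by the original first task module.
        """
        feat2 = shared(x)
        if task_type == 2:
            return head2(feat2)              # execute the second-type task
        return head1(migrate(feat2))         # migrate, then execute the first-type task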
Following the same method, when a third type of task, new relative to the second type, must be learned,
the second neural network model can further be incrementally reconstructed to obtain a third neural network model;
the third neural network model is trained with third-type tasks, with semantic-feature migration between the second and third types of task achieved through adversarial training and transfer learning during training;
the trained third neural network model then processes the second-type and third-type tasks: when the input is a third-type task, third-type semantic features are extracted and the task is executed according to them; when the input is a second-type task, third-type semantic features are extracted, feature migration yields second-type semantic features, and the task is executed according to those.
In summary, with the rapid incremental reconstruction method for a lightweight model of this embodiment, model reconstruction and adversarial training let the model efficiently learn a new task under limited resources while retaining its existing functions, achieving rapid model reconstruction on edge devices that adapts to new tasks.
This embodiment also discloses a rapid incremental reconstruction system for a lightweight model, applied to edge devices. Before reconstruction, the system comprises a task sharing module and a first task module:
the task sharing module extracts the first-type semantic features of input first-type tasks;
the first task module executes the first-type task according to those features.
After reconstruction, as shown in fig. 3, the system comprises a task sharing module, a first task module, a first migration module, a first task discrimination module and a second task module:
the task sharing module extracts second-type semantic features from the input first-type or second-type task;
the second task module executes the second-type task according to the second-type semantic features of a second-type task;
the first migration module migrates the second-type semantic features of a first-type task to first-type semantic features;
the first task module executes the first-type task according to the first-type semantic features output by the first migration module;
the first task discrimination module undergoes adversarial training with the task sharing module and the first migration module when the reconstructed system is trained;
in this adversarial training, the first task discrimination module learns to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature, while the task sharing module and the first migration module learn to confuse it so that it cannot tell whether the feature was produced by the first-type or second-type task sharing parameters; through this training, the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
The technical details and beneficial effects of the rapid incremental reconstruction system of this embodiment are the same as those of the rapid incremental reconstruction method for a lightweight model and are not repeated here; refer to the description of that method.
The above description is only a preferred embodiment of the present invention, but the scope of the invention is not limited thereto; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope disclosed herein are included in the scope of the invention.

Claims (10)

1. A rapid incremental reconstruction method for a lightweight model, characterized by comprising the following steps:
step S1, establishing a first neural network model for processing a first type of task, wherein the first neural network model extracts first-type semantic features from an input first-type task and executes the task according to those features;
step S2, performing incremental reconstruction on the first neural network model to obtain a second neural network model for processing the first and second types of task;
step S3, training the second neural network model with second-type tasks, wherein semantic-feature migration between the first and second types of task is achieved through adversarial training and transfer learning during training;
step S4, processing the first and second types of task with the trained second neural network model, wherein: when the input is a second-type task, second-type semantic features are extracted and the task is executed according to them; and when the input is a first-type task, second-type semantic features are extracted, feature migration yields first-type semantic features, and the task is executed according to those.
2. The rapid incremental reconstruction method for a lightweight model according to claim 1, wherein
the first neural network model comprises a task sharing module and a first task module;
the first neural network model is trained with first-type tasks; and during training,
the task sharing module learns to extract the first-type semantic features of input first-type tasks, yielding first-type task sharing parameters, and
the first task module learns to execute the first-type task according to the first-type semantic features, yielding first task execution parameters.
3. The rapid incremental reconstruction method for a lightweight model according to claim 2, wherein
the task sharing module comprises a residual structure and a stacked feature-layer structure, and applies operations including multilayer convolution, pooling, batch normalization and a nonlinear function to the input task to obtain convolution features, which are used to output the task's semantic features.
4. The rapid incremental reconstruction method for a lightweight model according to claim 3, wherein
the task sharing module is a residual network ResNet18 with the last fully connected layer removed, comprising 17 convolutional layers, with ReLU as the nonlinear function, and finally extracting a feature vector through global pooling as the task sharing module's output vector.
5. The rapid incremental reconstruction method for a lightweight model according to claim 2, wherein
on the basis of retaining the first neural network model, the incrementally reconstructed second neural network model adds a first migration module, a first task discrimination module and a second task module; and
during training of the second neural network model,
the task sharing module learns to extract the second-type semantic features of input second-type tasks, yielding second-type task sharing parameters, while the first-type task sharing parameters from before reconstruction are retained and used to extract first-type semantic features from the input second-type tasks,
the second task module learns to execute the second-type task according to the second-type semantic features, yielding second task execution parameters,
the first migration module learns to migrate second-type semantic features to first-type semantic features, yielding migrated first-type semantic features, and
the first task discrimination module distinguishes between the first-type semantic features of second-type tasks and the migrated first-type semantic features.
6. The method of claim 5, wherein the adversarial training in the second neural network model's training comprises:
the first task discrimination module learning to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature, while the task sharing module and the first migration module learn to confuse the discrimination module so that it cannot tell whether an input feature was actually produced by the first-type or the second-type task sharing parameters,
whereby the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
7. The rapid incremental reconstruction method for a lightweight model according to claim 6, wherein
during adversarial training the first task discrimination module trains by minimizing the task classification loss, and the task sharing module and the first migration module train by maximizing the task classification loss.
8. The rapid incremental reconstruction method for a lightweight model according to claim 6, wherein
the first migration module adopts a fully connected structure comprising a three-layer fully connected network, with a batch normalization operation and a ReLU nonlinear activation following each fully connected layer; and
the first task discrimination module adopts a multilayer perceptron structure comprising 3 perceptron layers, and classifies which task the features extracted by the task sharing module and the first migration module respectively belong to.
9. The rapid incremental reconstruction method for a lightweight model according to any one of claims 1 to 8, wherein after step S4 the method further comprises:
performing incremental reconstruction on the second neural network model to obtain a third neural network model;
training the third neural network model with third-type tasks, with semantic-feature migration between the second and third types of task achieved through adversarial training and transfer learning during training; and
processing the second and third types of task with the trained third neural network model, wherein: when the input is a third-type task, third-type semantic features are extracted and the task is executed according to them; and when the input is a second-type task, third-type semantic features are extracted, feature migration yields second-type semantic features, and the task is executed according to those.
10. An incremental reconstruction system based on the rapid incremental reconstruction method for a lightweight model according to any one of claims 1 to 9, wherein before reconstruction the system comprises a task sharing module and a first task module:
the task sharing module extracts the first-type semantic features of input first-type tasks, and
the first task module executes the first-type task according to those features;
and wherein the reconstructed system comprises a task sharing module, a first task module, a first migration module, a first task discrimination module and a second task module:
the task sharing module extracts second-type semantic features from the input first-type or second-type task,
the second task module executes the second-type task according to the second-type semantic features of a second-type task,
the first migration module migrates the second-type semantic features of a first-type task to first-type semantic features,
the first task module executes the first-type task according to the first-type semantic features output by the first migration module,
the first task discrimination module undergoes adversarial training with the task sharing module and the first migration module when the reconstructed system is trained, and
in this adversarial training, the first task discrimination module learns to judge whether an input feature is a first-type semantic feature of a second-type task or a migrated first-type semantic feature, while the task sharing module and the first migration module learn to confuse it so that it cannot tell whether the feature was produced by the first-type or second-type task sharing parameters, whereby the first migration module better migrates the second-type semantic features output by the task sharing module to first-type semantic features.
CN202110129469.5A 2021-01-29 2021-01-29 Rapid incremental reconstruction method and system for lightweight model Pending CN112836799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110129469.5A CN112836799A (en) 2021-01-29 2021-01-29 Rapid incremental reconstruction method and system for lightweight model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110129469.5A CN112836799A (en) 2021-01-29 2021-01-29 Rapid incremental reconstruction method and system for lightweight model

Publications (1)

Publication Number Publication Date
CN112836799A 2021-05-25

Family

ID=75931128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110129469.5A Pending CN112836799A (en) 2021-01-29 2021-01-29 Rapid incremental reconstruction method and system for lightweight model

Country Status (1)

Country Link
CN (1) CN112836799A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115749734A * 2022-12-08 2023-03-07 电子科技大学 Method for learning geological structure on line by ultrasonic imaging logging instrument while drilling
CN115749734B * 2022-12-08 2023-08-08 电子科技大学 Method for learning geological structure on line by ultrasonic imaging logging while drilling instrument

Similar Documents

Publication Publication Date Title
Benuwa et al. A review of deep machine learning
Zamora et al. Dendrite morphological neurons trained by stochastic gradient descent
WO2020095321A2 (en) Dynamic structure neural machine for solving prediction problems with uses in machine learning
Sharma et al. A comparative analysis of soft computing techniques in software fault prediction model development
WO2014060001A1 (en) Multitransmitter model of the neural network with an internal feedback
Wang et al. Filter pruning with a feature map entropy importance criterion for convolution neural networks compressing
CN107223260B (en) Method for dynamically updating classifier complexity
Korshunova et al. Discriminative topic modeling with logistic LDA
Lesort Continual learning: Tackling catastrophic forgetting in deep neural networks with replay processes
Guo et al. Learning automata based competition scheme to train deep neural networks
Nalini et al. Comparative analysis of deep network models through transfer learning
CN111522926A (en) Text matching method, device, server and storage medium
CN112836799A (en) Rapid incremental reconstruction method and system for lightweight model
Urgun et al. Composite power system reliability evaluation using importance sampling and convolutional neural networks
Kumara et al. Use of associative memory and self-organization in conceptual design
Tiwari et al. Meta-learning with hopfield neural network
Wen et al. Novel pruning of dendritic neuron models for improved system implementation and performance
Rajapakse et al. Neural information processing: research and development
Thomas et al. Comparative analysis of facial affect detection algorithms
Hulley et al. Genetic algorithm based incremental learning for optimal weight and classifier selection
Lesort Apprentissage continu: S'attaquer à l'oubli foudroyant des réseaux de neurones profonds grâce aux méthodes à rejeu de données [Continual learning: tackling the catastrophic forgetting of deep neural networks with data-replay methods]
Torres-Constante et al. A new approach for optimal selection of features for classification based on rough sets, evolution and neural networks
Wani et al. Gain parameter and dropout-based fine tuning of deep networks
Gurjar et al. Incorporating Transfer Learning in CNN Architecture.
CN117435737B (en) Lifelong emotion classification method based on self-adaptive parameter updating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination