CN115587217A - Multi-terminal video detection model online retraining method - Google Patents

Multi-terminal video detection model online retraining method

Info

Publication number
CN115587217A
Authority
CN
China
Prior art keywords
model
local
models
terminal
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211268163.9A
Other languages
Chinese (zh)
Inventor
刘思聪
王乐豪
於志文
于昊艺
郭斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211268163.9A
Publication of CN115587217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an online retraining method for multi-terminal video detection models. The feature distribution of each terminal and of the data as a whole is obtained from the sample labels and the prediction modules of the intermediate models, and augmented samples are drawn from the overall feature distribution; using the resulting new data set, each local model and its corresponding intermediate model learn from each other and are updated. The intermediate models are then aggregated into a global model, and knowledge distillation is performed on the picture set of the original data set between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models. Based on the characteristics of the models, the local and intermediate models are deployed on a digital accelerator with their inputs processed to a certain degree, while the global model is deployed on an analog accelerator; weight noise is injected during mutual learning so that the models adapt to its influence.

Description

Multi-terminal video detection model online retraining method
Technical Field
The invention relates to the field of compressed deep model retraining and deep-learning hardware acceleration, and in particular to an online retraining method for multi-terminal video detection models.
Background
With rising living standards and the rapid development of science and technology, electronic devices such as smartphones have become widespread, and deploying deep learning on mobile terminals is attracting more and more researchers. Deployment is hindered, however, by the mismatch between huge neural networks and resource-constrained mobile hardware platforms. Researchers have attacked the problem from the directions of model compression and deep-learning hardware accelerators and have made significant progress. Yet deep-learning-based video analysis inevitably faces data drift: video data from real scenes differs from the data the deep model was trained on, and under this influence the accuracy of a shallow, lightweight model in a real scene can drop noticeably and fail to meet customer requirements. Online model retraining is an effective way to solve this problem.
At present, most online model retraining relies on knowledge distillation: the terminal model serves as a student model that learns from a teacher model located on an edge server, thereby updating itself. However, because edge-server resources are limited, the deployed 'teacher' model has a simple structure, and its own susceptibility to data drift affects the performance of the whole system. Meanwhile, the speed of online retraining also influences the average inference accuracy of the terminal model, so accelerating the retraining process is equally important.
Disclosure of Invention
Technical problem to be solved
To avoid the defects of the prior art, the invention provides an online retraining method for multi-terminal video detection models.
Technical scheme
A multi-terminal video detection model online retraining method is characterized by comprising the following steps:
step 1: upload the data collected and screened by each terminal to the edge side, and label the unlabeled data with the global model located on the edge server;
step 2: obtain the data set of each intermediate model before aggregation and the feature distribution of the overall data, using the sample labels of the local models and the prediction modules of the intermediate models;
step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with the augmented samples;
step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local model and the intermediate model respectively, and $P_{local}$ and $P_{mid}$ are the inference results of the local model and the intermediate model respectively;
step 5: aggregate the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate a global model;
step 6: based on each terminal's originally collected and screened data set, perform knowledge distillation between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models;
step 7: deploy the local models and intermediate models on a digital accelerator and process the model inputs to a certain degree; specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object, and if small objects are present the pixels are not compressed, in order to guarantee the retraining effect;
step 8: deploy the global model on the analog accelerator and optimize it: on the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence; specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

in addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, the weights are further constrained:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer respectively;
step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method.
A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, implement the above method.
Advantageous effects
The multi-terminal video detection model online retraining method provided by the invention completes online updating of the terminal models through a mutual-learning-based framework for the continuous evolution of compressed terminal models, markedly improving the accuracy of compressed deep models affected by data drift. Meanwhile, the in-memory-computing-based hardware acceleration of retraining greatly speeds up model updating, substantially reduces the hardware resources required for retraining, and improves retraining efficiency.
Drawings
The drawings, in which like reference numerals refer to like parts throughout, are for the purpose of illustrating particular embodiments only and are not to be considered limiting of the invention.
Fig. 1 is a schematic diagram of the overall system structure of a low-data-transmission-quantity multi-terminal video detection model online retraining method.
Fig. 2 illustrates the data processing in the system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The present invention utilizes the following principles: algorithms such as federated mutual learning and sample augmentation are combined to retrain and update the terminal models and the global model; based on the characteristics of each model in the continuous-evolution system, a suitable hardware acceleration scheme is selected for it, and the model deployed on that hardware is optimized according to the hardware's characteristics, so that software and hardware fit each other better. The invention can greatly improve the inference accuracy of terminal models affected by data drift, and after hardware acceleration the model accuracy, resource occupation, and acceleration effect are all markedly improved.
The invention has two figures in total; please refer to Fig. 1 and Fig. 2. The specific steps of the invention are as follows:
Step 1: upload the data collected and screened by each terminal to the edge side, and label the unlabeled data with the global model (the uncompressed model) located on the edge server.
Step 2: obtain the data set of each intermediate model before aggregation and the feature distribution of the overall data, using the sample labels of the local model (the compressed model that performs video inference at the terminal) and the prediction module of the intermediate model (a model with the same structure as the original global model, used for mutual learning and model aggregation).
Step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with them. With the help of this sample augmentation, the influence of non-IID data across terminal devices can be reduced, which improves the accuracy of the aggregated global model and alleviates the data and knowledge heterogeneity problem in federated learning; a minimal sketch of the sampling step is given below.
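The following is a minimal sketch of this sampling step, assuming the feature distribution is modeled as a per-dimension Gaussian over penultimate-layer features; the patent does not fix the distribution family, so the Gaussian choice and all function names here are illustrative only.

```python
import torch

def fit_feature_gaussian(features: torch.Tensor):
    """Fit per-dimension mean/std of a (num_samples, feature_dim) matrix."""
    return features.mean(dim=0), features.std(dim=0) + 1e-6

def sample_augmented_features(mean: torch.Tensor, std: torch.Tensor,
                              num_samples: int) -> torch.Tensor:
    """Draw augmented feature vectors from the overall distribution."""
    return mean + torch.randn(num_samples, mean.shape[0]) * std

# Pool the features gathered from all terminals that share one global
# model to fit the overall distribution, then sample in parallel for
# each local model (random stand-in features used here).
overall_mean, overall_std = fit_feature_gaussian(torch.randn(1000, 128))
augmented = sample_augmented_features(overall_mean, overall_std, 64)
```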
Step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local and intermediate models respectively, and $P_{local}$ and $P_{mid}$ are their respective inference results. During mutual learning the accuracy of both the local and the intermediate model improves, and training them together works better than training the two models separately and independently, for roughly three reasons. First, the class-probability estimates output by a neural network partially recover the relationships between different classes in the data, so exchanging class estimates between networks transfers knowledge of the data distribution and improves generalization. Second, mutual learning also acts as a form of regularization: with one-hot ground-truth labels, a model's predictions become over-confident during training, which easily leads to overfitting, and letting the models learn each other's class probabilities effectively prevents this. Finally, each network adjusts its own learning process by referring to the learning experience of the other model during training, so the result converges to a flatter minimum, giving better generalization and lower sensitivity to noise, which in turn leaves the system more choices of hardware acceleration schemes; a hedged sketch of these losses is given below.
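Below is a hedged PyTorch sketch of the two losses above; the values of alpha and beta and the use of cross-entropy as the label-based losses $L_{C_{local}}$ and $L_{C_{mid}}$ are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mutual_learning_losses(logits_local, logits_mid, labels,
                           alpha: float = 0.5, beta: float = 0.5):
    log_p_local = F.log_softmax(logits_local, dim=1)
    p_mid = F.softmax(logits_mid, dim=1)
    # D_KL(P_mid || P_local), matching the formulas above; gradients
    # flow into both models, which is what makes the learning mutual.
    kl = F.kl_div(log_p_local, p_mid, reduction="batchmean")
    loss_local = alpha * F.cross_entropy(logits_local, labels) + (1 - alpha) * kl
    loss_mid = beta * F.cross_entropy(logits_mid, labels) + (1 - beta) * kl
    return loss_local, loss_mid
```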
Step 5: aggregate the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate the global model, as sketched below.
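A minimal FedAvg sketch follows; weighting each terminal by its sample count is an assumption, since the patent only names the algorithm.

```python
import torch

def fedavg(state_dicts, sample_counts):
    """Average the intermediate models' parameters, weighted by data size."""
    total = float(sum(sample_counts))
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, sample_counts))
    return avg

# Usage (hypothetical names):
# global_state = fedavg([m.state_dict() for m in intermediate_models],
#                       samples_per_terminal)
# global_model.load_state_dict(global_state)
```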
Step 6: based on the data set originally collected and screened by each terminal, perform knowledge distillation between the global model and the local models corresponding to it, the global model acting as the teacher model that transfers knowledge to the local models acting as student models; a hedged sketch of such a distillation loss is given below.
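The sketch below shows a conventional distillation loss of the kind step 6 describes; the temperature T and mixing weight lam are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, lam: float = 0.7):
    # Soft targets from the global (teacher) model supervise the local
    # (student) model; T*T rescales gradients, as is standard practice.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1 - lam) * hard
```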
Step 7: the local models and intermediate models are deployed on a digital accelerator (such as a GPU), and the model inputs are processed to a certain degree. Specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object; if small objects are present, the pixels are not compressed, in order to ensure the retraining effect. Under this optimization scheme, model accuracy does not drop noticeably while GPU memory occupation falls markedly: smaller input pictures shrink the feature maps fed into the network and the intermediate results of model training, reducing the computation and energy consumption of retraining, thereby accelerating the retraining process and optimizing system performance; a sketch of this input processing is given below.
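A sketch of the input processing follows; the small-object area threshold and the downscale factor are assumptions not fixed by the patent, and the boxes are taken to come from the global model's labeling.

```python
import torch
import torch.nn.functional as F

SMALL_OBJ_AREA = 32 * 32  # COCO-style "small object" threshold (assumed)

def maybe_downscale(image: torch.Tensor, boxes: torch.Tensor,
                    scale: float = 0.5) -> torch.Tensor:
    """image: (C, H, W); boxes: (N, 4) pixel coords (x1, y1, x2, y2)."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    if boxes.numel() == 0 or bool((areas >= SMALL_OBJ_AREA).all()):
        # No small targets: reduce pixels to cut memory, FLOPs, energy.
        return F.interpolate(image.unsqueeze(0), scale_factor=scale,
                             mode="bilinear", align_corners=False).squeeze(0)
    return image  # small objects present: keep full resolution
```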
Step 8: the global model is deployed on the analog accelerator and optimized, so that its robustness to weight noise improves and it performs better on the analog hardware. On the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence. Specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

In addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, we further constrain the weights:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer. Trained in this way, the aggregated global model becomes more robust to weight noise, the accuracy it loses during inference on the analog accelerator is reduced, and the overall performance of the system improves; a sketch of the noise injection is given below.
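Below is a sketch of this noise injection for one forward pass; restricting the perturbation to 2-D and larger weight tensors and restoring the originals after the pass are implementation assumptions.

```python
import torch

@torch.no_grad()
def inject_weight_noise(model: torch.nn.Module, mu: float = 0.02):
    """Perturb weights in place; returns originals so they can be restored."""
    saved = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:              # skip biases / norm parameters (assumed)
            continue
        saved[name] = p.detach().clone()
        w_max, w_min = p.max(), p.min()
        sigma = mu * (w_max - w_min)          # sigma_{N,l} = mu * (W_lmax - W_lmin)
        p.add_(torch.randn_like(p) * sigma)   # W_l ~ N(W_l0, sigma_{N,l}^2)
        p.clamp_(w_min, w_max)                # W_lmin <= W_l <= W_lmax
    return saved

@torch.no_grad()
def restore_weights(model: torch.nn.Module, saved: dict):
    for name, p in model.named_parameters():
        if name in saved:
            p.copy_(saved[name])
```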
Step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.

Claims (3)

1. A multi-terminal video detection model online retraining method is characterized by comprising the following steps:
step 1: uploading the data collected and screened by each terminal to an edge terminal, and labeling data without labels by using a global model located in an edge server;
step 2: obtaining a data set of each intermediate model before aggregation and feature distribution of overall data by using a sample label of a local model and a prediction module of the intermediate model;
step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with the augmented samples;
step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local model and the intermediate model respectively, and $P_{local}$ and $P_{mid}$ are the inference results of the local model and the intermediate model respectively;
step 5: aggregating the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate a global model;
step 6: based on each terminal's originally collected and screened data set, performing knowledge distillation between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models;
step 7: deploying the local models and intermediate models on a digital accelerator and processing the model inputs to a certain degree; specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object, and if small objects are present the pixels are not compressed, in order to ensure the retraining effect;
step 8: deploying the global model on the analog accelerator and optimizing it: on the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence; specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

in addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, the weights are further constrained:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer respectively;
step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
2. A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
3. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, implement the method of claim 1.
CN202211268163.9A 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method Pending CN115587217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268163.9A CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268163.9A CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Publications (1)

Publication Number Publication Date
CN115587217A true CN115587217A (en) 2023-01-10

Family

ID=84780678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268163.9A Pending CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Country Status (1)

Country Link
CN (1) CN115587217A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116525117A (en) * 2023-07-04 2023-08-01 之江实验室 Data distribution drift detection and self-adaption oriented clinical risk prediction system
CN116525117B (en) * 2023-07-04 2023-10-10 之江实验室 Data distribution drift detection and self-adaption oriented clinical risk prediction system

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN109891897B (en) Method for analyzing media content
US20210042580A1 (en) Model training method and apparatus for image recognition, network device, and storage medium
CN113159073B (en) Knowledge distillation method and device, storage medium and terminal
US20220351019A1 (en) Adaptive Search Method and Apparatus for Neural Network
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN113326930B (en) Data processing method, neural network training method, related device and equipment
CN113705769A (en) Neural network training method and device
CN113313119B (en) Image recognition method, device, equipment, medium and product
CN111989696A (en) Neural network for scalable continuous learning in domains with sequential learning tasks
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN111797992A (en) Machine learning optimization method and device
CN113792621B (en) FPGA-based target detection accelerator design method
CN115081588A (en) Neural network parameter quantification method and device
CN115759237A (en) End-to-end deep neural network model compression and heterogeneous conversion system and method
CN115587217A (en) Multi-terminal video detection model online retraining method
CN112580627A (en) Yoov 3 target detection method based on domestic intelligent chip K210 and electronic device
CN116194933A (en) Processing system, processing method, and processing program
CN112906800B (en) Image group self-adaptive collaborative saliency detection method
US20220004849A1 (en) Image processing neural networks with dynamic filter activation
CN115965078A (en) Classification prediction model training method, classification prediction method, device and storage medium
CN116363415A (en) Ship target detection method based on self-adaptive feature layer fusion
CN115810129A (en) Object classification method based on lightweight network
CN113343924B (en) Modulation signal identification method based on cyclic spectrum characteristics and generation countermeasure network
Lin et al. Collaborative Framework of Accelerating Reinforcement Learning Training with Supervised Learning Based on Edge Computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination