CN115587217A - Multi-terminal video detection model online retraining method - Google Patents

Multi-terminal video detection model online retraining method

Info

Publication number
CN115587217A
Authority
CN
China
Prior art keywords
model
local
models
terminal
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211268163.9A
Other languages
Chinese (zh)
Inventor
刘思聪
王乐豪
於志文
于昊艺
郭斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211268163.9A
Publication of CN115587217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an online retraining method for multi-terminal video detection models. The feature distribution of each terminal and of the data as a whole is obtained from the sample labels and the prediction modules of the intermediate models, and augmented samples are drawn from the overall feature distribution; using the resulting new data set, each local model and its corresponding intermediate model learn from each other and are updated. The intermediate models are then aggregated into a global model, and knowledge distillation is performed on the picture set of the original data set between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models. Based on the characteristics of the models, the local and intermediate models are deployed on a digital accelerator with their inputs processed to a certain degree, while the global model is deployed on an analog accelerator; weight noise is injected during mutual learning so that the models adapt to its influence.

Description

Multi-terminal video detection model online retraining method
Technical Field
The invention relates to the field of compressed deep model retraining and deep-learning hardware acceleration, and in particular to an online retraining method for multi-terminal video detection models.
Background
With rising living standards and the rapid development of science and technology, electronic devices such as smartphones have become widespread, and deploying deep learning on mobile terminals is attracting more and more researchers. Deployment is hindered, however, by the mismatch between huge neural networks and resource-constrained mobile hardware platforms. Researchers have attacked the problem from the directions of model compression and deep-learning hardware accelerators and have made significant progress. Yet deep-learning-based video analysis inevitably faces data drift: video data from real scenes differs from the data the deep model was trained on, and under this influence the accuracy of a shallow, lightweight model in a real scene can drop noticeably and fail to meet customer requirements. Online model retraining is an effective way to solve this problem.
At present, most online model retraining relies on knowledge distillation: the terminal model serves as a student model that learns from a teacher model located on an edge server, thereby updating itself. However, because edge-server resources are limited, the deployed 'teacher' model has a simple structure, and its own susceptibility to data drift affects the performance of the whole system. Meanwhile, the speed of online retraining also influences the average inference accuracy of the terminal model, so accelerating the retraining process is equally important.
Disclosure of Invention
Technical problem to be solved
To avoid the defects of the prior art, the invention provides an online retraining method for multi-terminal video detection models.
Technical scheme
A multi-terminal video detection model online retraining method is characterized by comprising the following steps:
step 1: upload the data collected and screened by each terminal to the edge side, and label the unlabeled data with the global model located on the edge server;
step 2: obtain the data set of each intermediate model before aggregation and the feature distribution of the overall data, using the sample labels of the local models and the prediction modules of the intermediate models;
step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with the augmented samples;
step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local model and the intermediate model respectively, and $P_{local}$ and $P_{mid}$ are the inference results of the local model and the intermediate model respectively;
step 5: aggregate the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate a global model;
step 6: based on each terminal's originally collected and screened data set, perform knowledge distillation between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models;
step 7: deploy the local models and intermediate models on a digital accelerator and process the model inputs to a certain degree; specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object, and if small objects are present the pixels are not compressed, in order to guarantee the retraining effect;
step 8: deploy the global model on the analog accelerator and optimize it: on the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence; specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

in addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, the weights are further constrained:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer respectively;
step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method.
A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, implement the above method.
Advantageous effects
The multi-terminal video detection model online retraining method provided by the invention completes online updating of the terminal models through a mutual-learning-based framework for the continuous evolution of compressed terminal models, markedly improving the accuracy of compressed deep models affected by data drift. Meanwhile, the in-memory-computing-based hardware acceleration of retraining greatly speeds up model updating, substantially reduces the hardware resources required for retraining, and improves retraining efficiency.
Drawings
The drawings, in which like reference numerals refer to like parts throughout, are for the purpose of illustrating particular embodiments only and are not to be considered limiting of the invention.
Fig. 1 is a schematic diagram of the overall system structure of a low-data-transmission-quantity multi-terminal video detection model online retraining method.
Fig. 2 illustrates the data processing in the system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The present invention utilizes the following principles: algorithms such as federated mutual learning and sample augmentation are combined to retrain and update the terminal models and the global model; based on the characteristics of each model in the continuous-evolution system, a suitable hardware acceleration scheme is selected for it, and the model deployed on that hardware is optimized according to the hardware's characteristics, so that software and hardware fit each other better. The invention can greatly improve the inference accuracy of terminal models affected by data drift, and after hardware acceleration the model accuracy, resource occupation, and acceleration effect are all markedly improved.
The invention has two figures in total; please refer to Fig. 1 and Fig. 2. The specific steps of the invention are as follows:
Step 1: upload the data collected and screened by each terminal to the edge side, and label the unlabeled data with the global model (the uncompressed model) located on the edge server.
Step 2: obtain the data set of each intermediate model before aggregation and the feature distribution of the overall data, using the sample labels of the local model (the compressed model that performs video inference at the terminal) and the prediction module of the intermediate model (a model with the same structure as the original global model, used for mutual learning and model aggregation).
Step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with them. With the help of this sample augmentation, the influence of non-IID data across terminal devices can be reduced, which improves the accuracy of the aggregated global model and alleviates the data and knowledge heterogeneity problem in federated learning; a minimal sketch of the sampling step is given below.
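The following is a minimal sketch of this sampling step, assuming the feature distribution is modeled as a per-dimension Gaussian over penultimate-layer features; the patent does not fix the distribution family, so the Gaussian choice and all function names here are illustrative only.

```python
import torch

def fit_feature_gaussian(features: torch.Tensor):
    """Fit per-dimension mean/std of a (num_samples, feature_dim) matrix."""
    return features.mean(dim=0), features.std(dim=0) + 1e-6

def sample_augmented_features(mean: torch.Tensor, std: torch.Tensor,
                              num_samples: int) -> torch.Tensor:
    """Draw augmented feature vectors from the overall distribution."""
    return mean + torch.randn(num_samples, mean.shape[0]) * std

# Pool the features gathered from all terminals that share one global
# model to fit the overall distribution, then sample in parallel for
# each local model (random stand-in features used here).
overall_mean, overall_std = fit_feature_gaussian(torch.randn(1000, 128))
augmented = sample_augmented_features(overall_mean, overall_std, 64)
```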
Step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local and intermediate models respectively, and $P_{local}$ and $P_{mid}$ are their respective inference results. During mutual learning the accuracy of both the local and the intermediate model improves, and training them together works better than training the two models separately and independently, for roughly three reasons. First, the class-probability estimates output by a neural network partially recover the relationships between different classes in the data, so exchanging class estimates between networks transfers knowledge of the data distribution and improves generalization. Second, mutual learning also acts as a form of regularization: with one-hot ground-truth labels, a model's predictions become over-confident during training, which easily leads to overfitting, and letting the models learn each other's class probabilities effectively prevents this. Finally, each network adjusts its own learning process by referring to the learning experience of the other model during training, so the result converges to a flatter minimum, giving better generalization and lower sensitivity to noise, which in turn leaves the system more choices of hardware acceleration schemes; a hedged sketch of these losses is given below.
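Below is a hedged PyTorch sketch of the two losses above; the values of alpha and beta and the use of cross-entropy as the label-based losses $L_{C_{local}}$ and $L_{C_{mid}}$ are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mutual_learning_losses(logits_local, logits_mid, labels,
                           alpha: float = 0.5, beta: float = 0.5):
    log_p_local = F.log_softmax(logits_local, dim=1)
    p_mid = F.softmax(logits_mid, dim=1)
    # D_KL(P_mid || P_local), matching the formulas above; gradients
    # flow into both models, which is what makes the learning mutual.
    kl = F.kl_div(log_p_local, p_mid, reduction="batchmean")
    loss_local = alpha * F.cross_entropy(logits_local, labels) + (1 - alpha) * kl
    loss_mid = beta * F.cross_entropy(logits_mid, labels) + (1 - beta) * kl
    return loss_local, loss_mid
```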
Step 5: aggregate the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate the global model, as sketched below.
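A minimal FedAvg sketch follows; weighting each terminal by its sample count is an assumption, since the patent only names the algorithm.

```python
import torch

def fedavg(state_dicts, sample_counts):
    """Average the intermediate models' parameters, weighted by data size."""
    total = float(sum(sample_counts))
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, sample_counts))
    return avg

# Usage (hypothetical names):
# global_state = fedavg([m.state_dict() for m in intermediate_models],
#                       samples_per_terminal)
# global_model.load_state_dict(global_state)
```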
Step 6: based on the data set originally collected and screened by each terminal, perform knowledge distillation between the global model and the local models corresponding to it, the global model acting as the teacher model that transfers knowledge to the local models acting as student models; a hedged sketch of such a distillation loss is given below.
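The sketch below shows a conventional distillation loss of the kind step 6 describes; the temperature T and mixing weight lam are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, lam: float = 0.7):
    # Soft targets from the global (teacher) model supervise the local
    # (student) model; T*T rescales gradients, as is standard practice.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1 - lam) * hard
```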
Step 7: the local models and intermediate models are deployed on a digital accelerator (such as a GPU), and the model inputs are processed to a certain degree. Specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object; if small objects are present, the pixels are not compressed, in order to ensure the retraining effect. Under this optimization scheme, model accuracy does not drop noticeably while GPU memory occupation falls markedly: smaller input pictures shrink the feature maps fed into the network and the intermediate results of model training, reducing the computation and energy consumption of retraining, thereby accelerating the retraining process and optimizing system performance; a sketch of this input processing is given below.
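A sketch of the input processing follows; the small-object area threshold and the downscale factor are assumptions not fixed by the patent, and the boxes are taken to come from the global model's labeling.

```python
import torch
import torch.nn.functional as F

SMALL_OBJ_AREA = 32 * 32  # COCO-style "small object" threshold (assumed)

def maybe_downscale(image: torch.Tensor, boxes: torch.Tensor,
                    scale: float = 0.5) -> torch.Tensor:
    """image: (C, H, W); boxes: (N, 4) pixel coords (x1, y1, x2, y2)."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    if boxes.numel() == 0 or bool((areas >= SMALL_OBJ_AREA).all()):
        # No small targets: reduce pixels to cut memory, FLOPs, energy.
        return F.interpolate(image.unsqueeze(0), scale_factor=scale,
                             mode="bilinear", align_corners=False).squeeze(0)
    return image  # small objects present: keep full resolution
```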
Step 8: the global model is deployed on the analog accelerator and optimized, so that its robustness to weight noise improves and it performs better on the analog hardware. On the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence. Specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

In addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, we further constrain the weights:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer. Trained in this way, the aggregated global model becomes more robust to weight noise, the accuracy it loses during inference on the analog accelerator is reduced, and the overall performance of the system improves; a sketch of the noise injection is given below.
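Below is a sketch of this noise injection for one forward pass; restricting the perturbation to 2-D and larger weight tensors and restoring the originals after the pass are implementation assumptions.

```python
import torch

@torch.no_grad()
def inject_weight_noise(model: torch.nn.Module, mu: float = 0.02):
    """Perturb weights in place; returns originals so they can be restored."""
    saved = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:              # skip biases / norm parameters (assumed)
            continue
        saved[name] = p.detach().clone()
        w_max, w_min = p.max(), p.min()
        sigma = mu * (w_max - w_min)          # sigma_{N,l} = mu * (W_lmax - W_lmin)
        p.add_(torch.randn_like(p) * sigma)   # W_l ~ N(W_l0, sigma_{N,l}^2)
        p.clamp_(w_min, w_max)                # W_lmin <= W_l <= W_lmax
    return saved

@torch.no_grad()
def restore_weights(model: torch.nn.Module, saved: dict):
    for name, p in model.named_parameters():
        if name in saved:
            p.copy_(saved[name])
```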
Step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.

Claims (3)

1. A multi-terminal video detection model online retraining method is characterized by comprising the following steps:
step 1: uploading the data collected and screened by each terminal to an edge terminal, and labeling data without labels by using a global model located in an edge server;
step 2: obtaining a data set of each intermediate model before aggregation and feature distribution of overall data by using a sample label of a local model and a prediction module of the intermediate model;
step 3: all local models sharing one global model sample in parallel from the overall feature distribution to obtain augmented samples, and the original data set is updated with the augmented samples;
step 4: according to the new data set, the corresponding local model and intermediate model learn from each other and both are updated; the loss functions for training the two models are rewritten as:

$$L_{local} = \alpha L_{C_{local}} + (1-\alpha)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

$$L_{mid} = \beta L_{C_{mid}} + (1-\beta)\, D_{KL}(P_{mid} \,\|\, P_{local})$$

where $\alpha$ and $\beta$ are hyper-parameters controlling the proportion of knowledge drawn from the data versus the other model, $L_{C_{local}}$ and $L_{C_{mid}}$ are the label-based loss functions of the local model and the intermediate model respectively, and $P_{local}$ and $P_{mid}$ are the inference results of the local model and the intermediate model respectively;
step 5: aggregating the intermediate models of the multiple terminal devices using the FedAvg algorithm to generate a global model;
step 6: based on each terminal's originally collected and screened data set, performing knowledge distillation between the global model and the local models corresponding to it, with the global model acting as the teacher model that transfers knowledge to the local models acting as student models;
step 7: deploying the local models and intermediate models on a digital accelerator and processing the model inputs to a certain degree; specifically, after the global model finishes inference and labeling on the picture set uploaded by a terminal model, the pixels of a picture are reduced if it contains no small target object, and if small objects are present the pixels are not compressed, in order to ensure the retraining effect;
step 8: deploying the global model on the analog accelerator and optimizing it: on the one hand, the mutual learning adopted makes the model converge to a flatter minimum, which strengthens its robustness to noise; on the other hand, weight noise is injected during the mutual learning of the local and intermediate models so that the models adapt to its influence; specifically, during the forward pass of mutual learning the weights of the l-th layer are made to satisfy:

$$W_l \sim N(W_{l0}, \sigma_{N,l}^2)$$

$$\sigma_{N,l} = \mu (W_{l,max} - W_{l,min})$$

in addition, to prevent an overly large weight perturbation range from harming the efficiency and accuracy of model training, the weights are further constrained:

$$W_{l,min} \le W_l \le W_{l,max}$$

where $W_{l0}$ are the original weights of the l-th layer, $\sigma_{N,l}$ is the injected noise scale, $\mu$ is the noise coefficient, and $W_{l,max}$ and $W_{l,min}$ are the maximum and minimum weights of the l-th layer respectively;
step 9: the edge server transmits the updated model parameters back to the terminals, and each terminal immediately deploys the new model and continues real-time video analysis.
2. A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
3. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, implement the method of claim 1.
CN202211268163.9A 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method Pending CN115587217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268163.9A CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268163.9A CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Publications (1)

Publication Number Publication Date
CN115587217A true CN115587217A (en) 2023-01-10

Family

ID=84780678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268163.9A Pending CN115587217A (en) 2022-10-17 2022-10-17 Multi-terminal video detection model online retraining method

Country Status (1)

Country Link
CN (1) CN115587217A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116525117A (en) * 2023-07-04 2023-08-01 之江实验室 Data distribution drift detection and self-adaption oriented clinical risk prediction system
CN116525117B (en) * 2023-07-04 2023-10-10 之江实验室 Data distribution drift detection and self-adaption oriented clinical risk prediction system

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN109891897B (en) Method for analyzing media content
US20210042580A1 (en) Model training method and apparatus for image recognition, network device, and storage medium
CN113159073B (en) Knowledge distillation method and device, storage medium and terminal
US20220351019A1 (en) Adaptive Search Method and Apparatus for Neural Network
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN113326930B (en) Data processing method, neural network training method, related device and equipment
CN113705769A (en) Neural network training method and device
CN113313119B (en) Image recognition method, device, equipment, medium and product
CN111989696A (en) Neural network for scalable continuous learning in domains with sequential learning tasks
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN111797992A (en) Machine learning optimization method and device
CN113792621B (en) FPGA-based target detection accelerator design method
CN115081588A (en) Neural network parameter quantification method and device
CN115759237A (en) End-to-end deep neural network model compression and heterogeneous conversion system and method
CN115587217A (en) Multi-terminal video detection model online retraining method
CN112580627A (en) Yoov 3 target detection method based on domestic intelligent chip K210 and electronic device
CN116194933A (en) Processing system, processing method, and processing program
CN112906800B (en) Image group self-adaptive collaborative saliency detection method
US20220004849A1 (en) Image processing neural networks with dynamic filter activation
CN115965078A (en) Classification prediction model training method, classification prediction method, device and storage medium
CN116363415A (en) Ship target detection method based on self-adaptive feature layer fusion
CN115810129A (en) Object classification method based on lightweight network
CN113343924B (en) Modulation signal identification method based on cyclic spectrum characteristics and generation countermeasure network
Lin et al. Collaborative Framework of Accelerating Reinforcement Learning Training with Supervised Learning Based on Edge Computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination