CN116071624B - Smoking detection data labeling method based on active learning

Smoking detection data labeling method based on active learning

Info

Publication number
CN116071624B
CN116071624B (application number CN202310042572.5A)
Authority
CN
China
Prior art keywords
model
data
uncertainty
picture
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310042572.5A
Other languages
Chinese (zh)
Other versions
CN116071624A (en)
Inventor
刘鹏
张真
张堃
王美民
江兴斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Innovative Data Technologies Inc
Original Assignee
Nanjing Innovative Data Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Innovative Data Technologies Inc filed Critical Nanjing Innovative Data Technologies Inc
Priority to CN202310042572.5A priority Critical patent/CN116071624B/en
Publication of CN116071624A publication Critical patent/CN116071624A/en
Application granted granted Critical
Publication of CN116071624B publication Critical patent/CN116071624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V10/7788 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors, the supervisor being a human, e.g. interactive learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a smoking detection data labeling method based on active learning, which comprises the following steps: S1, acquiring smoking data and cigarette data through a network, and pre-training with YOLOv7 to obtain a primary model; S2, deploying the primary model in an actual scene, collecting actual images through a camera, and testing the primary model; S3, screening samples through an active learning strategy according to the prediction results of the deployed primary model; and S4, having a data annotator label the screened samples in a targeted manner, retraining the next-generation model, and repeating steps S1 to S4. The invention provides a new workflow for data annotation and model iteration, which effectively reduces data annotation time, lowers the annotation cost borne by data annotators, and improves the efficiency of model iteration.

Description

Smoking detection data labeling method based on active learning
Technical Field
The invention relates to the technical field of computer vision algorithms, in particular to a smoking detection data labeling method based on active learning.
Background
Data are the raw material of artificial-intelligence models and one of the four driving engines of artificial intelligence. When an algorithm is put into production, the model must be iterated continuously to adapt to the specific application scene; in particular, after a detection model is deployed in an actual scene, various false detections and missed detections occur because the data in the actual scene and the data used to train the model do not follow the same distribution. Secondary data acquisition is therefore required for data annotation and retraining. In this process there is a large amount of similar data to be annotated. Because the data annotators do not know the details of how the specific model was trained, they cannot judge which repeated data to discard, which causes a huge amount of useless work without advancing the model iteration. In addition, a large amount of repeated, useless data prolongs training time and reduces the efficiency of model iteration.
For the problem of detecting whether pedestrians smoke in public places, the current situation is that cigarettes are small in such scenes and are easily confused with objects such as the frame of a mobile phone or a card, which causes detection errors; each deployment therefore generates a large number of false alarms and requires a large amount of re-labeling by data annotators.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
In view of the problems in the related art, the invention provides a smoking detection data labeling method based on active learning, which aims to solve the technical problems of excessively high data labeling cost and low efficiency of target detection algorithms in the related art.
For this purpose, the invention adopts the following specific technical scheme:
a smoking detection data labeling method based on active learning comprises the following steps:
s1, acquiring smoking data and cigarette data through a network, and pre-training by utilizing YOLOv7 to obtain a primary model;
s2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
s3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
and S4, carrying out targeted labeling on the screened samples by a data annotator, retraining the next-generation model, and repeatedly executing the steps S1 to S4.
Further, the method for acquiring smoking data and cigarette data through a network and pre-training by utilizing YOLOv7 to obtain a primary model comprises the following steps:
s11, acquiring a cigarette close-up picture and a display picture set when the cigarettes are sold in a network;
and step S12, performing model training by using a training script provided by the YOLOv7 official, so as to obtain a primary model.
Further, the deploying the primary model into the actual scene, collecting the actual image through the camera, and testing the primary model includes the following steps:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
Further, the screening of the sample through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
and S32, calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty, and judging whether the picture is marked.
Further, the screening of the data by active learning includes the following steps:
step S311, connecting a Gaussian mixture density network after the YOLOv7 output layer to predict the mean μ_k and the variance σ_k² of each component of the mixture Gaussian distribution, and calculating the occasional (aleatoric) uncertainty u_al and the cognitive (epistemic) uncertainty u_ep;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weights, gaussian distribution mean values and Gaussian distribution variances of the position information of the target frame according to the parameters.
Further, the occasional uncertainty u_al and the cognitive uncertainty u_ep are calculated as:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component.
Further, the three sets of parameters are u, sigma and pi, where u is the mean, sigma is the variance and pi is the mixing coefficient;
the features corresponding to the three sets of parameters comprise the abscissa x, the ordinate y, the width w and the height h of the centre of the target frame;
according to these parameters, the Gaussian distribution weight, the Gaussian distribution mean and the Gaussian distribution variance of the position information of the target frame are calculated: the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture.
Further, calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty and judging whether the picture should be labeled comprises the following steps:
step S321, defining u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith picture, and judging whether the picture should be labeled according to a specified threshold.
Further, the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture is calculated as:

u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional uncertainty, e abbreviates the cognitive uncertainty, and u_ij is the information-loss degree;
the uncertainty and the information quantity of the picture are normalized as:

û_ij = (u_ij − ū) / σ_U

where ū and σ_U are the mean and the standard deviation of the set U of all scores; and the information quantity of the ith picture is calculated as:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
Further, the targeted labeling of the screened samples by the data annotators, retraining the next-generation model, and repeating the steps S1 to S4 includes the following steps:
step S41, the screened pictures are sent to a data annotator for re-annotation;
and step S42, retraining the model on the new data set mixed with the data set of the primary model to obtain a new-generation model.
The beneficial effects of the invention are as follows:
1. The invention aims to effectively reduce the re-labeling of repeated data by data annotators and to increase the effectiveness and efficiency of iterating a deployed model. Data annotation is a crucial part of the whole process and a guarantee of model iteration efficiency. In an actual task, a large amount of redundancy exists among the erroneous data, and repeated or useless annotation prevents the model from evolving and iterating effectively.
2. The task scene of the invention is to detect whether pedestrians smoke in public places; the images collected by a camera are processed with the target detection model YOLOv7 to detect the position and area of cigarettes. The automatic labeling method provided by the invention relies on a Gaussian mixture density network, which estimates, for each predicted target frame, the position and the probability distribution output by the classification head, so that the input can be predicted as the parameters of a Gaussian probability distribution, including mean and variance. The invention estimates the occasional (aleatoric) uncertainty and the cognitive (epistemic) uncertainty through a single forward propagation of the model, and aggregates the two kinds of uncertainty through a scoring function to obtain the information-quantity score of each image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a smoke detection data annotation method based on active learning according to an embodiment of the invention;
FIG. 2 illustrates a smoke detection data annotation method based on active learning according to an embodiment of the invention;
FIG. 3 illustrates the inference process of the YOLOv7 model in a smoke detection data labeling method based on active learning according to an embodiment of the present invention.
Detailed Description
For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which are a part of the disclosure of the present invention, and which are mainly used for illustrating the embodiments and for explaining the principles of the operation of the embodiments in conjunction with the description thereof, and with reference to these matters, it will be apparent to those skilled in the art to which the present invention pertains that other possible embodiments and advantages of the present invention may be practiced.
According to the embodiment of the invention, a smoke detection data labeling method based on active learning is provided.
The invention will be further described with reference to the accompanying drawings and the detailed embodiments. As shown in FIG. 1 and FIG. 2, a method for labeling smoking detection data based on active learning according to an embodiment of the invention comprises the following steps:
S1, acquiring smoking data and cigarette data through a network, and pre-training with YOLOv7 to obtain a primary model;
Specifically, the YOLOv7 detection model belongs to the YOLO (You Only Look Once) series of target detection models and is a dense small-target detection model based on a convolutional neural network.
As shown in FIG. 3, the inference process of the YOLOv7 model is as follows:
Step one: the image to be detected is first passed through the feature extraction backbone network, which has 50 layers in total. It begins with 4 convolution modules, each consisting of a convolution layer, a batch normalization layer and a SiLU activation function, where SiLU(x) = x · sigmoid(x) is a nonlinear activation function; if the input image is 640×640×3, the feature map obtained after the 4 convolution modules is 160×160×128;
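As a purely illustrative aid (not part of the patented method), the following minimal PyTorch sketch shows a convolution module of the kind described in step one — a convolution layer followed by batch normalization and the SiLU activation; the class name, channel widths and strides are assumptions chosen only so that a 640×640×3 input yields a 160×160×128 feature map:

import torch
import torch.nn as nn

class ConvModule(nn.Module):
    # One convolution module: convolution -> batch normalization -> SiLU.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()               # SiLU(x) = x * sigmoid(x)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

if __name__ == "__main__":
    # Four stacked convolution modules (two of them with stride 2) reduce a
    # 640x640x3 image to a 160x160x128 feature map, matching the sizes quoted above.
    stem = nn.Sequential(
        ConvModule(3, 32, 3, 1),
        ConvModule(32, 64, 3, 2),
        ConvModule(64, 64, 3, 1),
        ConvModule(64, 128, 3, 2),
    )
    print(stem(torch.randn(1, 3, 640, 640)).shape)   # torch.Size([1, 128, 160, 160])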
step two: the method comprises the steps of entering an ELAN module, wherein the input characteristic map and the output characteristic map of the ELAN module are the same in size, the input characteristic map is respectively transformed into the same size by 4 convolution modules, channels are half of the number of channels of the original input characteristic map, then the channels are spliced according to the channel direction to form 2 times of the number of channels of the original input characteristic map, and finally the output identical to the input characteristic map can be changed back through the convolution modules, wherein the ELAN is specifically expressed as follows:
ELAN(x) = Conv(Concat(f1(x), f2(x), f3(x), f4(x)))

where Conv(·) is a convolution module, Concat(·) is the splicing (concatenation) function, and f1, f2, f3 and f4 are the transformations applied on the 4 convolution-module paths;
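Likewise, the ELAN aggregation just described can be sketched as follows, reusing the ConvModule from the previous sketch; the branch depths are assumptions, and only the channel arithmetic (four branches at half the channels, concatenation to twice the channels, then a restoring convolution) follows the description above:

import torch
import torch.nn as nn
# ConvModule is the Conv-BN-SiLU module defined in the sketch above.

class ELANSketch(nn.Module):
    # Four branches at half the input channels; their concatenation has twice
    # the input channels; a final convolution maps back to the input channel count.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f1 = ConvModule(channels, half, 1)
        self.f2 = ConvModule(channels, half, 1)
        self.f3 = nn.Sequential(ConvModule(channels, half, 1), ConvModule(half, half, 3))
        self.f4 = nn.Sequential(ConvModule(channels, half, 1), ConvModule(half, half, 3))
        self.fuse = ConvModule(4 * half, channels, 1)   # 4 * half == 2 * channels

    def forward(self, x):
        y = torch.cat([self.f1(x), self.f2(x), self.f3(x), self.f4(x)], dim=1)
        return self.fuse(y)                 # same shape as the input feature map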
step three: after passing through the ELAN, finally, downsampling is carried out by using an MP1 module formed by the pooling layer and the convolution module together, and a feature extraction framework is formed by stacking the modules;
Step four: the features output by the feature extraction backbone are processed by the detection head, which uses a feature pyramid structure (SPPCSPC) to extract small, medium and large target frames; each target frame contains position information and classification information. The detection head splices the backbone outputs with intermediate results and predicts the target frames by up-sampling followed by down-sampling.
The method for acquiring smoking data and cigarette data through a network and pre-training with YOLOv7 to obtain a primary model comprises the following steps:
S11, acquiring, from the network, close-up pictures of cigarettes and the display pictures used when cigarettes are sold online;
step S12, performing model training with the training script provided by the YOLOv7 official repository to obtain the primary model F0.
Specifically, the collected data set is used to train the YOLOv7 model; the data are collected by searching the network or from publicly available data sets.
Training with the official YOLOv7 script yields the primary model F0 as follows: the input is a picture, and the output for each target comprises the position information (the centre coordinates x and y, and the width w and height h of the target frame), the confidence C_box of the target frame, and the class probability distribution C_class = (P_smoke, P_other) of the target frame. There are two categories in the cigarette detection task: the cigarette category (smoke) and the other category (other).
S2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
the method for testing the primary model comprises the following steps of:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
specifically, the test time is generally about 1 month. Due to different scene reasons, the detection model can perform error detection on objects in the scene, such as mobile phones, zippers on back bags and the like;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
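For illustration only, the recording of inspector-approved false detections can be pictured as a small utility like the following; the storage directory and the JSON fields are assumptions, not part of the invention:

import json
import shutil
from pathlib import Path

ERROR_DB = Path("error_dataset")            # hypothetical location of the error data set
ERROR_DB.mkdir(exist_ok=True)

def record_false_detection(image_path, detections, inspector_approved):
    # Store a falsely detected picture and its predictions once the inspector approves.
    if not inspector_approved:
        return
    image_path = Path(image_path)
    shutil.copy(image_path, ERROR_DB / image_path.name)
    meta = {"image": image_path.name, "detections": detections}
    (ERROR_DB / (image_path.stem + ".json")).write_text(json.dumps(meta))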
S3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
the method for screening the samples through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
the active learning is a machine learning or artificial intelligence method for labeling by actively selecting the most valuable sample. The aim is to achieve as good performance of the model as possible using as few, high quality sample labeling as possible. That is, the active learning method can improve the gain of the sample and the label, and maximize the performance of the model under the premise of limited label budget, which is a scheme for improving the data efficiency from the perspective of the sample, so that the method is applied to tasks with high labeling cost, high labeling difficulty and the like, such as medical images, unmanned operation, abnormal detection and related problems based on internet big data.
Specifically, the screening of the data by active learning includes the following steps:
step S311, after the YOLOv7 output layer is connected, a Gaussian mixture density network is connected to predict the average value of the Gaussian mixture distribution
μ_k and the variance σ_k² of each Gaussian component, from which the occasional (aleatoric) uncertainty u_al and the cognitive (epistemic) uncertainty u_ep are calculated;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weight, gaussian distribution mean and Gaussian distribution variance of the position information of the target frame according to the parameters;
in particular, the occasional uncertainty
u_al and the cognitive uncertainty u_ep are calculated as follows:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component;
the three sets of parameters are u, sigma, pi;
where u is the mean, sigma is the variance and pi is the mixing coefficient (i.e. the coefficient that fuses the component means);
the characteristics corresponding to the three groups of parameters comprise an abscissa x, an ordinate y, a width w and a height h of the center of the target frame;
thus u_x is the mean of the abscissa of the centre point of the target frame, and the ordinate of the centre point, the width and the height of the target frame are described in the same way by u_y, u_w and u_h together with their corresponding variances and weights.
According to the parameters, the Gaussian distribution weight, the Gaussian distribution mean value and the Gaussian distribution variance of the position information of the target frame are calculated, and the calculation modes are respectively as follows:
the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture (an illustrative code sketch follows the formulas of step S32 below);
step S32, calculating information score of the picture by evaluating occasional uncertainty and cognitive uncertainty, and judging whether the picture is marked;
specifically, the calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty, and judging whether the picture is marked comprises the following steps:
step S321, define
u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith picture, and judging whether the picture should be labeled according to a specified threshold;
Specifically, the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture is calculated as:

u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional (aleatoric) uncertainty, e abbreviates the cognitive (epistemic) uncertainty, and u_ij is the information-loss degree; a picture contains several objects, each with its own target frame, so u_ij^a denotes the occasional uncertainty of the jth frame in the ith picture, and the sum of the cognitive and occasional uncertainties of a frame is its information-loss degree.
The information-loss degree is normalized to obtain û_ij (u with a hat over it), where ū denotes the average of all information-loss degrees and σ_U the standard deviation of the set U:

û_ij = (u_ij − ū) / σ_U

The information quantity of the ith picture is then calculated as:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
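To illustrate steps S31 and S32 together, the following NumPy sketch makes two explicit assumptions: the occasional and cognitive uncertainties are computed with the standard mixture-density decomposition (softmax-normalized weights, weighted component variances for the occasional part, weighted spread of the component means for the cognitive part), and the per-picture information quantity is the average of the normalized per-frame scores; the patent's exact formulas may differ in detail.

import numpy as np

def mixture_uncertainties(pi_raw, mu, var):
    # pi_raw, mu, var: arrays of shape (K,) for one box coordinate of one target frame.
    pi = np.exp(pi_raw - pi_raw.max())
    pi = pi / pi.sum()                                 # softmax normalization of the mixture weights
    u_al = np.sum(pi * var)                            # occasional (aleatoric) uncertainty
    u_ep = np.sum(pi * (mu - np.sum(pi * mu)) ** 2)    # cognitive (epistemic) uncertainty
    return u_al, u_ep

def pictures_to_label(per_frame_scores, threshold):
    # per_frame_scores: one array per picture holding u_ij = u_a + u_e for each target frame.
    all_scores = np.concatenate(per_frame_scores)      # the set U of all scores
    mean, std = all_scores.mean(), all_scores.std() + 1e-8
    selected = []
    for i, u in enumerate(per_frame_scores):
        u_hat = (u - mean) / std                       # normalized information-loss degrees
        if len(u) and u_hat.mean() > threshold:        # information quantity of picture i
            selected.append(i)                         # picture i is sent for labeling
    return selected

# Toy example: picture index 1 has the more uncertain frames and is selected.
print(pictures_to_label([np.array([0.1, 0.2]), np.array([0.9, 1.1, 1.3])], threshold=0.0))   # [1]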
S4, carrying out targeted labeling on the screened samples by a data labeling person, retraining a next generation model, and repeatedly executing the steps S1 to S4;
the method comprises the steps of carrying out targeted labeling on the screened samples by a data annotator, retraining a next generation model, and repeatedly executing the steps S1 to S4, wherein the steps comprise the following steps:
step S41, the screened pictures are sent to a data annotator for re-annotation;
step S42, retraining the data set of the training model by using the new data set and the mixed primary model to obtain a new generation model F 1
The data annotators refer to personnel capable of annotating the data sets through the data annotating tool.
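Schematically, one generation of the whole cycle (steps S1 to S4) can be written as a single function; the prediction, scoring, annotation and retraining callables are placeholders supplied by the surrounding system, not functions defined by the invention:

def active_learning_iteration(predict, score, annotate, retrain, unlabeled, labeled, threshold):
    # S2/S3: score the deployed model's predictions and keep the informative samples.
    to_label = [x for x in unlabeled if score(predict(x)) > threshold]
    # S4: targeted labeling, mixing the new labels with the existing training data.
    labeled = labeled + annotate(to_label)
    remaining = [x for x in unlabeled if x not in to_label]
    return retrain(labeled), remaining       # next-generation model and the leftover pool

# Toy usage with stand-in callables.
model, rest = active_learning_iteration(
    predict=lambda x: x, score=lambda p: p, annotate=lambda xs: xs,
    retrain=lambda data: "model trained on %d samples" % len(data),
    unlabeled=[0.2, 0.9, 0.6], labeled=[0.1, 0.3], threshold=0.5)
print(model, rest)                           # model trained on 4 samples [0.2]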
In summary, by means of the above technical scheme of the invention, in order to effectively reduce the re-labeling of repeated data by a data labeling person and increase the effectiveness and efficiency of model iteration deployment, the model landing process of the detection algorithm mainly comprises data acquisition, data labeling, model training, primary model deployment, false result acquisition, re-labeling and model iteration. The data annotation is a crucial part in the whole process, and is also a guarantee of model iteration efficiency. In an actual task, a large amount of redundancy exists in a plurality of wrong data, so that the model cannot be effectively evolved and iterated due to repeated data annotation and useless data annotation.
The task scene of the invention is to detect whether pedestrians smoke in public places; the core problem is to detect the position and area of cigarettes in images collected by a camera using the target detection model YOLOv7. The automatic labeling method provided by the invention relies on a Gaussian mixture density network, which can estimate, for each predicted target frame, the position and the probability distribution output by the classification head, so that the input can be predicted as the parameters of a Gaussian probability distribution, including mean and variance. The invention estimates the occasional (aleatoric) uncertainty and the cognitive (epistemic) uncertainty through a single forward propagation of the model, and aggregates the two kinds of uncertainty through a scoring function to obtain the information-quantity score of each image.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. The smoking detection data labeling method based on active learning is characterized by comprising the following steps of:
s1, acquiring smoking data and cigarette data through a network, and pre-training by utilizing YOLOv7 to obtain a primary model;
s2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
s3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
s4, carrying out targeted labeling on the screened samples by a data labeling person, retraining a next generation model, and repeatedly executing the steps S1 to S4;
the method for screening the sample through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
step S32, calculating information score of the picture by evaluating occasional uncertainty and cognitive uncertainty, and judging whether the picture is marked;
the screening of the data by active learning comprises the following steps:
step S311, after the YOLOv7 output layer is connected, a Gaussian mixture density network is connected to predict the average value of the Gaussian mixture distribution
μ_k and the variance σ_k² of each Gaussian component, from which the occasional uncertainty u_al and the cognitive uncertainty u_ep are calculated;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weight, gaussian distribution mean and Gaussian distribution variance of the position information of the target frame according to the parameters;
the occasional uncertainty
u_al and the cognitive uncertainty u_ep are calculated as follows:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component;
the three sets of parameters are u, sigma, pi;
where u is the mean, sigma is the variance and pi is the mixing coefficient;
the characteristics corresponding to the three groups of parameters comprise an abscissa x, an ordinate y, a width w and a height h of the center of the target frame;
according to the parameters, the Gaussian distribution weight, the Gaussian distribution mean and the Gaussian distribution variance of the position information of the target frame are calculated: the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture;
the method for calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty comprises the following steps of:
step S321, define
u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
Step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith target frame, and judging whether the picture should be marked or not according to a specified threshold value;
the calculation formula of the score of the occasional uncertainty and the cognitive uncertainty aggregate of the jth target object in the ith picture is as follows:
u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional uncertainty, e abbreviates the cognitive uncertainty, and u_ij is the information-loss degree;
the calculation formula for normalizing the uncertainty and the information quantity of the picture is as follows:

û_ij = (u_ij − ū) / σ_U

where ū and σ_U are the mean and the standard deviation of the set U of all scores; the calculation formula for the information quantity of the ith picture is as follows:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
2. The method for labeling smoking detection data based on active learning according to claim 1, wherein the steps of collecting smoking data and cigarette data through a network and pre-training by utilizing YOLOv7 to obtain a primary model comprise the following steps:
s11, acquiring a cigarette close-up picture and a display picture set when the cigarettes are sold in a network;
and step S12, performing model training by using a training script provided by the YOLOv7 official, so as to obtain a primary model.
3. The method for labeling smoking detection data based on active learning according to claim 2, wherein the deploying the primary model into an actual scene, collecting an actual image by a camera, and testing the primary model comprises the following steps:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
4. The method for labeling smoking detection data based on active learning according to claim 3, wherein the targeted labeling of the screened samples by the data labeler, retraining the next generation model, and repeating steps S1 to S4 comprises the steps of:
step S41, the screened pictures are sent to a data annotator for re-annotation;
and step S42, retraining the model on the new data set mixed with the data set of the primary model to obtain a new-generation model.
CN202310042572.5A 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning Active CN116071624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310042572.5A CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310042572.5A CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Publications (2)

Publication Number Publication Date
CN116071624A CN116071624A (en) 2023-05-05
CN116071624B true CN116071624B (en) 2023-06-27

Family

ID=86176403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310042572.5A Active CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Country Status (1)

Country Link
CN (1) CN116071624B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149721B (en) * 2020-09-10 2023-11-17 南京大学 Target detection method for reducing labeling requirements based on active learning
CN113591662A (en) * 2021-07-24 2021-11-02 深圳市铁越电气有限公司 Method, system and storage medium for recognizing smoking calling behavior
CN114170677A (en) * 2021-11-12 2022-03-11 深圳先进技术研究院 Network model training method and equipment for detecting smoking behavior
CN114998679A (en) * 2022-05-25 2022-09-02 河南爬客智能机器人有限公司 Online training method, device and equipment for deep learning model and storage medium
CN115376101A (en) * 2022-08-25 2022-11-22 天津大学 Incremental learning method and system for automatic driving environment perception

Also Published As

Publication number Publication date
CN116071624A (en) 2023-05-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant