CN116071624B - Smoking detection data labeling method based on active learning

Smoking detection data labeling method based on active learning

Info

Publication number
CN116071624B
CN116071624B (application number CN202310042572.5A)
Authority
CN
China
Prior art keywords
model
data
uncertainty
picture
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310042572.5A
Other languages
Chinese (zh)
Other versions
CN116071624A (en)
Inventor
刘鹏
张真
张堃
王美民
江兴斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Innovative Data Technologies Inc
Original Assignee
Nanjing Innovative Data Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Innovative Data Technologies Inc filed Critical Nanjing Innovative Data Technologies Inc
Priority to CN202310042572.5A priority Critical patent/CN116071624B/en
Publication of CN116071624A publication Critical patent/CN116071624A/en
Application granted granted Critical
Publication of CN116071624B publication Critical patent/CN116071624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V10/7788 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors, the supervisor being a human, e.g. interactive learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a smoking detection data labeling method based on active learning, which comprises the following steps: S1, acquiring smoking data and cigarette data through a network, and pre-training with YOLOv7 to obtain a primary model; S2, deploying the primary model in an actual scene, collecting actual images through a camera, and testing the primary model; S3, screening samples through an active learning strategy according to the prediction results of the deployed primary model; and S4, having a data annotator label the screened samples in a targeted manner, retraining the next-generation model, and repeating steps S1 to S4. The invention provides a new workflow for data annotation and model iteration, which effectively reduces data annotation time, lowers the annotation cost borne by data annotators, and improves the efficiency of model iteration.

Description

Smoking detection data labeling method based on active learning
Technical Field
The invention relates to the technical field of computer vision algorithms, in particular to a smoking detection data labeling method based on active learning.
Background
Data are the raw material of artificial-intelligence models and one of the four driving engines of artificial intelligence. When an algorithm is put into production, the model must be iterated continuously to adapt to the specific application scene; in particular, after a detection model is deployed in an actual scene, various false detections and missed detections occur because the data in the actual scene and the data used to train the model do not follow the same distribution. Secondary data acquisition is therefore required for data annotation and retraining. In this process there is a large amount of similar data to be annotated. Because the data annotators do not know the details of how the specific model was trained, they cannot judge which repeated data to discard, which causes a huge amount of useless work without advancing the model iteration. In addition, a large amount of repeated, useless data prolongs training time and reduces the efficiency of model iteration.
For the problem of detecting whether pedestrians smoke in public places, the current situation is that cigarettes are small in such scenes and are easily confused with objects such as the frame of a mobile phone or a card, which causes detection errors; each deployment therefore generates a large number of false alarms and requires a large amount of re-labeling by data annotators.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
In view of the problems in the related art, the invention provides a smoking detection data labeling method based on active learning, which aims to solve the technical problems of excessively high data labeling cost and low efficiency of target detection algorithms in the related art.
For this purpose, the invention adopts the following specific technical scheme:
a smoking detection data labeling method based on active learning comprises the following steps:
s1, acquiring smoking data and cigarette data through a network, and pre-training by utilizing YOLOv7 to obtain a primary model;
s2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
s3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
and S4, carrying out targeted labeling on the screened samples by a data annotator, retraining the next-generation model, and repeatedly executing the steps S1 to S4.
Further, the method for acquiring smoking data and cigarette data through a network and pre-training by utilizing YOLOv7 to obtain a primary model comprises the following steps:
s11, acquiring a cigarette close-up picture and a display picture set when the cigarettes are sold in a network;
and step S12, performing model training by using a training script provided by the YOLOv7 official, so as to obtain a primary model.
Further, the deploying the primary model into the actual scene, collecting the actual image through the camera, and testing the primary model includes the following steps:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
Further, the screening of the sample through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
and S32, calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty, and judging whether the picture is marked.
Further, the screening of the data by active learning includes the following steps:
step S311, connecting a Gaussian mixture density network after the YOLOv7 output layer to predict the mean μ_k and the variance σ_k² of each component of the mixture Gaussian distribution, and calculating the occasional (aleatoric) uncertainty u_al and the cognitive (epistemic) uncertainty u_ep;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weights, gaussian distribution mean values and Gaussian distribution variances of the position information of the target frame according to the parameters.
Further, the occasional uncertainty u_al and the cognitive uncertainty u_ep are calculated as:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component.
Further, the three sets of parameters are u, sigma and pi, where u is the mean, sigma is the variance and pi is the mixing coefficient;
the features corresponding to the three sets of parameters comprise the abscissa x, the ordinate y, the width w and the height h of the centre of the target frame;
according to these parameters, the Gaussian distribution weight, the Gaussian distribution mean and the Gaussian distribution variance of the position information of the target frame are calculated: the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture.
Further, calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty and judging whether the picture should be labeled comprises the following steps:
step S321, defining u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith picture, and judging whether the picture should be labeled according to a specified threshold.
Further, the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture is calculated as:

u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional uncertainty, e abbreviates the cognitive uncertainty, and u_ij is the information-loss degree;
the uncertainty and the information quantity of the picture are normalized as:

û_ij = (u_ij − ū) / σ_U

where ū and σ_U are the mean and the standard deviation of the set U of all scores; and the information quantity of the ith picture is calculated as:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
Further, the targeted labeling of the screened samples by the data annotators, retraining the next-generation model, and repeating the steps S1 to S4 includes the following steps:
step S41, the screened pictures are sent to a data annotator for re-annotation;
and step S42, retraining the model on the new data set mixed with the data set of the primary model to obtain a new-generation model.
The beneficial effects of the invention are as follows:
1. The invention aims to effectively reduce the re-labeling of repeated data by data annotators and to increase the effectiveness and efficiency of iterating a deployed model. Data annotation is a crucial part of the whole process and a guarantee of model iteration efficiency. In an actual task, a large amount of redundancy exists among the erroneous data, and repeated or useless annotation prevents the model from evolving and iterating effectively.
2. The task scene of the invention is to detect whether pedestrians smoke in public places; the images collected by a camera are processed with the target detection model YOLOv7 to detect the position and area of cigarettes. The automatic labeling method provided by the invention relies on a Gaussian mixture density network, which estimates, for each predicted target frame, the position and the probability distribution output by the classification head, so that the input can be predicted as the parameters of a Gaussian probability distribution, including mean and variance. The invention estimates the occasional (aleatoric) uncertainty and the cognitive (epistemic) uncertainty through a single forward propagation of the model, and aggregates the two kinds of uncertainty through a scoring function to obtain the information-quantity score of each image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a smoke detection data annotation method based on active learning according to an embodiment of the invention;
FIG. 2 illustrates a smoke detection data annotation method based on active learning according to an embodiment of the invention;
FIG. 3 illustrates the inference process of the YOLOv7 model in a smoke detection data labeling method based on active learning according to an embodiment of the present invention.
Detailed Description
For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which are a part of the disclosure of the present invention, and which are mainly used for illustrating the embodiments and for explaining the principles of the operation of the embodiments in conjunction with the description thereof, and with reference to these matters, it will be apparent to those skilled in the art to which the present invention pertains that other possible embodiments and advantages of the present invention may be practiced.
According to the embodiment of the invention, a smoke detection data labeling method based on active learning is provided.
The invention will be further described with reference to the accompanying drawings and the detailed embodiments. As shown in FIG. 1 and FIG. 2, a method for labeling smoking detection data based on active learning according to an embodiment of the invention comprises the following steps:
S1, acquiring smoking data and cigarette data through a network, and pre-training with YOLOv7 to obtain a primary model;
Specifically, the YOLOv7 detection model belongs to the YOLO (You Only Look Once) series of target detection models and is a dense small-target detection model based on a convolutional neural network.
As shown in FIG. 3, the inference process of the YOLOv7 model is as follows:
Step one: the image to be detected is first passed through the feature extraction backbone network, which has 50 layers in total. It begins with 4 convolution modules, each consisting of a convolution layer, a batch normalization layer and a SiLU activation function, where SiLU(x) = x · sigmoid(x) is a nonlinear activation function; if the input image is 640×640×3, the feature map obtained after the 4 convolution modules is 160×160×128;
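As a purely illustrative aid (not part of the patented method), the following minimal PyTorch sketch shows a convolution module of the kind described in step one — a convolution layer followed by batch normalization and the SiLU activation; the class name, channel widths and strides are assumptions chosen only so that a 640×640×3 input yields a 160×160×128 feature map:

import torch
import torch.nn as nn

class ConvModule(nn.Module):
    # One convolution module: convolution -> batch normalization -> SiLU.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()               # SiLU(x) = x * sigmoid(x)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

if __name__ == "__main__":
    # Four stacked convolution modules (two of them with stride 2) reduce a
    # 640x640x3 image to a 160x160x128 feature map, matching the sizes quoted above.
    stem = nn.Sequential(
        ConvModule(3, 32, 3, 1),
        ConvModule(32, 64, 3, 2),
        ConvModule(64, 64, 3, 1),
        ConvModule(64, 128, 3, 2),
    )
    print(stem(torch.randn(1, 3, 640, 640)).shape)   # torch.Size([1, 128, 160, 160])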
step two: the method comprises the steps of entering an ELAN module, wherein the input characteristic map and the output characteristic map of the ELAN module are the same in size, the input characteristic map is respectively transformed into the same size by 4 convolution modules, channels are half of the number of channels of the original input characteristic map, then the channels are spliced according to the channel direction to form 2 times of the number of channels of the original input characteristic map, and finally the output identical to the input characteristic map can be changed back through the convolution modules, wherein the ELAN is specifically expressed as follows:
ELAN(x) = Conv(Concat(f1(x), f2(x), f3(x), f4(x)))

where Conv(·) is a convolution module, Concat(·) is the splicing (concatenation) function, and f1, f2, f3 and f4 are the transformations applied on the 4 convolution-module paths;
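Likewise, the ELAN aggregation just described can be sketched as follows, reusing the ConvModule from the previous sketch; the branch depths are assumptions, and only the channel arithmetic (four branches at half the channels, concatenation to twice the channels, then a restoring convolution) follows the description above:

import torch
import torch.nn as nn
# ConvModule is the Conv-BN-SiLU module defined in the sketch above.

class ELANSketch(nn.Module):
    # Four branches at half the input channels; their concatenation has twice
    # the input channels; a final convolution maps back to the input channel count.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f1 = ConvModule(channels, half, 1)
        self.f2 = ConvModule(channels, half, 1)
        self.f3 = nn.Sequential(ConvModule(channels, half, 1), ConvModule(half, half, 3))
        self.f4 = nn.Sequential(ConvModule(channels, half, 1), ConvModule(half, half, 3))
        self.fuse = ConvModule(4 * half, channels, 1)   # 4 * half == 2 * channels

    def forward(self, x):
        y = torch.cat([self.f1(x), self.f2(x), self.f3(x), self.f4(x)], dim=1)
        return self.fuse(y)                 # same shape as the input feature map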
step three: after passing through the ELAN, finally, downsampling is carried out by using an MP1 module formed by the pooling layer and the convolution module together, and a feature extraction framework is formed by stacking the modules;
Step four: the features output by the feature extraction backbone are processed by the detection head, which uses a feature pyramid structure (SPPCSPC) to extract small, medium and large target frames; each target frame contains position information and classification information. The detection head splices the backbone outputs with intermediate results and predicts the target frames by up-sampling followed by down-sampling.
The method for acquiring smoking data and cigarette data through a network and pre-training with YOLOv7 to obtain a primary model comprises the following steps:
S11, acquiring, from the network, close-up pictures of cigarettes and the display pictures used when cigarettes are sold online;
step S12, performing model training with the training script provided by the YOLOv7 official repository to obtain the primary model F0.
Specifically, the collected data set is used to train the YOLOv7 model; the data are collected by searching the network or from publicly available data sets.
Training with the official YOLOv7 script yields the primary model F0 as follows: the input is a picture, and the output for each target comprises the position information (the centre coordinates x and y, and the width w and height h of the target frame), the confidence C_box of the target frame, and the class probability distribution C_class = (P_smoke, P_other) of the target frame. There are two categories in the cigarette detection task: the cigarette category (smoke) and the other category (other).
S2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
the method for testing the primary model comprises the following steps of:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
specifically, the test time is generally about 1 month. Due to different scene reasons, the detection model can perform error detection on objects in the scene, such as mobile phones, zippers on back bags and the like;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
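For illustration only, the recording of inspector-approved false detections can be pictured as a small utility like the following; the storage directory and the JSON fields are assumptions, not part of the invention:

import json
import shutil
from pathlib import Path

ERROR_DB = Path("error_dataset")            # hypothetical location of the error data set
ERROR_DB.mkdir(exist_ok=True)

def record_false_detection(image_path, detections, inspector_approved):
    # Store a falsely detected picture and its predictions once the inspector approves.
    if not inspector_approved:
        return
    image_path = Path(image_path)
    shutil.copy(image_path, ERROR_DB / image_path.name)
    meta = {"image": image_path.name, "detections": detections}
    (ERROR_DB / (image_path.stem + ".json")).write_text(json.dumps(meta))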
S3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
the method for screening the samples through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
the active learning is a machine learning or artificial intelligence method for labeling by actively selecting the most valuable sample. The aim is to achieve as good performance of the model as possible using as few, high quality sample labeling as possible. That is, the active learning method can improve the gain of the sample and the label, and maximize the performance of the model under the premise of limited label budget, which is a scheme for improving the data efficiency from the perspective of the sample, so that the method is applied to tasks with high labeling cost, high labeling difficulty and the like, such as medical images, unmanned operation, abnormal detection and related problems based on internet big data.
Specifically, the screening of the data by active learning includes the following steps:
step S311, after the YOLOv7 output layer is connected, a Gaussian mixture density network is connected to predict the average value of the Gaussian mixture distribution
μ_k and the variance σ_k² of each Gaussian component, from which the occasional (aleatoric) uncertainty u_al and the cognitive (epistemic) uncertainty u_ep are calculated;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weight, gaussian distribution mean and Gaussian distribution variance of the position information of the target frame according to the parameters;
in particular, the occasional uncertainty
u_al and the cognitive uncertainty u_ep are calculated as follows:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component;
the three sets of parameters are u, sigma, pi;
where u is the mean, sigma is the variance and pi is the mixing coefficient (i.e. the coefficient that fuses the component means);
the characteristics corresponding to the three groups of parameters comprise an abscissa x, an ordinate y, a width w and a height h of the center of the target frame;
thus u_x is the mean of the abscissa of the centre point of the target frame, and the ordinate of the centre point, the width and the height of the target frame are described in the same way by u_y, u_w and u_h together with their corresponding variances and weights.
According to the parameters, the Gaussian distribution weight, the Gaussian distribution mean value and the Gaussian distribution variance of the position information of the target frame are calculated, and the calculation modes are respectively as follows:
the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture (an illustrative code sketch follows the formulas of step S32 below);
step S32, calculating information score of the picture by evaluating occasional uncertainty and cognitive uncertainty, and judging whether the picture is marked;
specifically, the calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty, and judging whether the picture is marked comprises the following steps:
step S321, define
u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith picture, and judging whether the picture should be labeled according to a specified threshold;
Specifically, the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture is calculated as:

u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional (aleatoric) uncertainty, e abbreviates the cognitive (epistemic) uncertainty, and u_ij is the information-loss degree; a picture contains several objects, each with its own target frame, so u_ij^a denotes the occasional uncertainty of the jth frame in the ith picture, and the sum of the cognitive and occasional uncertainties of a frame is its information-loss degree.
The information-loss degree is normalized to obtain û_ij (u with a hat over it), where ū denotes the average of all information-loss degrees and σ_U the standard deviation of the set U:

û_ij = (u_ij − ū) / σ_U

The information quantity of the ith picture is then calculated as:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
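To illustrate steps S31 and S32 together, the following NumPy sketch makes two explicit assumptions: the occasional and cognitive uncertainties are computed with the standard mixture-density decomposition (softmax-normalized weights, weighted component variances for the occasional part, weighted spread of the component means for the cognitive part), and the per-picture information quantity is the average of the normalized per-frame scores; the patent's exact formulas may differ in detail.

import numpy as np

def mixture_uncertainties(pi_raw, mu, var):
    # pi_raw, mu, var: arrays of shape (K,) for one box coordinate of one target frame.
    pi = np.exp(pi_raw - pi_raw.max())
    pi = pi / pi.sum()                                 # softmax normalization of the mixture weights
    u_al = np.sum(pi * var)                            # occasional (aleatoric) uncertainty
    u_ep = np.sum(pi * (mu - np.sum(pi * mu)) ** 2)    # cognitive (epistemic) uncertainty
    return u_al, u_ep

def pictures_to_label(per_frame_scores, threshold):
    # per_frame_scores: one array per picture holding u_ij = u_a + u_e for each target frame.
    all_scores = np.concatenate(per_frame_scores)      # the set U of all scores
    mean, std = all_scores.mean(), all_scores.std() + 1e-8
    selected = []
    for i, u in enumerate(per_frame_scores):
        u_hat = (u - mean) / std                       # normalized information-loss degrees
        if len(u) and u_hat.mean() > threshold:        # information quantity of picture i
            selected.append(i)                         # picture i is sent for labeling
    return selected

# Toy example: picture index 1 has the more uncertain frames and is selected.
print(pictures_to_label([np.array([0.1, 0.2]), np.array([0.9, 1.1, 1.3])], threshold=0.0))   # [1]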
S4, carrying out targeted labeling on the screened samples by a data labeling person, retraining a next generation model, and repeatedly executing the steps S1 to S4;
the method comprises the steps of carrying out targeted labeling on the screened samples by a data annotator, retraining a next generation model, and repeatedly executing the steps S1 to S4, wherein the steps comprise the following steps:
step S41, the screened pictures are sent to a data annotator for re-annotation;
step S42, retraining the data set of the training model by using the new data set and the mixed primary model to obtain a new generation model F 1
The data annotators refer to personnel capable of annotating the data sets through the data annotating tool.
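Schematically, one generation of the whole cycle (steps S1 to S4) can be written as a single function; the prediction, scoring, annotation and retraining callables are placeholders supplied by the surrounding system, not functions defined by the invention:

def active_learning_iteration(predict, score, annotate, retrain, unlabeled, labeled, threshold):
    # S2/S3: score the deployed model's predictions and keep the informative samples.
    to_label = [x for x in unlabeled if score(predict(x)) > threshold]
    # S4: targeted labeling, mixing the new labels with the existing training data.
    labeled = labeled + annotate(to_label)
    remaining = [x for x in unlabeled if x not in to_label]
    return retrain(labeled), remaining       # next-generation model and the leftover pool

# Toy usage with stand-in callables.
model, rest = active_learning_iteration(
    predict=lambda x: x, score=lambda p: p, annotate=lambda xs: xs,
    retrain=lambda data: "model trained on %d samples" % len(data),
    unlabeled=[0.2, 0.9, 0.6], labeled=[0.1, 0.3], threshold=0.5)
print(model, rest)                           # model trained on 4 samples [0.2]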
In summary, by means of the above technical scheme of the invention, in order to effectively reduce the re-labeling of repeated data by a data labeling person and increase the effectiveness and efficiency of model iteration deployment, the model landing process of the detection algorithm mainly comprises data acquisition, data labeling, model training, primary model deployment, false result acquisition, re-labeling and model iteration. The data annotation is a crucial part in the whole process, and is also a guarantee of model iteration efficiency. In an actual task, a large amount of redundancy exists in a plurality of wrong data, so that the model cannot be effectively evolved and iterated due to repeated data annotation and useless data annotation.
The task scene of the invention is to detect whether pedestrians smoke in public places; the core problem is to detect the position and area of cigarettes in images collected by a camera using the target detection model YOLOv7. The automatic labeling method provided by the invention relies on a Gaussian mixture density network, which can estimate, for each predicted target frame, the position and the probability distribution output by the classification head, so that the input can be predicted as the parameters of a Gaussian probability distribution, including mean and variance. The invention estimates the occasional (aleatoric) uncertainty and the cognitive (epistemic) uncertainty through a single forward propagation of the model, and aggregates the two kinds of uncertainty through a scoring function to obtain the information-quantity score of each image.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. The smoking detection data labeling method based on active learning is characterized by comprising the following steps of:
s1, acquiring smoking data and cigarette data through a network, and pre-training by utilizing YOLOv7 to obtain a primary model;
s2, deploying the primary model into an actual scene, collecting an actual image through a camera, and testing the primary model;
s3, screening samples through an active learning strategy according to a prediction result obtained by the primary deployment model;
s4, carrying out targeted labeling on the screened samples by a data labeling person, retraining a next generation model, and repeatedly executing the steps S1 to S4;
the method for screening the sample through the active learning strategy according to the prediction result obtained by the primary deployment model comprises the following steps:
step S31, screening data by utilizing active learning;
step S32, calculating information score of the picture by evaluating occasional uncertainty and cognitive uncertainty, and judging whether the picture is marked;
the screening of the data by active learning comprises the following steps:
step S311, after the YOLOv7 output layer is connected, a Gaussian mixture density network is connected to predict the average value of the Gaussian mixture distribution
μ_k and the variance σ_k² of each Gaussian component, from which the occasional uncertainty u_al and the cognitive uncertainty u_ep are calculated;
Step S312, outputting three groups of parameters for the position of the target frame through a Gaussian mixture density network model;
step S313, calculating Gaussian distribution weight, gaussian distribution mean and Gaussian distribution variance of the position information of the target frame according to the parameters;
the occasional uncertainty
u_al and the cognitive uncertainty u_ep are calculated as follows:

u_al = Σ_{k=1..K} π_k · σ_k²

u_ep = Σ_{k=1..K} π_k · (μ_k − Σ_{j=1..K} π_j · μ_j)²

where k = 1, 2, …, K, K is the number of Gaussian distributions of the mixture Gaussian model, and π_k is the weight of the kth Gaussian component;
the three sets of parameters are u, sigma, pi;
where u is the mean, sigma is the variance and pi is the mixing coefficient;
the characteristics corresponding to the three groups of parameters comprise an abscissa x, an ordinate y, a width w and a height h of the center of the target frame;
according to the parameters, the Gaussian distribution weight, the Gaussian distribution mean and the Gaussian distribution variance of the position information of the target frame are calculated: the mixture weights are normalized over the K components with a softmax normalization function, and a corresponding weight, mean and variance are obtained for each of the four candidate values (x, y, w, h) of every target frame in the picture;
the method for calculating the information score of the picture by evaluating the occasional uncertainty and the cognitive uncertainty comprises the following steps of:
step S321, define
u_ij as the score aggregating the occasional uncertainty and the cognitive uncertainty of the jth target object in the ith picture;
step S322, defining the set of all scores as U and calculating its mean ū and variance σ_U²;
Step S323, normalizing the uncertainty and the information quantity of the pictures to obtain the information quantity of each frame of each picture;
step S324, calculating the information quantity of the ith target frame, and judging whether the picture should be marked or not according to a specified threshold value;
the calculation formula of the score of the occasional uncertainty and the cognitive uncertainty aggregate of the jth target object in the ith picture is as follows:
u_ij = u_ij^a + u_ij^e

where the superscript a abbreviates the occasional uncertainty, e abbreviates the cognitive uncertainty, and u_ij is the information-loss degree;
the calculation formula for normalizing the uncertainty and the information quantity of the picture is as follows:

û_ij = (u_ij − ū) / σ_U

where ū and σ_U are the mean and the standard deviation of the set U of all scores; the calculation formula for the information quantity of the ith picture is as follows:

I_i = (1 / n_i) · Σ_{j=1..n_i} û_ij

where n_i is the number of target frames in the ith picture.
2. The method for labeling smoking detection data based on active learning according to claim 1, wherein the steps of collecting smoking data and cigarette data through a network and pre-training by utilizing YOLOv7 to obtain a primary model comprise the following steps:
s11, acquiring a cigarette close-up picture and a display picture set when the cigarettes are sold in a network;
and step S12, performing model training by using a training script provided by the YOLOv7 official, so as to obtain a primary model.
3. The method for labeling smoking detection data based on active learning according to claim 2, wherein the deploying the primary model into an actual scene, collecting an actual image by a camera, and testing the primary model comprises the following steps:
s21, deploying the model into an actual application scene, and collecting an actual image through a camera to perform error detection on the model;
and S22, approving, recording and storing the error detection picture by an inspector to obtain a database containing an error data set.
4. The method for labeling smoking detection data based on active learning according to claim 3, wherein the targeted labeling of the screened samples by the data labeler, retraining the next generation model, and repeating steps S1 to S4 comprises the steps of:
step S41, the screened pictures are sent to a data annotator for re-annotation;
and step S42, retraining the model on the new data set mixed with the data set of the primary model to obtain a new-generation model.
CN202310042572.5A 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning Active CN116071624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310042572.5A CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310042572.5A CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Publications (2)

Publication Number Publication Date
CN116071624A CN116071624A (en) 2023-05-05
CN116071624B true CN116071624B (en) 2023-06-27

Family

ID=86176403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310042572.5A Active CN116071624B (en) 2023-01-28 2023-01-28 Smoking detection data labeling method based on active learning

Country Status (1)

Country Link
CN (1) CN116071624B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149721B (en) * 2020-09-10 2023-11-17 南京大学 Target detection method for reducing labeling requirements based on active learning
CN113591662A (en) * 2021-07-24 2021-11-02 深圳市铁越电气有限公司 Method, system and storage medium for recognizing smoking calling behavior
CN114170677A (en) * 2021-11-12 2022-03-11 深圳先进技术研究院 Network model training method and equipment for detecting smoking behavior
CN114998679A (en) * 2022-05-25 2022-09-02 河南爬客智能机器人有限公司 Online training method, device and equipment for deep learning model and storage medium
CN115376101A (en) * 2022-08-25 2022-11-22 天津大学 Incremental learning method and system for automatic driving environment perception

Also Published As

Publication number Publication date
CN116071624A (en) 2023-05-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant