CN112633392A - Terahertz human body security inspection image target detection model training data augmentation method - Google Patents
Terahertz human body security inspection image target detection model training data augmentation method
- Publication number: CN112633392A
- Application number: CN202011589752.8A
- Authority
- CN
- China
- Prior art keywords
- human body
- image
- data
- suspicious object
- augmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a terahertz human body security inspection image target detection model training data augmentation method, which belongs to the technical field of image processing. During training, a region is randomly generated within the human body foreground area and an image randomly selected from the suspicious object image augmentation subset is used to cover it, forming augmented training data. According to the invention, suspicious object image augmentation subsets are cut from a small number of existing images with suspicious object labels, the human body foreground region is extracted with a segmentation algorithm or by manual labeling, a coverage region is randomly generated during training, a suspicious object augmentation image is randomly selected for coverage, and the data labels are regenerated, so that convergence of target detection model training can be accelerated under unbalanced sample conditions and the detection accuracy of the model is improved.
Description
Technical Field
The invention relates to the technical field of terahertz human body security inspection image processing, in particular to a terahertz human body security inspection image target detection model training data augmentation method.
Background
Target detection classifies and locates targets of interest in images; it is a common task in computer vision, and deep learning is currently the mainstream solution.
Suspicious object detection in terahertz human body security inspection images is a specific application of this task: the goal is to find and locate suspicious objects hidden under clothing in human body security inspection images. In practice, however, model training suffers from scarce suspicious object samples, high data acquisition cost, and an extremely unbalanced overall sample distribution, so the trained target detection model performs poorly and its detection accuracy is low. These problems urgently need to be solved, and a terahertz human body security inspection image target detection model training data augmentation method is therefore provided.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a terahertz human body security inspection image target detection model training data augmentation method that performs directional data augmentation according to the number ratio of suspicious object labels of each category and thereby improves the detection accuracy of the target detection model.
The invention solves this technical problem through the following technical scheme, which comprises the following steps:
S1: Preparation phase
S11: Cutting an augmented data subset from the target detection data set using the suspicious object labeling frames, counting the number and proportion of suspicious object labels of each category, and calculating the augmentation hit percentages;
S12: Marking the human body foreground region of the training samples in the training data set by manual labeling or with a semantic segmentation model, and producing segmentation label data;
S2: Training phase
S21: Loading the data of a single training image, its target detection labeling data and its segmented human body foreground semantic label data, and setting an augmentation counter to 1;
S22: Using the human body frame label, calculating the random starting-point coordinates of the covered region and adding 1 to the augmentation counter; if the counter reaches its upper limit, giving up augmentation and directly sending the image data and target detection labeling data to the network for training;
S23: Randomly selecting an augmentation coverage picture from the augmented data subset according to the number ratio of the suspicious object category labels;
S24: Calculating the pixel proportion μ between the covered region and the human body foreground region and the intersection-over-union σ between the covered region and all suspicious object target frames; if μ and σ meet the set qualification conditions, going to step S25, otherwise returning to step S22;
S25: Covering the original image with the augmented subset picture, modifying the labeling data of the training picture, and sending the training picture to the network for training.
Further, in step S11, labels of the form (x1, y1, x2, y2, c) exist for each image, where (x1, y1) are the coordinates of the upper-left corner point of the target frame, (x2, y2) are the coordinates of the lower-right corner point of the target frame, and c is the target category in the frame.
Further, in step S11, the cropping mode is to extract from the original image only the image data within the range [y1:y2, x1:x2] given by the suspicious object labeling box, i.e. only the region of the suspicious-category article in the image is cropped. The total number of suspicious object target frame labels and the number of labels of each category are counted, and the proportion of each category in the total is calculated as p_i = C_i / Σ_j C_j, where C_i is the number of suspicious object labels of category i. The corresponding reciprocals 1/p_i are then calculated and normalized to give γ_i = (1/p_i) / Σ_j (1/p_j), where γ_i is the probability of randomly augmenting suspicious objects of category i: categories with few labels thus have a high augmentation probability and categories with many labels have a low one.
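A minimal sketch of this step S11 statistic, assuming the per-class label counts have already been gathered (the function and variable names below are illustrative and not part of the patent text):

```python
# Sketch of the step S11 statistics; function and variable names are illustrative only.
def augmentation_hit_percentages(class_counts):
    """Return the augmentation hit percentage gamma_i for each suspicious object class."""
    total = sum(class_counts.values())
    ratios = {c: n / total for c, n in class_counts.items()}      # p_i = C_i / sum_j C_j
    reciprocals = {c: 1.0 / p for c, p in ratios.items()}         # 1 / p_i
    recip_sum = sum(reciprocals.values())
    # gamma_i: rare classes get a large hit percentage, frequent classes a small one
    return {c: 100.0 * r / recip_sum for c, r in reciprocals.items()}

# The worked example from the description: 400 rectangle, 80 gun and 20 knife labels
print(augmentation_hit_percentages({"rectangle": 400, "gun": 80, "knife": 20}))
# -> approximately {'rectangle': 3.85, 'gun': 19.23, 'knife': 76.92}
```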
Further, in the step S12, it is necessary to acquire a human body segmentation foreground region for each image containing a human body frame, and generate corresponding segmentation label data, where the foreground pixel in the grayscale label image is 255 and the background pixel is 0.
Further, in step S21, the image data refers to image matrix data, the target detection annotation data refers to data of a human body frame and a possible target frame in the image, the segmented human body foreground semantic label data refers to an image label with a width and a height consistent with the image matrix, and an augmentation counter is set to 1 before each image is augmented.
Further, in step S22, the human body frame label is (x_p1, y_p1, x_p2, y_p2). Calculating the random starting-point coordinates of the covered region means drawing two random numbers r_x, r_y from a uniform distribution on the interval (0, 1) and then computing the starting point:
x_s = x_p1 + r_x (x_p2 - x_p1)
y_s = y_p1 + r_y (y_p2 - y_p1).
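As an illustration only, the starting-point calculation above could be implemented as in the following sketch (the person-box tuple format is an assumption):

```python
import random

def random_start_point(person_box):
    """Draw a random covered-region starting point inside the human body frame.

    person_box is assumed to be (x_p1, y_p1, x_p2, y_p2) in pixel coordinates.
    """
    xp1, yp1, xp2, yp2 = person_box
    rx, ry = random.random(), random.random()   # two independent draws from U(0, 1)
    xs = xp1 + rx * (xp2 - xp1)
    ys = yp1 + ry * (yp2 - yp1)
    return int(xs), int(ys)
```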
Further, in step S23, the specific process of selecting the augmentation coverage picture is as follows:
S231: Selecting an augmentation picture from the augmented data subset according to the number ratio of the suspicious object class labels, and calculating the random hit interval corresponding to the percentage of suspicious object class i;
S232: Then generating a random number in the interval (0,100) to determine the suspicious object class to be augmented;
S233: Selecting a specific single picture from the single-class suspicious object augmentation subset at random with a uniform distribution.
Further, in step S24, let the width and height of the augmentation picture be w and h, so that the coordinates of the covered region frame are (x_s, y_s, x_s+w, y_s+h) and its pixel area is S = wh; the total number S_f of human body foreground pixels inside the frame is counted, and the proportion is calculated as μ = S_f / S.
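A short sketch of the proportion μ, assuming the segmentation label from step S12 is available as a NumPy array in which foreground pixels are 255 (the array layout and names are assumptions):

```python
import numpy as np

def foreground_ratio(foreground_mask, xs, ys, w, h):
    """Compute mu = S_f / S for a covered box of width w and height h placed at (xs, ys)."""
    region = foreground_mask[ys:ys + h, xs:xs + w]      # mask is (height, width), foreground = 255
    s = w * h                                           # S = w * h, pixel area of the covered box
    s_f = int(np.count_nonzero(region == 255))          # S_f, human foreground pixels inside the box
    return s_f / s
```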
Further, in step S25, modifying the training image annotation data refers to adding the covered area coordinates and the category on the basis of the original label.
Compared with the prior art, the invention has the following advantages: in this terahertz human body security inspection image target detection model training data augmentation method, suspicious object regions are cut out of a small number of existing images with suspicious object labels to form suspicious object image augmentation subsets, the human body foreground region of the terahertz image is extracted with a segmentation algorithm or by manual labeling, coverage regions are randomly generated and suspicious object augmentation images are randomly selected for coverage during training, and the data labels are regenerated, so that convergence of target detection model training is accelerated under unbalanced sample conditions and the detection accuracy of the model is improved.
Drawings
FIG. 1 is a picture of an actual security inspection in the second embodiment of the present invention;
FIG. 2 is a schematic diagram of the data augmentation process in the second embodiment of the present invention;
FIG. 3 is a flowchart of obtaining the data augmentation subset in the second embodiment of the present invention;
FIG. 4 is a diagram of the data augmentation effect in the second embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Example one
The embodiment provides a technical scheme: a terahertz human body security inspection image target detection model training data augmentation method comprises the following steps:
S1: Preparation phase
S11: Cutting an augmented data subset from the target detection data set using the suspicious object labeling frames, counting the number and proportion of suspicious object labels of each category, and calculating the augmentation hit percentages;
S12: Marking the human body foreground region of the training samples in the training data set by manual labeling or with a semantic segmentation model, and producing segmentation label data;
S2: Training phase
S21: Loading the data of a single training image, its target detection labeling data and its segmented human body foreground semantic label data, and setting an augmentation counter to 1;
S22: Using the human body frame label, calculating the random starting-point coordinates of the covered region and adding 1 to the augmentation counter; if the counter reaches its upper limit, giving up augmentation and directly sending the image data and target detection labeling data to the network for training;
S23: Randomly selecting an augmentation coverage picture from the augmented data subset according to the number ratio of the suspicious object category labels;
S24: Calculating the pixel proportion μ between the covered region and the human body foreground region and the intersection-over-union σ between the covered region and all suspicious object target frames; if μ and σ meet the set qualification conditions, going to step S25, otherwise returning to step S22;
S25: Covering the original image with the augmented subset picture, modifying the labeling data of the training picture, and sending the training picture to the network for training.
In the step S11, there is a label (x1, y1, x2, y2, c) for each image, where x1 and y1 are coordinates of the upper left corner point of the target frame, (x2, y2) are coordinates of the lower right corner point of the target frame, and c is the target category in the frame.
In step S11, the cropping mode is to extract from the original image only the image data within the range [y1:y2, x1:x2] given by the suspicious object labeling box, i.e. only the region of the suspicious-category article in the image is cropped. The total number of suspicious object target frame labels and the number of labels of each category are counted, and the proportion of each category in the total is calculated as p_i = C_i / Σ_j C_j, where C_i is the number of suspicious object labels of category i. The corresponding reciprocals 1/p_i are calculated and normalized to give γ_i = (1/p_i) / Σ_j (1/p_j), the probability of randomly augmenting suspicious objects of category i: the augmentation probability is high for suspicious objects with few labels and low for suspicious objects with many labels.
In step S12, it is necessary to acquire a human body segmentation foreground region for each image including a human body frame, and generate corresponding segmentation label data, where the foreground pixel in the grayscale label image is 255 and the background pixel is 0.
In step S21, the image data refers to image matrix data, the target detection annotation data refers to data of a human body frame and a target frame that may exist in the image, the segmented human body foreground semantic label data refers to an image label with a width and a height consistent with the image matrix, and an augmentation counter is set to 1 before each image is augmented.
In step S22, an upper limit is set for the augmentation counter; if the upper limit is reached without generating a qualified augmented picture, the loop is exited and training is performed directly with the original image data and the target frame labels.
In step S22, the human body frame label is (x_p1, y_p1, x_p2, y_p2). Calculating the random starting-point coordinates of the covered region means drawing two random numbers r_x, r_y from a uniform distribution on the interval (0, 1) (repeated values are allowed), and then computing the starting point:
x_s = x_p1 + r_x (x_p2 - x_p1)
y_s = y_p1 + r_y (y_p2 - y_p1).
In step S23, the specific process of selecting the augmentation coverage picture is as follows:
S231: Selecting an augmentation picture from the augmented data subset according to the number ratio of the suspicious object class labels, and calculating the random hit interval corresponding to the percentage of suspicious object class i;
S232: Then generating a random number in the interval (0,100) to determine the suspicious object class to be augmented;
S233: Selecting a specific single picture from the single-class suspicious object augmentation subset at random with a uniform distribution.
In step S24, let the width and height of the augmentation picture be w and h, so that the coordinates of the covered region frame are (x_s, y_s, x_s+w, y_s+h) and its pixel area is S = wh; the total number S_f of human body foreground pixels inside the frame is counted, and the proportion is calculated as μ = S_f / S.
In step S24 it is also necessary to calculate the intersection-over-union between the covered region and all other suspicious object labeling boxes and take the maximum value; if there is no suspicious object label in the current image, σ is set to 0.
In step S25, modifying the training image annotation data refers to adding the covered area coordinates and the category based on the original label.
Example two
As shown in FIG. 1, most image samples collected by terahertz human body security inspection equipment are normal samples in which no suspicious article is carried; only a very small number of human bodies carry suspicious articles, and specific sensitive articles such as knives are rare. The scarcity of suspicious object label data therefore makes the samples required by the target detection model very unbalanced, and a directly trained target detection model has low detection accuracy and cannot accurately identify suspicious objects.
As shown in fig. 2, the data augmentation method provided by the present invention can complete random augmentation and improve the detection accuracy of the deep learning model under the condition of only a small number of specific suspicious object samples.
The specific process of the data augmentation method in this embodiment is as follows:
step S1.1: preparing training data, cutting out an augmentation subset from an original image by using a suspicious object marking frame, counting the number of each type of suspicious object, calculating the ratio of each type of suspicious object, and calculating the percentage of augmentation hit of each type of suspicious object, as shown in FIG. 3
For example, suppose the current target detection model needs to detect three categories of suspicious objects: rectangular articles (rectangle), pistols (gun) and knives (knife). The corresponding article region images are cut out using the suspicious object target frame labeling data and renamed rectangle_1.jpg, rectangle_2.jpg, …, gun_1.jpg, gun_2.jpg, …, knife_1.jpg, knife_2.jpg, ….
Assuming that the training data set contains 10000 human body security inspection images, among which there are 400 rectangle frames, 80 gun frames and 20 knife frames, the ratio of each category count is 0.8 (400/500), 0.16 (80/500) and 0.04 (20/500); the reciprocals are 1.25 (1/0.8), 6.25 (1/0.16) and 25 (1/0.04); and the reciprocal percentages, i.e. the augmentation hit percentages, are 3.9%, 19.2% and 76.9%. Random selection of pictures from the augmentation subset is later completed according to these augmentation hit percentages.
Step S1.2: preparing training data, and extracting human body foreground regions in all training samples by using an image semantic segmentation model or a manual labeling mode;
step S2.1: in the training stage, as shown in fig. 4, loading single training image data, target detection labeling data and segmented human body foreground semantic label data, and setting an augmentation counter to be 1;
step S2.2: training phase, labeling (x) with human body box (person)p1,yp1,xp2,yp2) Calculating the coordinate of a random initial point of the covered area, and adding 1 to an augmentation counter; continuously utilizing the human body frame for marking, calculating the random initial point coordinate of the covered area, adding 1 to an augmentation counter, giving up augmentation if the number of times reaches the upper limit, and directly sending the image data and the target detection marking data into a network for training;
The random starting point may be generated as follows:
1) Draw two random numbers r_x, r_y from a uniform distribution on the interval (0, 1);
2) Calculate the starting-point coordinates:
x_s = x_p1 + r_x (x_p2 - x_p1)
y_s = y_p1 + r_y (y_p2 - y_p1)
Step S2.3: Training stage, randomly selecting an augmentation coverage picture from the augmentation data subset according to the number ratio of the suspicious object category labels;
the specific random selection method is as follows:
1) Calculate the random augmentation hit intervals using the augmentation hit percentages obtained in step S1.1; for example, with hit percentages of 3.9%, 19.2% and 76.9% for rectangle, gun and knife, the corresponding intervals after rounding are [0,4), [4,23) and [23,100];
2) A random number is drawn uniformly from the interval [0,100]; the interval it falls into determines the class from which the augmentation data are taken. For example, if the random number is 21, the augmentation picture is selected from the augmentation subset pictures of class gun;
3) Within the single-category augmentation subset (for example category gun), a specific augmentation picture is then selected at random with a uniform distribution, which ensures that every picture can be accessed randomly (a code sketch of this selection follows this list);
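The three selection steps above might look like the following sketch; the hit intervals are the rounded values from this example, and the file-list structure is an assumption rather than part of the patent:

```python
import random

# Rounded hit intervals from the example: rectangle [0,4), gun [4,23), knife [23,100)
HIT_INTERVALS = {"rectangle": (0, 4), "gun": (4, 23), "knife": (23, 100)}

def pick_augmentation_picture(subset_files):
    """subset_files maps each class name to the list of its cropped augmentation images."""
    r = random.random() * 100                           # uniform random number in [0, 100)
    cls = next(c for c, (lo, hi) in HIT_INTERVALS.items() if lo <= r < hi)
    return random.choice(subset_files[cls])             # uniform pick within the chosen class

# e.g. a draw of 21 falls into [4, 23), so a picture is chosen from the "gun" subset
```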
Step S2.4: Training stage, calculating the pixel proportion between the covered region and the human body foreground region and comparing the covered region with the suspicious object target frames; if the requirements are met, going to step S2.5, otherwise returning to step S2.2;
the pixel intersection ratio calculation method is as follows: quiltThe coordinates of the coverage area frame are (x)n1,yn1,xn2,yn2) And the area of the covered area frame pixels is as follows: sn=(xn2-xn1)(yn2-yn1) (ii) a Counting the number S of human foreground pixels in the covered areafPixel ratio of The calculation method of the intersection and comparison ratio between the covered area and other suspicious frames iou (intersection over union) may adopt a method common to the target detection field, and is not described herein again, it is noted that the intersection and comparison between the covered area and all suspicious frames need to be calculated, and the maximum value σ max (ious) is taken, in order to ensure that the covered area is located in the foreground area of the human body, μmay be required>0.99, while sigma may be required to avoid covering other suspects<0.1; the step S2.1 sets the augmentation counter to prevent the covered area generated randomly many times from always failing to meet the above requirement, and to avoid the process from falling into a forced exit mechanism designed for local loop.
Step S2.5: Training stage, covering the original image with the augmentation subset picture, modifying the labeling data of the training picture, and sending the training picture to the network for training. After the image is covered, the labels are adjusted as follows: if the original labels are (x_11, y_11, x_12, y_12, c_1), (x_21, y_21, x_22, y_22, c_2), ..., then after augmentation the label of the pasted augmentation-subset article is appended, giving (x_11, y_11, x_12, y_12, c_1), (x_21, y_21, x_22, y_22, c_2), ..., (x', y', x'+w, y'+h, c_new). After augmentation, the effect shown in FIG. 4 is produced.
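A minimal sketch of this covering-and-relabeling step, assuming images are NumPy arrays and labels are (x1, y1, x2, y2, class) tuples (all names are illustrative):

```python
import numpy as np

def cover_and_relabel(image, crop, labels, xs, ys, new_class):
    """Paste a suspicious-object crop at (xs, ys) and append the corresponding label."""
    h, w = crop.shape[:2]
    augmented = image.copy()
    augmented[ys:ys + h, xs:xs + w] = crop                        # cover the original image
    new_labels = labels + [(xs, ys, xs + w, ys + h, new_class)]   # append (x', y', x'+w, y'+h, c_new)
    return augmented, new_labels
```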
To sum up, in the terahertz human body security inspection image target detection model training data augmentation method of this embodiment, suspicious object regions are cut out of a small number of existing images with suspicious object labels to form a suspicious object image augmentation subset, the human body foreground region of the terahertz image is extracted with a segmentation algorithm or by manual labeling, a coverage region is randomly generated and a suspicious object augmentation image is randomly selected for coverage during training, and the data labels are regenerated, so that convergence of target detection model training is accelerated under unbalanced sample conditions and the detection accuracy of the model is improved.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A terahertz human body security inspection image target detection model training data augmentation method is characterized by comprising the following steps:
S1: Preparation phase
S11: Cutting an augmented data subset from the target detection data set using the suspicious object labeling frames, counting the number and proportion of suspicious object labels of each category, and calculating the augmentation hit percentages;
S12: Marking the human body foreground region of the training samples in the training data set by manual labeling or with a semantic segmentation model, and producing segmentation label data;
S2: Training phase
S21: Loading the data of a single training image, its target detection labeling data and its segmented human body foreground semantic label data, and setting an augmentation counter to 1;
S22: Using the human body frame label, calculating the random starting-point coordinates of the covered region and adding 1 to the augmentation counter; if the counter reaches its upper limit, giving up augmentation and directly sending the image data and target detection labeling data to the network for training;
S23: Randomly selecting an augmentation coverage picture from the augmented data subset according to the number ratio of the suspicious object category labels;
S24: Calculating the pixel proportion μ between the covered region and the human body foreground region and the intersection-over-union σ between the covered region and all suspicious object target frames; if μ and σ meet the set qualification conditions, going to step S25, otherwise returning to step S22;
S25: Covering the original image with the augmented subset picture, modifying the labeling data of the training picture, and sending the training picture to the network for training.
2. The terahertz human body security inspection image target detection model training data augmentation method according to claim 1, characterized in that: in the step S11, there is a label (x1, y1, x2, y2, c) for each image, where x1 and y1 are coordinates of the upper left corner point of the target frame, (x2, y2) are coordinates of the lower right corner point of the target frame, and c is the target category in the frame.
3. The terahertz human body security inspection image target detection model training data augmentation method according to claim 2, characterized in that: the cropping mode is to extract from the original image only the image data within the range [y1:y2, x1:x2] given by the suspicious object labeling box, i.e. only the region of the suspicious-category article in the image is cropped; the total number of suspicious object target frame labels and the number of labels of each category are counted, and the proportion of each category in the total is calculated as p_i = C_i / Σ_j C_j, where C_i is the number of suspicious object labels of category i; the corresponding reciprocals 1/p_i are calculated and normalized to give γ_i = (1/p_i) / Σ_j (1/p_j), which is the probability of randomly augmenting suspicious objects of category i: the augmentation probability is high for suspicious objects with few labels and low for suspicious objects with many labels.
4. The terahertz human body security inspection image target detection model training data augmentation method according to claim 3, characterized in that: in step S12, it is necessary to acquire a human body segmentation foreground region for each image including a human body frame, and generate corresponding segmentation label data, where the foreground pixel in the grayscale label image is 255 and the background pixel is 0.
5. The terahertz human body security inspection image target detection model training data augmentation method according to claim 4, characterized in that: in step S21, the image data refers to image matrix data, the target detection annotation data refers to data of a human body frame and a target frame that may exist in the image, the segmented human body foreground semantic label data refers to an image label with a width and a height consistent with the image matrix, and an augmentation counter is set to 1 before each image is augmented.
6. The terahertz human body security inspection image target detection model training data augmentation method according to claim 5, characterized in that: in step S22, the human body frame label is (x_p1, y_p1, x_p2, y_p2); calculating the random starting-point coordinates of the covered region means drawing two random numbers r_x, r_y from a uniform distribution on the interval (0, 1) and then computing the starting point:
x_s = x_p1 + r_x (x_p2 - x_p1)
y_s = y_p1 + r_y (y_p2 - y_p1).
7. the method for amplifying the training data of the terahertz human body security inspection image target detection model according to claim 6, wherein in the step S23, the specific process of selecting the amplification coverage picture is as follows:
s231: selecting an augmented picture from the augmented data subset according to the number ratio of the suspicious object class labels, and calculating a random hit interval under the suspicious object class i percentage;
s232: then generating a random number with the interval of (0,100) to determine the suspicious object class needing to be augmented;
s233: and selecting a specific single picture from the single-class suspicious object augmentation subset by using uniform distribution and randomness.
8. The terahertz human body security inspection image target detection model training data augmentation method according to claim 7, characterized in that: in step S24, the width and height of the augmentation picture are w and h, the coordinates of the covered region frame are (x_s, y_s, x_s+w, y_s+h), and its pixel area is S = wh; the total number S_f of human body foreground pixels inside the frame is counted, and the proportion is calculated as μ = S_f / S.
10. the terahertz human body security inspection image target detection model training data augmentation method according to claim 9, characterized in that: in step S25, modifying the training image annotation data refers to adding the covered area coordinates and the category based on the original label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011589752.8A CN112633392B (en) | 2020-12-29 | 2020-12-29 | Terahertz human body security inspection image target detection model training data augmentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011589752.8A CN112633392B (en) | 2020-12-29 | 2020-12-29 | Terahertz human body security inspection image target detection model training data augmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633392A true CN112633392A (en) | 2021-04-09 |
CN112633392B CN112633392B (en) | 2024-08-23 |
Family
ID=75285915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011589752.8A Active CN112633392B (en) | 2020-12-29 | 2020-12-29 | Terahertz human body security inspection image target detection model training data augmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633392B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108492343A (en) * | 2018-03-28 | 2018-09-04 | 东北大学 | A kind of image combining method for the training data expanding target identification |
EP3591582A1 (en) * | 2018-07-06 | 2020-01-08 | Tata Consultancy Services Limited | Method and system for automatic object annotation using deep network |
WO2020098074A1 (en) * | 2018-11-12 | 2020-05-22 | 平安科技(深圳)有限公司 | Face sample picture marking method and apparatus, computer device, and storage medium |
CN109948562A (en) * | 2019-03-25 | 2019-06-28 | 浙江啄云智能科技有限公司 | A kind of safe examination system deep learning sample generating method based on radioscopic image |
CN111260607A (en) * | 2019-12-23 | 2020-06-09 | 北京无线电计量测试研究所 | Automatic suspicious article detection method, terminal device, computer device and medium |
CN111144494A (en) * | 2019-12-27 | 2020-05-12 | 睿魔智能科技(深圳)有限公司 | Object detection model training method, object detection device, object detection equipment and object detection medium |
CN111275080A (en) * | 2020-01-14 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based image classification model training method, classification method and device |
CN111914723A (en) * | 2020-07-27 | 2020-11-10 | 睿魔智能科技(深圳)有限公司 | Data augmentation method and system for improving human body detection rate and human body detection model |
CN111914725A (en) * | 2020-07-27 | 2020-11-10 | 睿魔智能科技(深圳)有限公司 | Data augmentation method and system and human body detection model |
CN112102250A (en) * | 2020-08-20 | 2020-12-18 | 西北大学 | Method for establishing and detecting pathological image detection model with training data as missing label |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408575A (en) * | 2021-05-12 | 2021-09-17 | 桂林电子科技大学 | Image data augmentation method based on discriminant area positioning |
CN113408575B (en) * | 2021-05-12 | 2022-08-19 | 桂林电子科技大学 | Image data augmentation method based on discriminant area positioning |
CN113807284A (en) * | 2021-09-23 | 2021-12-17 | 上海亨临光电科技有限公司 | Method for positioning personal object on terahertz image in human body |
CN114418898A (en) * | 2022-03-21 | 2022-04-29 | 南湖实验室 | Data enhancement method based on target overlapping degree calculation and self-adaptive adjustment |
CN114418898B (en) * | 2022-03-21 | 2022-07-26 | 南湖实验室 | Data enhancement method based on target overlapping degree calculation and self-adaptive adjustment |
CN115346109A (en) * | 2022-08-02 | 2022-11-15 | 北京新岳纵横科技有限公司 | IOU (input/output Unit) strategy based enhanced sample generation method |
CN115346109B (en) * | 2022-08-02 | 2023-07-18 | 北京新岳纵横科技有限公司 | Enhanced sample generation method based on IOU strategy |
CN117930375A (en) * | 2024-03-22 | 2024-04-26 | 国擎(山东)信息科技有限公司 | Multi-dimensional detection technology fused channel type terahertz human body security inspection system |
Also Published As
Publication number | Publication date |
---|---|
CN112633392B (en) | 2024-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112633392A (en) | Terahertz human body security inspection image target detection model training data augmentation method | |
US11429818B2 (en) | Method, system and device for multi-label object detection based on an object detection network | |
CN108510467B (en) | SAR image target identification method based on depth deformable convolution neural network | |
CN103927387B (en) | Image indexing system and its correlation technique and device | |
US10229347B2 (en) | Systems and methods for identifying a target object in an image | |
CN105608230B (en) | A kind of Business Information recommender system and method based on image retrieval | |
CN109829398B (en) | Target detection method in video based on three-dimensional convolution network | |
Wang et al. | Joint learning of visual attributes, object classes and visual saliency | |
CN104866616B (en) | Monitor video Target Searching Method | |
Wang et al. | YOLOv3‐Litchi Detection Method of Densely Distributed Litchi in Large Vision Scenes | |
WO2016026371A1 (en) | Fast object detection method based on deformable part model (dpm) | |
CN109582813B (en) | Retrieval method, device, equipment and storage medium for cultural relic exhibit | |
CN106326916B (en) | Object detection method based on Analysis On Multi-scale Features estimation and high-order BING feature | |
Wang et al. | Superpixel-based LCM detector for faint ships hidden in strong noise background SAR imagery | |
CN112733858B (en) | Image character rapid identification method and device based on character region detection | |
CN111680705A (en) | MB-SSD method and MB-SSD feature extraction network suitable for target detection | |
WO2023123924A1 (en) | Target recognition method and apparatus, and electronic device and storage medium | |
CN111177811A (en) | Automatic fire point location layout method applied to cloud platform | |
CN111898418A (en) | Human body abnormal behavior detection method based on T-TINY-YOLO network | |
CN117292120B (en) | Light-weight visible light insulator target detection method and system | |
Nebel et al. | Are current monocular computer vision systems for human action recognition suitable for visual surveillance applications? | |
US8724890B2 (en) | Vision-based object detection by part-based feature synthesis | |
Zhou et al. | A pipeline architecture for traffic sign classification on an FPGA | |
CN105574880A (en) | Color image segmentation method based on exponential moment pixel classification | |
Hao et al. | Improved bags-of-words algorithm for scene recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |