CN116433891A - Training method and device of target detection model and target detection method - Google Patents

Training method and device of target detection model and target detection method

Info

Publication number
CN116433891A
CN116433891A (application CN202111680390.8A)
Authority
CN
China
Prior art keywords
loss
training sample
fuzzy
target detection
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111680390.8A
Other languages
Chinese (zh)
Inventor
朱柯弘
张梓航
赵自然
顾建平
金颖康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenmu Technology Co ltd
Nuctech Co Ltd
Original Assignee
Beijing Shenmu Technology Co ltd
Nuctech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenmu Technology Co ltd, Nuctech Co Ltd filed Critical Beijing Shenmu Technology Co ltd
Priority to CN202111680390.8A priority Critical patent/CN116433891A/en
Publication of CN116433891A publication Critical patent/CN116433891A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A training method and device for a target detection model, and a target detection method, are provided. The training method comprises the following steps: acquiring a training sample set comprising a first training sample and a second training sample, each of which comprises clear data and fuzzy data; inputting the first training sample into a target detection model; determining a first loss value of the target detection model by using a first loss function, according to the output value of the target detection model and the second training sample; inputting the first loss value into a Gaussian mixture model; constructing an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix represents the ambiguity of the data in the first training sample and the second training sample; constructing a fuzzy loss function from the ambiguity matrix, wherein the fuzzy loss function is configured to distinguish the clear data from the fuzzy data in the first training sample and the second training sample; constructing a second loss function from the fuzzy loss function; and retraining the target detection model by using the first training sample, the second training sample, and the second loss function.

Description

Training method and device of target detection model and target detection method
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method and apparatus for a target detection model, a target detection method, an electronic device, a computer-readable storage medium, and a program product.
Background
Passive terahertz human-body security inspection technology can discover suspicious objects hidden on the body surface under common materials such as clothes and shoes. The terahertz imaging device emits no electromagnetic waves; it images by receiving the terahertz waves spontaneously radiated by the human body, involves no ionizing radiation at all, offers high safety, and is therefore particularly suitable for human-body security inspection. For these reasons it has attracted industry attention in recent years. Detecting suspicious objects in terahertz images with various algorithms can further save human resources, improve processing efficiency, and increase detection speed.
Traditional terahertz suspicious-object detection is based mainly on optical image processing techniques: images are traversed with a sliding window using manually constructed features to find matching samples. Such methods suffer from low processing speed, high complexity, and poor robustness, and have not been widely applied. With the development of deep learning in the field of image detection, researchers have also applied deep learning to terahertz suspicious-object detection. However, most studies merely apply the general detection paradigm to terahertz images, without analyzing the characteristics of terahertz imaging and of open, non-intrusive terahertz scenarios, and without considering conditions such as deformation of suspicious objects, blurred labeling, and background interference, so their performance is limited in practical application.
The above information disclosed in this section is only for understanding the background of the inventive concept of the present disclosure, and thus, the above information may contain information that does not constitute prior art.
Disclosure of Invention
In view of at least one aspect of the above technical problems, a training method and apparatus for a target detection model, a target detection method, an electronic device, a computer-readable storage medium, and a program product are provided.
In one aspect, a training method of a target detection model based on fuzzy learning is provided, including:
acquiring a training sample set, wherein the training sample set comprises a first training sample and a second training sample, the first training sample and the second training sample are respectively used for representing the input and the output of the target detection model, the first training sample comprises clear data and fuzzy data, and the second training sample comprises clear data and fuzzy data;
inputting the first training sample into the target detection model;
determining a first loss value of the target detection model by using a first loss function according to the output value of the target detection model and the second training sample;
inputting the first loss value into a Gaussian mixture model;
constructing an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of data in the first training sample and the second training sample;
constructing a fuzzy loss function according to the ambiguity matrix, wherein the fuzzy loss function is configured to distinguish clear data and fuzzy data in the first training sample and the second training sample;
constructing a second loss function according to the fuzzy loss function; and
retraining the target detection model by using the first training sample, the second training sample, and the second loss function.
According to some exemplary embodiments, the first loss function includes a first bounding box loss, a first confidence loss, and a first classification loss.
According to some exemplary embodiments, the first bounding box loss employs a GIoU loss, and the first confidence loss and the first classification loss each employ a mixed entropy loss.
According to some exemplary embodiments, the first loss function is calculated by:

$$l_w = \sigma_{box}\cdot l^w_{box} + \sigma_{obj}\cdot l^w_{obj} + \sigma_{cls}\cdot l^w_{cls}$$

where $l_w$ is the first loss function, $l^w_{box}$ is the first bounding box loss, $l^w_{obj}$ is the first confidence loss, $l^w_{cls}$ is the first classification loss, and $\sigma_{box}$, $\sigma_{obj}$, and $\sigma_{cls}$ are the weight coefficients of the first bounding box loss, the first confidence loss, and the first classification loss, respectively.
According to some exemplary embodiments, the mixed entropy loss includes a cross entropy loss and a negative entropy loss.
According to some exemplary embodiments, the mixed entropy loss is calculated by:

$$l_{ME} = l_{CE} + \beta\cdot l_{NE} = -y\cdot\log(y^*) - \beta\cdot y^*\cdot\log(y^*)$$

where $l_{ME}$ denotes the mixed entropy loss, $l_{CE}$ and $l_{NE}$ denote the cross entropy loss and the negative entropy loss, respectively, and $\beta$ is the weight coefficient of the negative entropy loss; $y$ is a value in the second training sample and represents the labeled value; $y^*$ is the output value of the target detection model and represents the predicted value.
According to some exemplary embodiments, the ambiguity matrix is expressed as an $N\times N$ matrix $C_f = [c_{nm}]_{N\times N}$, where $C_f$ is the ambiguity matrix, each element $c_{nm}$ represents the probability that the labeled value is $y_c = m$ when the predicted value is $y^*_c = n$, and $N$ denotes the number of row and column elements in the ambiguity matrix.
According to some exemplary embodiments, the fuzzy loss function is represented by:

$$l_f = -(1-r)\left((1-y^*)^{1-r}\,{y^*}^{r}\right)^{\gamma}\log(y^*) - r\left((1-y^*)^{1-r}\,{y^*}^{r}\right)^{\gamma}\log(1-y^*)$$

where $l_f$ denotes the fuzzy loss function and $r$ denotes the proportion of fuzzy data; $\gamma \ge 0$ is used to adjust the sensitivity of the target detection model to data blur.

According to some exemplary embodiments, the proportion $r$ of fuzzy data is obtained by:

$$r = 1 - \operatorname{diag}(C_f)$$

where $\operatorname{diag}(C_f)$ denotes the diagonal elements of the ambiguity matrix $C_f$.
According to some exemplary embodiments, the second loss function is calculated by:

$$l = \sigma'_{box}\cdot l_{box} + \sigma'_{obj}\cdot l_{obj} + \sigma'_{cls}\cdot l_{cls}$$

where $l$ is the second loss function, $l_{box}$ is the second bounding box loss, $l_{obj}$ is the second confidence loss, $l_{cls}$ is the second classification loss, and $\sigma'_{box}$, $\sigma'_{obj}$, and $\sigma'_{cls}$ are the weight coefficients of the second bounding box loss, the second confidence loss, and the second classification loss, respectively.
According to some exemplary embodiments, the second bounding box loss employs a GIoU loss, and the second confidence loss and the second classification loss are both represented by the fuzzy loss function.
According to some exemplary embodiments, constructing the ambiguity matrix by using the Gaussian mixture model specifically includes: constructing a confidence ambiguity matrix and a classification ambiguity matrix with the Gaussian mixture model, for the first confidence loss and the first classification loss respectively.
According to some exemplary embodiments, for the confidence ambiguity matrix, N = 2; and/or, for the classification ambiguity matrix, N is equal to the total number of classes output by the target detection model.
According to some exemplary embodiments, the first training sample comprises image data acquired by scanning an imaging region with a terahertz imaging device; and/or, the second training sample comprises image data with labeling frames, wherein a labeling frame indicates the (possibly blurred) position and/or type of an object to be detected in the image data.
In another aspect, there is provided a target detection method comprising:
acquiring image data;
inputting the image data into a target detection model, wherein the target detection model is trained by the method; and
determining the object to be detected in the image data according to the output of the target detection model.
In still another aspect, there is provided a training apparatus of a target detection model based on fuzzy learning, including:
the training sample acquisition module is used for acquiring a training sample set, wherein the training sample set comprises a first training sample and a second training sample, the first training sample and the second training sample are respectively used for representing the input and the output of the target detection model, the first training sample comprises clear data and fuzzy data, and the second training sample comprises clear data and fuzzy data;
The target detection module is used for: receiving the first training sample, and determining a first loss value of the target detection model by using a first loss function according to the output value of the target detection model and the second training sample;
an ambiguity matrix construction module, comprising a Gaussian mixture model and configured to: receive the first loss value, and construct an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of the data in the first training sample and the second training sample;
a blur loss function construction module configured to construct a blur loss function according to the ambiguity matrix, wherein the blur loss function is configured to be capable of distinguishing between clear data and blur data in the first training sample and the second training sample;
the loss function construction module is used for constructing a second loss function according to the fuzzy loss function; and
a retraining module, configured to retrain the target detection model by using the first training sample, the second training sample, and the second loss function.
In yet another aspect, there is provided an electronic device comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
According to some exemplary embodiments, the electronic device is a passive terahertz imaging apparatus.
In yet another aspect, a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method as described above is provided.
In yet another aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method as described above.
In the embodiments of the present disclosure, the clear features of the target to be detected are extracted progressively, reducing the influence of blurred information; the trained target detection model can thus achieve a higher detection rate and classification accuracy for the target to be detected and a lower false alarm rate.
Drawings
For a better understanding of embodiments of the present disclosure, embodiments of the present disclosure will be described in detail with reference to the following drawings:
Fig. 1 is a schematic structural view of a passive terahertz imaging device according to an exemplary embodiment of the present disclosure.
Fig. 2A to 2E schematically show images acquired by the terahertz imaging apparatus in different cases, respectively.
FIG. 3 is a schematic flow chart of a training method of a fuzzy learning based object detection model in accordance with an exemplary embodiment of the present disclosure.
FIG. 4 is a detailed flow chart of a training method of a fuzzy learning based object detection model in accordance with an exemplary embodiment of the present disclosure, wherein the process of fuzzy learning is shown in more detail.
Fig. 5 is a schematic diagram of CE loss and NE loss.
Fig. 6A and 6B show the loss distribution at different loss functions, respectively.
Fig. 7 is a schematic diagram of a blur loss function according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of a training apparatus of an object detection model according to an exemplary embodiment of the present disclosure.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a training method or an object detection method of an object detection model according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure will be described in detail below. It should be noted that the embodiments described herein are for illustration only and are not intended to limit the present disclosure. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known structures, materials, or methods have not been described in detail in order to avoid obscuring the present disclosure.
Throughout the specification, references to "one embodiment," "an embodiment," "one example," or "an example" mean: a particular feature, structure, or characteristic described in connection with the embodiment or example is included within at least one embodiment of the disclosure. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "one example," or "an example" in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, or characteristics may be combined in any suitable combination and/or sub-combination in one or more embodiments or examples. Furthermore, it will be understood by those of ordinary skill in the art that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
The inventors have found that, unlike conventional optical image target detection, passive terahertz image target detection in open, non-intrusive security scenarios suffers from both feature blur and labeling blur.
To this end, an embodiment of the present disclosure provides a training method of a target detection model based on fuzzy learning, including: acquiring a training sample set, wherein the training sample set comprises a first training sample and a second training sample, the first training sample and the second training sample are respectively used for representing the input and the output of the target detection model, and each of the first and second training samples comprises clear data and fuzzy data; inputting the first training sample into the target detection model; determining a first loss value of the target detection model by using a first loss function according to the output value of the target detection model and the second training sample; inputting the first loss value into a Gaussian mixture model; constructing an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of the data in the first training sample and the second training sample; constructing a fuzzy loss function according to the ambiguity matrix, wherein the fuzzy loss function is configured to distinguish the clear data from the fuzzy data in the first training sample and the second training sample; constructing a second loss function according to the fuzzy loss function; and retraining the target detection model by using the first training sample, the second training sample, and the second loss function. In the embodiments of the present disclosure, the clear features of the target to be detected are extracted progressively, reducing the influence of blurred information; the trained target detection model can thus achieve a higher detection rate and classification accuracy for the target to be detected and a lower false alarm rate.
Fig. 1 is a schematic structural view of a passive terahertz imaging device according to an exemplary embodiment of the present disclosure.
As shown in fig. 1, a passive terahertz imaging device according to an exemplary embodiment of the present disclosure may include a reflecting plate 2 and its servo system, a lens 3, a detector array 4, a data acquisition and processing device 6, a display device 7, and a distribution box 5. The terahertz waves spontaneously radiated by the inspected object, together with the terahertz waves reflected from the background environment, pass through the window 1 in the housing and are incident on the reflecting plate 2, which reflects them onto the lens 3; after being converged by the lens 3, they are received by the detector array 4, which converts the received terahertz waves into electrical signals. The data acquisition and processing device 6 is connected to the detector array 4 to receive the electrical signals from the detector array 4 and generate millimeter wave/terahertz wave images. The display device 7 is connected to the data acquisition and processing device 6 and is used for receiving and displaying the terahertz wave images generated by the data acquisition and processing device 6. The distribution box 5 is configured to supply power to the entire passive terahertz imaging device.
In actual operation, the servo system of the reflecting plate 2 drives the reflecting plate 2 to swing back and forth; the reciprocal of the period T of this motion is the imaging frame rate s. As the reflecting plate 2 swings from the maximum elevation angle to the minimum depression angle through a swing angle of θ, it completes a scan over a field of view of 2θ in the height direction within the depth-of-field range, and each such sweep of the reflecting plate 2 produces one image frame. The data acquisition and processing device 6 acquires data throughout this process. The control system of the reflecting plate 2 may be equipped with, for example, a position encoder to feed back the scanning position of the reflecting plate with high accuracy. During acquisition, the data acquisition and processing device 6 first tags the acquired data according to the position encoder information in order to separate the data of successive frames; it then processes and reconstructs the acquired data to generate a terahertz image, and can transmit the image data to the display device 7, so that the image is displayed, suspicious objects are marked, and alarms are raised automatically on the display device 7.
Through research on such passive terahertz imaging devices, the inventors have found that target detection on the acquired terahertz images suffers from both feature blur and labeling blur.
It should be noted that, although the embodiments herein are described taking a "terahertz imaging device" as an example, the embodiments of the present disclosure are not limited to terahertz imaging and can be applied, without conflict, to other imaging modalities such as millimeter-wave imaging.
Specifically, the feature blurring is mainly caused by terahertz imaging characteristics, shielding deformation, motion deformation, imaging angle deformation, ambient light interference, electromagnetic wave interference and other factors. The passive terahertz imaging technology is based on the blackbody radiation theory, and utilizes a terahertz antenna to sense the radiation energy of a human body, so that a two-dimensional gray terahertz image with energy intensity information is formed. When a suspected object (namely an object to be detected) is carried, the suspected object area and the human body form gray level difference due to shielding of radiation energy of the human body, so that suspected object information can be displayed in an image. Terahertz images have better contour information and gray information, but have lower image resolution than visible light images, and are susceptible to occlusion, motion, imaging angles and external environments, thereby causing feature blurring. Fig. 2A to 2E schematically show images acquired by the terahertz imaging apparatus in different cases, respectively. For example, fig. 2A to 2E respectively show imaging results of the same person carrying the firearm simulator and the mobile phone simulator under different conditions, and it can be seen that, with respect to the ideal imaging shown in fig. 2A, in fig. 2B to 2E, the image features in these conditions are blurred to different extents due to factors such as shielding deformation, movement deformation, angle deformation, external interference deformation, and the like. For example, the characteristics of a firearm simulator are in some cases similar to those of a cell phone simulator. Under open non-inductive application scenes, the conditions of human body movement, suspected object carrying modes, external environments and the like are diversified, and image feature blurring is ubiquitous, so that certain influence can be caused on target recognition.
Labeling blur is blur in the label information produced by manual annotation. Because manual labeling can only be judged from the terahertz image information and is highly subjective, when image features are blurred, annotators cannot label a target against a fixed standard. Different annotators may give different labeling results for whether a target exists and which class it belongs to, and even the same annotator may give different results in different scenarios. This blurs the annotation information, so that the model cannot distinguish valid label information during training.
For example, a deep neural network model obtains its parameters $\theta$ by optimizing the empirical risk $R_{l,D}$ over a data set $D$; this process can be expressed as:

$$\theta = \arg\min_{\theta} R_{l,D} = \arg\min_{\theta} \mathbb{E}_{(X,Y)\in D}\,\big[\, l\big(f(X,\theta),\, Y\big) \,\big] \tag{1}$$

where $l(\cdot)$ denotes a loss function, $(X, Y)\in D$ denotes a sample and its corresponding label, and $f(X,\theta)$ denotes the model prediction. One premise for formula (1) to yield the desired result is that the data set $D$ is clear, clean, and accurate. When the data and labels are blurred, formula (1) becomes:

$$\theta_f = \arg\min_{\theta} R_{l,D_f} = \arg\min_{\theta} \mathbb{E}_{(X_f,Y_f)\in D_f}\,\big[\, l\big(f(X_f,\theta),\, Y_f\big) \,\big] \tag{2}$$

where $(X_f, Y_f)\in D_f$ denote the fuzzy samples and fuzzy labels in the fuzzy data set, and $\theta_f$ denotes the parameters trained from the fuzzy data set. Since $\{X, Y\}$ and $\{X_f, Y_f\}$ present different distribution characteristics, it can be understood that $\theta_f \ne \theta$: model parameters trained on a fuzzy data set do not achieve the optimal effect on a clear data set, which affects model performance.

In addition, $\arg\min_x f(x)$ denotes the value of the variable $x$ at which the objective function $f(x)$ attains its minimum.
In embodiments of the present disclosure, a training method for a target detection model based on fuzzy learning, and a target detection method using that model, are provided. In the training method, the blurred data in the terahertz image data are evaluated using warm-up training and a Gaussian Mixture Model (GMM), an ambiguity matrix is estimated, the weights of blurred samples are adjusted, and the clear information in the training samples is extracted through gradual iteration, thereby reducing the influence of feature blur and labeling blur in the terahertz images. Accordingly, the target detection model obtained by this training method can improve the detection rate of passive terahertz human-body suspicious objects in open, non-intrusive security scenarios, reduce the false alarm rate, and improve classification accuracy.
FIG. 3 is a schematic flow chart of a training method of a fuzzy learning based object detection model in accordance with an exemplary embodiment of the present disclosure. FIG. 4 is a detailed flow chart of a training method of a fuzzy learning based object detection model in accordance with an exemplary embodiment of the present disclosure, wherein the process of fuzzy learning is shown in more detail.
As shown in fig. 3 and 4, a training method of a fuzzy learning based object detection model according to an exemplary embodiment of the present disclosure may include operations S310 to S380, and may be performed by a processor or any electronic device including a processor.
In operation S310, a training sample set is obtained, where the training sample set includes a first training sample and a second training sample, the first training sample and the second training sample are used to characterize input and output of the target detection model, respectively, the first training sample includes clear data and fuzzy data, and the second training sample includes clear data and fuzzy data.
For example, the image data actually sampled by the terahertz imaging device may be used as the first training sample. As described above, owing to the influence of occlusion, motion, imaging angle, and the external environment, the features of part of the actually sampled image data may be unclear, that is, feature blur exists. In other words, the first training sample includes both clear data and blurred data.
The image data actually sampled by the terahertz imaging device can be annotated manually, for example by marking the position and/or type of objects to be detected (such as various suspicious objects) with labeling frames, and the image data with labeling frames is taken as the second training sample. As described above, because manual labeling can only be judged from the terahertz image information and is highly subjective, when the image features are blurred, annotators cannot label a target against a fixed standard; that is, labeling blur exists. Thus, in the second training sample, the labeling frames indicate the (possibly blurred) position and/or type of the objects to be detected in the image data.
In operation S320, the first training sample is input into the target detection model.
In operation S330, a first loss value of the target detection model is determined using a first loss function according to the output value of the target detection model and the second training sample.
In operation S320 and operation S330, a warm-up training is performed on the target detection model, in which a first loss value of the target detection model is determined using a first loss function.
For example, in the warm-up training, 10-30 epochs may be trained.
In the training of deep machine-learning models, the term epoch denotes one complete pass of the model over all the data in the training set, sometimes referred to as one generation of training: when the complete data set has passed forward and backward through the neural network model exactly once, that constitutes one epoch. It should be appreciated that when the number of samples in an epoch (i.e., all training samples) is too large for the computer, it is usually divided into several smaller blocks, i.e., multiple batches, for training.
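As a minimal illustration of the epoch/batch terminology above (the names `dataset`, `make_batches`, and `train_step` are generic placeholders, not from this disclosure):

```python
num_epochs = 20          # e.g., 10-30 epochs for the warm-up training described above
batch_size = 16          # illustrative value

for epoch in range(num_epochs):                        # one epoch = one full pass over the data
    for batch in make_batches(dataset, batch_size):    # the pass is split into batches
        train_step(batch)                              # forward and backward propagation
```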
In an embodiment of the present disclosure, the first loss function includes a first bounding box loss, a first confidence loss, and a first classification loss.
Specifically, the first loss function is calculated by the following formula (3):

$$l_w = \sigma_{box}\cdot l^w_{box} + \sigma_{obj}\cdot l^w_{obj} + \sigma_{cls}\cdot l^w_{cls} \tag{3}$$

where $l_w$ is the first loss function, $l^w_{box}$ is the first bounding box loss, $l^w_{obj}$ is the first confidence loss, $l^w_{cls}$ is the first classification loss, and $\sigma_{box}$, $\sigma_{obj}$, and $\sigma_{cls}$ are the weight coefficients of the first bounding box loss, the first confidence loss, and the first classification loss, respectively.
For example, the first bounding box loss employs a GIoU loss, while the first confidence loss and the first classification loss each employ a mixed entropy loss.
It should be appreciated that IoU (Intersection over Union) is a commonly used metric in target detection; it is used not only to determine positive and negative samples but also to evaluate the distance between the predicted box and the ground-truth box. IoU can be expressed as:

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ and $B$ denote the two detection boxes, namely the predicted box and the ground-truth box, respectively.

Since IoU is a ratio, it is insensitive to the scale of the target object. To address this, the GIoU loss was proposed, with the following formula:

$$GIoU = IoU - \frac{|A_c \setminus (A \cup B)|}{|A_c|}$$

That is: first find the smallest enclosing region $A_c$ of the two detection boxes (the area of the smallest box containing both the predicted and ground-truth boxes), then compute IoU, then compute the proportion of $A_c$ not covered by the union of the two boxes, and finally subtract that proportion from IoU to obtain GIoU.
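For illustration, the GIoU computation just described can be sketched in Python as follows (a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format; the function and variable names are illustrative, not taken from this disclosure):

```python
def giou(box_a, box_b):
    """GIoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection of the two boxes
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union of the two boxes
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing region A_c of the two boxes
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    # Subtract the proportion of A_c not covered by the union
    return iou - (area_c - union) / area_c

# The corresponding loss is commonly taken as 1 - GIoU:
loss = 1.0 - giou((0, 0, 2, 2), (1, 1, 3, 3))
```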
In embodiments of the present disclosure, the mixed entropy loss includes a Cross Entropy (CE) loss and a Negative Entropy (NE) loss. For example, the mixed entropy loss is calculated by the following formula (4):

$$l_{ME} = l_{CE} + \beta\cdot l_{NE} = -y\cdot\log(y^*) - \beta\cdot y^*\cdot\log(y^*) \tag{4}$$

where $l_{ME}$ denotes the mixed entropy loss, $l_{CE}$ and $l_{NE}$ denote the cross entropy loss and the negative entropy loss, respectively, and $\beta$ is the weight coefficient of the negative entropy loss; $y$ is a value in the second training sample and represents the labeled value; $y^*$ is the output value of the target detection model and represents the predicted value.
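A minimal PyTorch sketch of formula (4) follows (the value β = 0.5 and the clamping constant are illustrative assumptions, not values from this disclosure):

```python
import torch

def mixed_entropy_loss(y, y_star, beta=0.5, eps=1e-7):
    """l_ME = l_CE + beta * l_NE per formula (4).

    y: labeled values; y_star: predicted probabilities in (0, 1).
    beta = 0.5 is an illustrative choice, not a value from the patent.
    """
    y_star = y_star.clamp(eps, 1.0 - eps)   # numerical stability
    l_ce = -y * torch.log(y_star)           # cross entropy term
    l_ne = -y_star * torch.log(y_star)      # negative entropy term
    return l_ce + beta * l_ne
```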
Fig. 5 is a schematic diagram of the CE loss and the NE loss. In fig. 5, the solid line represents the NE loss curve and the broken line represents the CE loss curve. It can be seen that the CE loss is small when the model prediction is correct and large when it is incorrect, which steers the model parameters toward correct predictions; the NE loss is small when the model predicts extreme values and large when it predicts intermediate values, which steers the model parameters toward predicting extreme values. The mixed entropy therefore combines both behaviors, and using a loss function containing the mixed entropy for warm-up training increases the difference between the loss distributions of clear and blurred samples, enhancing the separability of the data. Fig. 6A and 6B show the loss distributions under different loss functions: fig. 6A shows the result with the cross entropy loss, and fig. 6B the result with the mixed entropy loss. In fig. 6A and 6B, the dark color indicates the distribution of blurred data and the light color the distribution of clear data. Compared with the cross entropy, the loss distribution under the mixed entropy is more uniform, the difference between the distributions of clear and blurred data is more pronounced, and the distributions can be fitted better.
In operation S340, the first loss value is input into a Gaussian mixture model.
In an embodiment of the present disclosure, the clear data and the blurred data are modeled using a Gaussian Mixture Model (GMM). A Gaussian mixture model can be regarded as a combination of several single Gaussian models:

$$p(x) = \sum_{k=1}^{K} \omega_k\, \phi_k\!\left(x \mid \mu_k, \sigma_k\right)$$

where $\mu$, $\sigma$, and $\omega$ denote the expectation, the variance, and the sub-distribution weight, respectively, and $\phi_k$, $\mu_k$, $\sigma_k$, $\omega_k$ denote the $k$-th sub-distribution and its corresponding expectation, variance, and distribution weight. The Gaussian mixture model adapts well to asymmetric blur distributions and can better distinguish clear data from blurred data.
It should be appreciated that a Gaussian mixture model quantifies a quantity precisely using Gaussian probability density functions (normal distribution curves), decomposing it into several component models each based on a Gaussian probability density function.
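For example, this modeling step can be sketched with scikit-learn's GaussianMixture on synthetic loss values (a minimal sketch; using two components, one for clear and one for blurred data, follows the description above, while the synthetic numbers are purely illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic per-sample loss values: a low-loss (clear) and a high-loss (blurred) group
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(1.0, 0.3, 100)])
losses = losses.reshape(-1, 1)                     # GaussianMixture expects 2-D input

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)

# Posterior probability that each sample belongs to the high-mean (blurred) component
blur_k = int(np.argmax(gmm.means_.ravel()))
p_blur = gmm.predict_proba(losses)[:, blur_k]
```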
In operation S350, an ambiguity matrix is constructed using the Gaussian mixture model, where the ambiguity matrix is used to represent the ambiguity of the data in the first training sample and the second training sample.
In an embodiment of the present disclosure, the ambiguity matrix is expressed as an $N\times N$ matrix

$$C_f = \left[\, c_{nm} \,\right]_{N\times N}$$

where $C_f$ is the ambiguity matrix and $N$ denotes the number of row and column elements in the matrix. That is, each element $c_{nm}$ of the ambiguity matrix represents the probability that the labeled value is $y_c = m$ when the predicted value is $y^*_c = n$.
For example, letting $x_i$ denote the $i$-th sample, each element $c_{nm}$ can be estimated by a formula (reproduced only as an image in the original) built from the following quantities: the sample subset with prediction class $n$ and annotation class $m$; the sample subset with prediction class $n$; the sample subset with annotation class $m$; and a blur threshold $\tau$.
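Since the estimation formula is reproduced only as an image, the sketch below is one plausible reading built from the quantities listed above; the exact role of the threshold τ and the normalization are assumptions:

```python
import numpy as np

def estimate_ambiguity_matrix(pred_probs, labels, n_classes, tau=0.5):
    """Estimate C_f from model predictions and annotations.

    pred_probs: (num_samples, n_classes) predicted class probabilities.
    labels:     (num_samples,) annotated class indices.
    Element (n, m) is the estimated probability that a sample predicted as
    class n (with confidence above the blur threshold tau) is annotated as
    class m.  Both the thresholding and the row normalization are
    assumptions here, not taken from the patent.
    """
    preds = pred_probs.argmax(axis=1)
    conf = pred_probs.max(axis=1)
    keep = conf >= tau                        # keep only confident predictions

    counts = np.zeros((n_classes, n_classes))
    for n, m in zip(preds[keep], labels[keep]):
        counts[n, m] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1.0)  # each row sums to 1
```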
For example, constructing the ambiguity matrix by using the Gaussian mixture model specifically includes constructing a confidence ambiguity matrix and a classification ambiguity matrix with the Gaussian mixture model, for the first confidence loss and the first classification loss respectively.
Note that when using the GMM to model the classification loss $l^w_{cls}$ and the confidence loss $l^w_{obj}$ separately, GMM modeling must be performed on the confidence loss of all candidate boxes, but only on the classification loss of the candidate boxes that contain a target. From the trained GMM, the component with the higher average loss value is selected, and estimates of the confidence ambiguity $f_{obj}$ and the classification ambiguity $f_{cls}$ are obtained for each candidate box from formulas (reproduced only as images in the original) in which $y^*_i$ denotes the predicted value of the $i$-th candidate box.

From the confidence ambiguity $f_{obj}$ and the classification ambiguity $f_{cls}$ obtained above, the confidence ambiguity matrix and the classification ambiguity matrix can be estimated separately.
For example, for the confidence ambiguity matrix, N = 2; and/or, for the classification ambiguity matrix, N is equal to the total number of classes output by the target detection model.
Referring to fig. 4, in the confidence ambiguity matrix C_f_obj, each row may correspond to a predicted value and each column to a labeled value. For example, for one sample, the confidence ambiguity matrix C_f_obj is a 2×2 matrix: the element in row 1, column 1 represents the probability that the predicted value is "present" and the labeled value is "present"; the element in row 1, column 2 represents the probability that the predicted value is "present" and the labeled value is "absent"; the element in row 2, column 1 represents the probability that the predicted value is "absent" and the labeled value is "present"; and the element in row 2, column 2 represents the probability that the predicted value is "absent" and the labeled value is "absent". That is, in the confidence ambiguity matrix C_f_obj of each sample, the diagonal elements represent the cases where the predicted value and the labeled value agree, i.e., the diagonal elements represent clear data.
In the classification ambiguity matrix C_f_cls, each row may correspond to a predicted class and each column to a labeled class, denoted class 1, class 2, ..., class N. For example, for one sample, the classification ambiguity matrix C_f_cls is an N×N matrix: the element in row 1, column 1 represents the probability that the predicted value is "class 1" and the labeled value is "class 1"; the element in row 1, column 2 represents the probability that the predicted value is "class 1" and the labeled value is "class 2"; and so on, with the element in row 1, column N representing the probability that the predicted value is "class 1" and the labeled value is "class N". The elements of the other rows are analogous. That is, in the classification ambiguity matrix C_f_cls of each sample, the diagonal elements represent the cases where the predicted class and the labeled class agree, i.e., the diagonal elements represent clear data.
In operation S360, a fuzzy loss function is constructed from the ambiguity matrix, wherein the fuzzy loss function is configured to distinguish the clear data from the fuzzy data in the first training sample and the second training sample.
In an embodiment of the present disclosure, the fuzzy loss function is represented by the following formula (9):

$$l_f = -(1-r)\left((1-y^*)^{1-r}\,{y^*}^{r}\right)^{\gamma}\log(y^*) - r\left((1-y^*)^{1-r}\,{y^*}^{r}\right)^{\gamma}\log(1-y^*) \tag{9}$$

where $l_f$ denotes the fuzzy loss function and $r$ denotes the proportion of fuzzy data; $\gamma \ge 0$ is used to adjust the sensitivity of the target detection model to data blur.

The proportion $r$ of fuzzy data is obtained by the following formula (10):

$$r = 1 - \operatorname{diag}(C_f) \tag{10}$$

where $\operatorname{diag}(C_f)$ denotes the diagonal elements of the ambiguity matrix $C_f$.
It should be noted that the above loss function only describes the case when labeled as a positive example.
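A minimal PyTorch sketch of formulas (9) and (10) for positively labeled samples follows (γ = 1.0 is an illustrative default, not a value from this disclosure):

```python
import torch

def fuzzy_loss(y_star, r, gamma=1.0, eps=1e-7):
    """Fuzzy loss l_f of formula (9), for samples labeled as positive examples.

    y_star: predicted probabilities; r: proportion of fuzzy data,
    obtained as r = 1 - diag(C_f) per formula (10); gamma >= 0 adjusts
    the model's sensitivity to data blur.
    """
    y_star = y_star.clamp(eps, 1.0 - eps)                        # numerical stability
    mod = ((1.0 - y_star) ** (1.0 - r) * y_star ** r) ** gamma   # modulating factor
    return -(1.0 - r) * mod * torch.log(y_star) - r * mod * torch.log(1.0 - y_star)
```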
For example, fig. 7 is a schematic diagram of the fuzzy loss function according to an embodiment of the present disclosure, comparing the curve of the CE loss function with that of the fuzzy loss function. As can be seen from fig. 7, for high-confidence samples (e.g., samples with confidence greater than 0.5), the fuzzy loss function up-weights them, i.e., increases their weight; for low-confidence samples (e.g., samples with confidence between 0.05 and 0.5), it down-weights them, i.e., reduces their weight; and for extremely low-confidence samples (e.g., samples with confidence less than 0.05), it corrects the sample's label, i.e., changes the sample's labeled value. The fuzzy loss function is also influenced by the blur proportion parameter r: as r increases, the high-confidence interval is up-weighted further, the low-confidence interval is down-weighted further, and the extremely-low-confidence interval expands. That is, in the embodiments of the present disclosure, the fuzzy loss function can adaptively distinguish blurred samples from clear samples according to the proportion of blurred data and correct the labeling of blurred samples, so that the model learns clear features and detection performance improves.
In operation S370, a second loss function is constructed from the fuzzy loss function.
In an embodiment of the present disclosure, the second loss function is calculated by the following formula (11):

$$l = \sigma'_{box}\cdot l_{box} + \sigma'_{obj}\cdot l_{obj} + \sigma'_{cls}\cdot l_{cls} \tag{11}$$

where $l$ is the second loss function, $l_{box}$ is the second bounding box loss, $l_{obj}$ is the second confidence loss, $l_{cls}$ is the second classification loss, and $\sigma'_{box}$, $\sigma'_{obj}$, and $\sigma'_{cls}$ are the weight coefficients of the second bounding box loss, the second confidence loss, and the second classification loss, respectively.
For example, the second bounding box loss employs a GIoU loss, and the second confidence loss and the second classification loss are both represented by the fuzzy loss function, i.e., calculated by equation (9) above.
By designing the fuzzy loss function in this way, blurred data can be handled automatically without manual processing, and the blurred information is not discarded entirely, so the information can be fully utilized.
In operation S380, the target detection model is retrained using the first training sample, the second training sample, and the second loss function.
For example, the target detection model may be retrained using the loss function shown in formula (11), with the parameters updated by the Stochastic Gradient Descent (SGD) algorithm.
During training, iterative updating may be performed: for example, the procedure may return to the GMM modeling step and iterate until a desired metric or the maximum number of training epochs is reached.
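Putting the stages together, the overall procedure can be sketched as follows (a high-level sketch only: `model`, `loader`, `first_loss`, `collect_first_losses`, `build_ambiguity_matrix`, and `second_loss` are illustrative placeholders for the components described above, and the hyperparameter values are assumptions):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # SGD, as described above

# 1. Warm-up training with the first loss function (e.g., 10-30 epochs).
for epoch in range(warmup_epochs):
    for x, y in loader:
        loss = first_loss(model(x), y)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

# 2. Iterate: fit the GMM on the first-loss values, estimate the ambiguity
#    matrix C_f and the fuzzy proportion r, then retrain with the second loss,
#    returning to the GMM step until a target metric or max epochs is reached.
for iteration in range(max_iterations):
    losses = collect_first_losses(model, loader)      # per-sample first-loss values
    C_f = build_ambiguity_matrix(losses)              # GMM-based estimate
    r = 1.0 - torch.diag(C_f)                         # formula (10)
    for epoch in range(retrain_epochs):
        for x, y in loader:
            loss = second_loss(model(x), y, r)        # formula (11), with the fuzzy loss
            optimizer.zero_grad(); loss.backward(); optimizer.step()
```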
The training method of the target detection model provided by the embodiments of the present disclosure can reduce the influence of terahertz image feature blur and labeling blur, thereby improving the detection rate and classification accuracy of the target detection model for terahertz suspicious objects and reducing the false alarm rate.
It should be noted that, in the embodiments of the present disclosure, the target detection model may adopt different model structures, including but not limited to one-stage/two-stage target detection methods and anchor-free or Transformer-based detection methods. That is, the training method according to the embodiments of the present disclosure can optimize different model structures.
In the embodiments of the present disclosure, the training method improves practical performance mainly at the training stage, with little effect on resource occupation and detection speed when the model is applied and tested.
Embodiments of the present disclosure also provide a target detection method, which may include the steps of: acquiring image data; inputting the image data into a target detection model, wherein the target detection model is trained by the method described above; and determining the object to be detected in the image data according to the output of the target detection model.
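A minimal inference sketch for this method follows (the checkpoint path, the acquisition helper, and the model's output format are all illustrative assumptions, not specified by this disclosure):

```python
import torch

model = torch.load("fuzzy_learning_detector.pt")        # hypothetical checkpoint path
model.eval()

image = acquire_terahertz_image()                       # placeholder acquisition step
with torch.no_grad():
    boxes, scores, classes = model(image.unsqueeze(0))  # assumed output format

# Keep detections above a confidence threshold as the objects to be detected.
detections = [(box, cls) for box, score, cls in zip(boxes, scores, classes)
              if score > 0.5]
```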
Based on the training method of the target detection model, the embodiment of the disclosure also provides a training device of the target detection model. The device will be described in detail below in connection with fig. 8.
Fig. 8 is a block diagram of a training apparatus of an object detection model according to an exemplary embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 700 of the target detection model includes a training sample acquisition module 710, a target detection module 720, an ambiguity matrix construction module 730, a blur loss function construction module 740, a loss function construction module 750, and a retraining module 760.
The training sample acquisition module 710 is configured to acquire a training sample set, where the training sample set includes a first training sample and a second training sample, the first training sample and the second training sample are used to characterize input and output of the target detection model, the first training sample includes clear data and fuzzy data, and the second training sample includes clear data and fuzzy data, respectively. In some exemplary embodiments, the training sample acquisition module 710 may be configured to perform the operation S310 described above, which is not described herein.
The target detection module 720 is configured to receive the first training sample and to determine a first loss value of the target detection model by using a first loss function, according to the output value of the target detection model and the second training sample. In some exemplary embodiments, the target detection module 720 may be configured to perform operations S320 and S330 described above, which are not repeated here.
The ambiguity matrix construction module 730 includes a gaussian mixture model, and the ambiguity matrix construction module 730 is configured to: and receiving the first loss value, and constructing an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of the data in the first training sample and the second training sample. In some exemplary embodiments, the ambiguity matrix construction module 730 may be configured to perform operations S340 and S350 described above, which are not described herein.
The blur loss function construction module 740 is configured to construct a blur loss function according to the blur degree matrix, where the blur loss function is configured to distinguish between the clear data and the blur data in the first training sample and the second training sample. In some exemplary embodiments, the fuzzy loss function construction module 740 may be used to perform the operation S360 described above, and will not be described again.
The loss function construction module 750 is configured to construct a second loss function according to the fuzzy loss function. In some exemplary embodiments, the loss function construction module 750 may be used to perform the operation S370 described above, which is not described herein.
The retraining module 760 is configured to retrain the target detection model by using the first training sample, the second training sample, and the second loss function. In some exemplary embodiments, the retraining module 760 may be used to perform operation S380 described above, which is not repeated here.
According to embodiments of the present disclosure, any of the training sample acquisition module 710, the object detection module 720, the ambiguity matrix construction module 730, the ambiguity loss function construction module 740, the loss function construction module 750, and the retraining module 760 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the training sample acquisition module 710, the object detection module 720, the ambiguity matrix construction module 730, the ambiguity loss function construction module 740, the loss function construction module 750, and the retraining module 760 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of or a suitable combination of any of the three implementations of software, hardware, and firmware. Alternatively, at least one of the training sample acquisition module 710, the object detection module 720, the ambiguity matrix construction module 730, the ambiguity loss function construction module 740, the loss function construction module 750, and the retraining module 760 may be implemented, at least in part, as a computer program module that, when executed, performs the corresponding functions.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a training method or an object detection method of an object detection model according to an exemplary embodiment of the present disclosure.
As shown in fig. 9, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
For example, the electronic device may be a passive terahertz imaging apparatus.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is likewise connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the training method of the target detection model and/or the target detection method provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed via the communication section 809, and/or installed from the removable medium 811. The program code contained in the computer program may be transmitted using any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811.
According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or sub-combined in a variety of ways, even if such combinations or sub-combinations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or sub-combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (20)

1. A training method for a target detection model based on fuzzy learning, characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a first training sample and a second training sample, the first training sample and the second training sample are respectively used for representing the input and the output of the target detection model, the first training sample comprises clear data and fuzzy data, and the second training sample comprises clear data and fuzzy data;
inputting the first training sample into the target detection model;
determining a first loss value of the target detection model by using a first loss function according to the output value of the target detection model and the second training sample;
inputting the first loss value into a Gaussian mixture model;
constructing an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of data in the first training sample and the second training sample;
constructing a fuzzy loss function according to the ambiguity matrix, wherein the fuzzy loss function is configured to distinguish clear data and fuzzy data in the first training sample and the second training sample;
constructing a second loss function according to the fuzzy loss function; and
retraining the target detection model by using the first training sample, the second training sample, and the second loss function.
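(Informative example, not part of the claims.) The following minimal Python sketch illustrates one possible realization of the training flow recited in claim 1, assuming a PyTorch-style model; the helper names `build_ambiguity_matrix` and `build_second_loss` are hypothetical stand-ins for the steps detailed in the dependent claims.

```python
import torch

def train(model, x, y, first_loss_fn, build_ambiguity_matrix,
          build_second_loss, optimizer, epochs=10):
    # First pass: per-sample first loss values from the first loss function.
    with torch.no_grad():
        first_losses = first_loss_fn(model(x), y)
    # Feed the first loss values into a Gaussian mixture model and
    # derive the ambiguity matrix C_f (hypothetical helper).
    C_f = build_ambiguity_matrix(first_losses)
    # Construct the fuzzy loss and the second loss from C_f, then
    # retrain the target detection model with the second loss function.
    second_loss_fn = build_second_loss(C_f)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = second_loss_fn(model(x), y).mean()
        loss.backward()
        optimizer.step()
    return model
```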
2. The method of claim 1, wherein the first loss function comprises a first frame loss, a first confidence loss, and a first classification loss.
3. The method of claim 2, wherein the first frame loss employs a GIoU loss, the first confidence loss employs a mixed entropy loss, and the first classification loss employs a mixed entropy loss.
4. The method according to claim 3, wherein the first loss function is calculated by:

$$l_w = \sigma_{box} \cdot l_{box}^{w} + \sigma_{obj} \cdot l_{obj}^{w} + \sigma_{cls} \cdot l_{cls}^{w}$$

wherein $l_w$ is the first loss function, $l_{box}^{w}$ is the first frame loss, $l_{obj}^{w}$ is the first confidence loss, $l_{cls}^{w}$ is the first classification loss, and $\sigma_{box}$, $\sigma_{obj}$, and $\sigma_{cls}$ are the weight coefficients of the first frame loss, the first confidence loss, and the first classification loss, respectively.
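(Informative example, not part of the claims.) A minimal sketch of the weighted combination above; the default weight values are placeholders, not values from the disclosure.

```python
def first_loss(l_box, l_obj, l_cls,
               sigma_box=0.05, sigma_obj=1.0, sigma_cls=0.5):
    # l_w = sigma_box * l_box^w + sigma_obj * l_obj^w + sigma_cls * l_cls^w
    return sigma_box * l_box + sigma_obj * l_obj + sigma_cls * l_cls
```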
5. The method of claim 4, wherein the mixed entropy loss comprises a cross entropy loss and a negative entropy loss.
6. The method according to claim 5, wherein the mixed entropy loss is calculated by:

$$l_{ME} = l_{CE} + \beta \cdot l_{NE} = -y \cdot \log(y^*) - \beta \cdot y^* \cdot \log(y^*)$$

wherein $l_{ME}$ represents the mixed entropy loss, $l_{CE}$ and $l_{NE}$ respectively represent the cross entropy loss and the negative entropy loss, $\beta$ is the weight coefficient of the negative entropy loss, $y$ is a value in the second training sample and represents a labeled value, and $y^*$ is the output value of the target detection model and represents a predicted value.
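(Informative example, not part of the claims.) A minimal PyTorch sketch of the mixed entropy loss; the default value of beta is a placeholder, and eps is added only to guard against log(0).

```python
import torch

def mixed_entropy_loss(y, y_star, beta=0.1, eps=1e-7):
    # l_ME = -y * log(y*) - beta * y* * log(y*)
    y_star = y_star.clamp(eps, 1.0 - eps)
    l_ce = -y * torch.log(y_star)        # cross entropy term
    l_ne = -y_star * torch.log(y_star)   # negative entropy term
    return l_ce + beta * l_ne
```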
7. The method of claim 6, wherein the ambiguity matrix is expressed as:

$$C_f = \left[ c_{ij} \right]_{N \times N}$$

wherein $C_f$ is the ambiguity matrix, the element $c_{ij}$ represents the probability of the predicted value being $y^*_j$ when the labeled value is $y^c_i$, and $N$ represents the number of row and column elements in the ambiguity matrix.
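(Informative example, not part of the claims.) One plausible empirical reading of the matrix above is a row-normalized confusion-style matrix; this construction is an assumption of the sketch, since the disclosure derives the ambiguity matrix via a Gaussian mixture model.

```python
import numpy as np

def empirical_ambiguity_matrix(labels, preds, N):
    # C_f[i, j] estimates P(predicted value is class j | labeled value is class i).
    C_f = np.zeros((N, N))
    for i, j in zip(labels, preds):
        C_f[i, j] += 1.0
    rows = C_f.sum(axis=1, keepdims=True)
    return C_f / np.maximum(rows, 1.0)  # row-normalize to probabilities
```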
8. The method of claim 7, wherein the fuzzy loss function is represented by:

$$l_f = -(1-r)\left((1-y^*)^{1-r} \, (y^*)^{r}\right)^{\gamma} \log(y^*) - r\left((1-y^*)^{1-r} \, (y^*)^{r}\right)^{\gamma} \log(1-y^*)$$

wherein $l_f$ represents the fuzzy loss function, $r$ represents the proportion of blurred data, and $\gamma \geq 0$ is used to adjust the sensitivity of the target detection model to data blurring.
9. The method of claim 8, wherein the proportion $r$ of the blurred data is obtained by:

$$r = 1 - \operatorname{diag}(C_f)$$

wherein $\operatorname{diag}(C_f)$ represents the diagonal elements of the ambiguity matrix $C_f$.
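(Informative example, not part of the claims.) A minimal NumPy sketch of claims 8 and 9 together; the default value of gamma is a placeholder, and eps guards against log(0).

```python
import numpy as np

def blur_ratio(C_f):
    # Claim 9: r = 1 - diag(C_f), one ratio per class.
    return 1.0 - np.diag(C_f)

def fuzzy_loss(y_star, r, gamma=1.0, eps=1e-7):
    # Claim 8: r-weighted log loss, modulated by
    # ((1 - y*)^(1-r) * y*^r)^gamma.
    y_star = np.clip(y_star, eps, 1.0 - eps)
    mod = ((1.0 - y_star) ** (1.0 - r) * y_star ** r) ** gamma
    return (-(1.0 - r) * mod * np.log(y_star)
            - r * mod * np.log(1.0 - y_star))
```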
10. The method according to any of claims 2-9, wherein the second loss function is calculated by:

$$l = \sigma_{box}' \cdot l_{box} + \sigma_{obj}' \cdot l_{obj} + \sigma_{cls}' \cdot l_{cls}$$

wherein $l$ is the second loss function, $l_{box}$ is the second frame loss, $l_{obj}$ is the second confidence loss, $l_{cls}$ is the second classification loss, and $\sigma_{box}'$, $\sigma_{obj}'$, and $\sigma_{cls}'$ are the weight coefficients of the second frame loss, the second confidence loss, and the second classification loss, respectively.
11. The method of claim 10, wherein the second frame loss employs a GIoU loss, and the second confidence loss and the second classification loss are both represented by the fuzzy loss function.
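(Informative example, not part of the claims.) A sketch of the second loss as the weighted sum in claim 10, with the term choices of claim 11; the default weights are placeholders.

```python
def second_loss(l_box_giou, l_obj_fuzzy, l_cls_fuzzy,
                sigma_box=0.05, sigma_obj=1.0, sigma_cls=0.5):
    # GIoU loss for the frame term; fuzzy loss for the confidence
    # and classification terms.
    return (sigma_box * l_box_giou
            + sigma_obj * l_obj_fuzzy
            + sigma_cls * l_cls_fuzzy)
```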
12. The method of claim 11, wherein constructing the ambiguity matrix by using the Gaussian mixture model comprises:
constructing a confidence ambiguity matrix and a classification ambiguity matrix by using the Gaussian mixture model for the first confidence loss and the first classification loss, respectively.
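(Informative example, not part of the claims.) One way to realize the Gaussian-mixture step is to fit a two-component GMM to the per-sample first-loss values and read the lower-mean component as the "clear" mode; the use of scikit-learn and of exactly two components is an assumption of this sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def clear_probability(loss_values):
    gmm = GaussianMixture(n_components=2, random_state=0)
    losses = np.asarray(loss_values, dtype=float).reshape(-1, 1)
    gmm.fit(losses)
    # Posterior probability of the lower-mean ("clear") component.
    clear = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict_proba(losses)[:, clear]
```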
13. The method of claim 12, wherein, for the confidence ambiguity matrix, N = 2; and/or,
for the classification ambiguity matrix, N is equal to the total number of classes output by the target detection model.
14. The method of any of claims 1-9, wherein the first training sample comprises image data acquired by scanning an imaging region with a terahertz imaging device; and/or,
the second training sample comprises image data with a label frame, wherein the label frame is used for representing the position and/or the type of blurring of an object to be detected in the image data.
15. A target detection method, comprising:
acquiring image data;
inputting the image data into a target detection model, wherein the target detection model is trained by the method of any one of claims 1-14; and
determining an object to be detected in the image data according to the output of the target detection model.
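(Informative example, not part of the claims.) A minimal inference sketch for the detection method above; the `(boxes, scores, classes)` output format and the confidence threshold are assumptions, not features of the claim.

```python
import torch

def detect(model, image, conf_threshold=0.5):
    model.eval()
    with torch.no_grad():
        # Assumed output format of the trained target detection model.
        boxes, scores, classes = model(image.unsqueeze(0))
    keep = scores > conf_threshold
    return boxes[keep], scores[keep], classes[keep]
```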
16. A training device for a target detection model based on fuzzy learning, comprising:
a training sample acquisition module configured to acquire a training sample set, wherein the training sample set comprises a first training sample and a second training sample, the first training sample and the second training sample are respectively used for representing the input and the output of the target detection model, the first training sample comprises clear data and fuzzy data, and the second training sample comprises clear data and fuzzy data;
a target detection module configured to: receive the first training sample, and determine a first loss value of the target detection model by using a first loss function according to the output value of the target detection model and the second training sample;
an ambiguity matrix construction module, comprising a Gaussian mixture model, configured to: receive the first loss value, and construct an ambiguity matrix by using the Gaussian mixture model, wherein the ambiguity matrix is used for representing the ambiguity of data in the first training sample and the second training sample;
a fuzzy loss function construction module configured to construct a fuzzy loss function according to the ambiguity matrix, wherein the fuzzy loss function is configured to be capable of distinguishing between clear data and fuzzy data in the first training sample and the second training sample;
a loss function construction module configured to construct a second loss function according to the fuzzy loss function; and
a retraining module configured to retrain the target detection model by using the first training sample, the second training sample, and the second loss function.
17. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-15.
18. The electronic device of claim 17, wherein the electronic device is a passive terahertz imaging apparatus.
19. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 15.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 15.
CN202111680390.8A 2021-12-30 2021-12-30 Training method and device of target detection model and target detection method Pending CN116433891A (en)

Priority Applications (1)

Application Number: CN202111680390.8A; Priority Date: 2021-12-30; Filing Date: 2021-12-30; Title: Training method and device of target detection model and target detection method

Publications (1)

Publication Number: CN116433891A; Publication Date: 2023-07-14

Family

ID=87093070

Country Status (1)

CN: CN116433891A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination