CN112990381A

CN112990381A - Distorted image target identification method and device

Info

Publication number: CN112990381A
Application number: CN202110509224.5A
Authority: CN
Inventors: 杨帆; 刘利卉; 朱莹; 冯帅; 胡建国
Original assignee: Nanjing Zhenshi Intelligent Technology Co Ltd
Current assignee: Xiaoshi Technology Jiangsu Co ltd
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2021-06-18
Anticipated expiration: 2041-05-11
Also published as: CN112990381B

Abstract

The invention discloses a distorted image target identification method, which uses an identification model to identify a target image in a distorted image; the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples. The invention also discloses a distorted image target recognition device. The method can realize the identification of the distorted image target without any distorted image correction treatment, and has simple algorithm and easy realization.

Description

Distorted image target identification method and device

Technical Field

The invention relates to a target identification method, in particular to a distorted image target identification method, and belongs to the technical field of intersection of machine learning, image processing and computer vision.

Background

The distorted image represented by the image shot by the fisheye lens has the characteristics of short focal length and large field of view, and has wide market demand in an omnibearing vision system. The distorted image can reach an ultra-large visual angle close to or larger than 180 degrees, and a wider scene can be captured by utilizing the distorted image, so that the distorted image has great potential application value. For example, a video monitoring system applying distorted images to some public places adopts a ceiling-mounted mode, so that the scenes of the whole area can be recorded. Therefore, people do not need to install a plurality of monitoring cameras in different areas, and space, resources and use cost are saved. For example, people always encounter the situation in daily life, clearly feel beautiful in the front of eyes, and how the people cannot record the situation by using a photographing and shooting device in hands, which is mainly caused by the fact that the visual angle capability of the photographing and shooting device cannot reach the range which can be seen by human eyes.

Although the distorted image has the advantage of large field of view, which can reach or even exceed the range that human eyes can see, the ultra-large visual angle of the distorted image is achieved by sacrificing the appearance of the shot object in the original form; that is, the distorted image is distorted compared to the undistorted image, which also causes great difficulty in object detection, object recognition, and object tracking of the distorted image. In order to improve the detection and recognition accuracy, in the prior art, for the target detection and recognition of the distorted image, it is usually necessary to perform extremely complex image correction on the distorted image, and then perform subsequent model training and target detection and classification recognition by using the corrected image, which obviously has a huge cost.

Some researchers have proposed that the distorted image is directly used to train the recognition model, in order to overcome the influence of image distortion, the distorted image is divided into a plurality of regions according to the distortion degree, and a detector and a classifier are trained for each region with different distortions, for example, the scheme of "a method, a device and a computing device for training an object recognition neural network" disclosed in chinese patent application CN109840883A is too complicated to train the detector and the classifier respectively for different regions, although the problem of image distortion is considered. In addition, because image distortion is a continuous process, the manual marking of the image distortion degree is often greatly influenced by the main observation, and therefore, the marking of the distortion degree of the training sample is also a great difficulty.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a distorted image target identification method, which can realize the quick and accurate identification of a distorted image target without any distorted image correction processing, and has simple algorithm and easy realization.

The invention specifically adopts the following technical scheme to solve the technical problems:

a distorted image target identification method uses an identification model to identify a target image in a distorted image; the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples.

As one preferable scheme, the base loss function is a contrast loss function contrast loss, and the distortion correction loss function simaritycontrast loss is specifically as follows:

in the formula (I), the compound is shown in the specification,drepresents all input of the iteration batchNSamples in a training sampleiAnd a samplejIdentifying a Euclidean distance of the feature; y is a sampleiAnd a samplejWhether or not it isLabels of the same class, y =1 represents that two samples are of the same class, and y =0 represents that they are not of the same class; margin is a set threshold; max () is a maximum function;similarityis a sampleiAnd a samplejNormalized similarity between classification confidence values of corresponding classes on a common classification model.

Further preferably, the similarity is a euclidean distance or an L1 distance.

As another preferred embodiment, the base loss function is a cross entropy loss function crossentropy loss, and the distortion correction cross entropy loss function, micromeritycross entropy loss, is specifically as follows:

in the formula, N represents the number of samples;irepresents a sample number; c represents a category number; m represents the total number of categories;y _icis a class mark variable, if the class and sampleiIs 1 if the same, otherwise is 0;p _icrepresenting a sampleiBelong to the categorycLog is a logarithmic function,score _icis shown asiThe samples correspond to the second sample on the common classification modelcA classification confidence value for each class category.

Based on the same inventive concept, the following technical scheme can be obtained:

a distorted image target recognition device comprises a recognition model for recognizing a target image in a distorted image; the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples.

in the formula (I), the compound is shown in the specification,drepresents all input of the iteration batchNSamples in a training sampleiAnd a samplejIdentifying a Euclidean distance of the feature; y is a sampleiAnd a samplejWhether the samples are labels of the same class, y =1 represents that the two samples are of the same class, and y =0 represents that the samples are not of the same class; margin is a set threshold; max () is a maximum function;similarityis a sampleiAnd a samplejNormalized similarity between classification confidence values of corresponding classes on a common classification model.

Further preferably, the similarity is a euclidean distance or an L1 distance.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

the invention firstly proposes that the classification confidence value of the corresponding class of the distorted target image on the common classification model is taken as the distortion measurement parameter of the distorted target image, the parameter can better distinguish the difference of the distortion degree of the same object at different positions in the distorted image, and distortion correction is carried out on the training loss function in the recognition model based on the distortion measurement parameter, so that the trained distortion correction recognition model can accurately recognize and classify the distorted target image; in addition, the technical scheme of the invention does not need to carry out distortion image correction or complex operations such as regional training detectors, classifiers and the like, so that the method is easier to realize, has better real-time performance and has less requirements on software and hardware resources compared with the prior art.

Detailed Description

In order to quickly and accurately identify a distorted image target, the method takes the classification confidence value of the corresponding class of the distorted target image on a common classification model as the distortion measurement parameter of the distorted target image, and carries out distortion correction on a training loss function in an identification model based on the distortion measurement parameter, so that the trained distortion correction identification model can accurately identify and classify the distorted target image; and because complex operations such as distorted image correction or regional training detectors, classifiers and the like are not required, the method is easier to realize, has better real-time performance and has less requirements on software and hardware resources.

In a distorted image represented by an image captured by a fisheye camera, the degree of image distortion of the same object is different at different positions, and therefore it is necessary to characterize the degree of distortion more accurately. If the common classification model obtained by training the deep neural network by taking each class of target images acquired from the undistorted images as training samples classifies the images of the same object at different positions under the fisheye camera, the classification confidence value of the corresponding class of the object labeling class on the common classification model can be obtained. The classification confidence value with the variation range of [0,1] can reflect the image distortion degree, the classification confidence values of the same object at different positions under the influence of distortion are different, and at the position with smaller distortion, the distorted object image is the object image closest to the common camera, so the classification confidence value is the highest; where the distortion is large, the distorted object image will be significantly different from the object image under normal cameras, and the classification confidence value identification should be low. Therefore, the invention proposes that the classification confidence is used as an index for evaluating the image distortion, and the training loss function of the recognition model is subjected to distortion correction according to the index. The distorted image target identification method provided by the invention specifically comprises the following steps:

identifying a target image in the distorted image by using an identification model; the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples.

For the public to understand, the following will explain the technical solution of the present invention in further detail by taking the automatic identification of bottled beverages in vending machines as an example:

first, training samples are prepared and labeled. The method comprises the steps of acquiring bottled beverage images of all categories by using a fish-eye camera, marking a visible bottle cap area of each beverage in the images in a rectangular frame mode manually, and recording the corresponding beverage category. Intercepting and storing bottle cap areas in the images according to manually marked rectangular frames, and sending the bottle cap areas into a common classification model for classification and identification, wherein the common classification model is obtained by training bottled beverage images of all categories acquired by a common camera, and the classification categories at least comprise all categories of bottled beverages sold by vending machines, and can be a common classification model capable of identifying more other types of beverages; the common classification model classifies and identifies the intercepted bottle cap region images (namely the target images) of various categories according to the known labeling categories to obtain classification confidence values of the corresponding categories on the common classification model of each target image, the value of the classification confidence value is in the range of 0-1, the classification confidence value of each target image is recorded, the distortion degree of the target image is represented by the classification confidence value, and the larger the value is, the larger the distortion is.

And (3) taking the bottle cap area images of the classes marked with the classes and obtained classification confidence values of the corresponding classes on the common classification model as training samples, and sending the training samples into a deep neural network (such as rene 18) for training to obtain corresponding recognition features. Assuming that the number N of training samples fed into the neural network at a time is 32, and the identification feature dimension of each image is 1 × 512, the identification feature matrix dimension of this batch of images is 32 × 512. Under the view angle of a common camera, the image has no distortion, the identification difficulty is low, and the class is easy to separate from the class. However, under the influence of distortion, target images of different classes may be similar under the condition of larger distortion under the distorted view angle, and if a traditional loss function is still adopted, the inseparability between classes or the confusion of class-to-class boundaries can be caused. Therefore, the method takes the traditional loss function (such as cross entropy loss function Cross EntropyLoss, contrast loss function ContrastLoss and the like) as the basic loss function, and utilizes the classification confidence value of the corresponding class of the training sample on the common classification model to carry out distortion correction on the basic loss function. The distortion correction loss function obtained has different forms according to the difference of the basic loss function, and is described below by taking contrast loss and cross control loss as examples.

The conventional ContrastLoss function is shown as follows:

in the formula (I), the compound is shown in the specification,drepresents all input of the iteration batchNSamples in a training sampleiAnd a samplejIdentifying a Euclidean distance of the feature; y is a sampleiAnd a samplejWhether or not it isLabels of the same class, y =1 represents that two samples are of the same class, and y =0 represents that they are not of the same class; margin is a set threshold; max () is a function taking the maximum value.

The distortion correction loss function SimiarityContraslos obtained by improving the contrast loss function ContrasLoss is concretely as follows:

Specifically, the 32 identification features are combined one by one without repeating one another to obtain 16 × 31 combined pairs in total, the 16 × 31 combined pairs are marked again according to the marked class mark pairs of the 32 images, the new class of the same class mark pair combination is marked as 1, the new class of the different classes is marked as 0, namely y in the formula; and simultaneously calculating the similarity of classification confidence values of two training samples in each combination pair of 16-31 combinations on the corresponding classes of the common classification model, wherein the measurement of the similarity can specifically adopt common measurement modes such as Euclidean distance, L distance and the like, and then normalizing the obtained similarity to [0,1]]Within the range of (1), namely obtaining the correspondingsimilarity. Calculating network loss according to the distortion correction loss function SimiarityContrastloss and updating network parameters; and analogizing in sequence, training all the training samples until the network converges to a preset index or reaches a preset iteration number, and finishing the training to obtain a trained distortion correction recognition model.

The target test image detected from the distorted image can be accurately identified by adopting the trained distortion correction identification model. During specific identification, selecting a bottle cap area image with small distortion under a fisheye camera as a bottom library picture for each category of beverage in advance, calculating and storing the identification characteristics of each category of bottom library pictures through a trained distortion correction main network model; and sending the bottle cap area image detected by the detector into a trained distortion correction main body network model, calculating corresponding identification features, and comparing the identification features with the identification features of all categories in the bottom library, wherein the category with the highest similarity is regarded as the identification category. Because the trained recognition network has strong feature extraction capability, when the newly added categories are responded, additional labels and data are not needed, and features different from the existing categories can be extracted, so that frequent iterative model for the newly added commodity categories can be avoided.

(II) the expression of the traditional cross entropy loss function Cross EntropyLoss is as follows:

in the formula, N represents the number of samples;irepresents a sample number; c represents a category number; m represents the total number of categories;y _icis a class mark variable, if the class and sampleiIs 1 if the same, otherwise is 0;p _icrepresenting a sampleiBelong to the categorycLog is a logarithmic function.

The distortion correction cross entropy loss function similitycross entropy loss load obtained by improving the traditional cross entropy loss function cross entropy loss load is as follows:

Specifically, the 32 1 × 512-dimensional identification features are sent to a full connection layer, the output dimension of the full connection layer corresponds to the total number of sample categories, each image outputs 1 × category number classification confidence value features, the corresponding classification confidence value matrix dimension of the 32 images corresponds to 32 × category numbers, the classification confidence value matrix corresponding to the 32 images, the corresponding class labels and the corresponding distortion scores of each image are sent to a corresponding distortion correction cross entropy loss function to calculate network loss and update network parameters; and analogizing in turn, training all the training samples until the network converges to a preset index or reaches a preset iteration number, finishing the training, and obtaining a trained distortion correction recognition (classification) model.

The distortion correction identification (classification) model finally has a fully-connected layer, the output dimension of the fully-connected layer corresponds to the total number of categories of the commodities, and the indexes of each column correspond to the categories one by one. During specific identification, the bottle cap area image detected by the detector is sent to a trained distortion correction identification (classification) model, a classification confidence value with the corresponding dimension being 1 x the number of classes is calculated, and the class mark corresponding to the maximum value in the classification confidence value is taken as an identification result. Because the classification type of the classification network is fixed, manual marking and data are needed when the classification is newly added, and if the times of newly adding commodities are frequent, the classification model needs frequent iteration and is not beneficial to maintenance.

In order to verify the effect of the technical scheme of the invention, 10 categories of distorted image test sets under the independent sales counter are adopted, and 300 images in each category are used as the distorted image test sets. 10 classes of undistorted images, 300 pieces of each class are used as a undistorted image test set. The statistical indicator is the accuracy, i.e. total number of images identified (classified) correct/total number of images. In table 1, the deep neural networks of the normal recognition model and the recognition model of the present invention are both resnet18, the loss function of the normal recognition model is constraint loss, and the loss function of the recognition model of the present invention is similarity constraint loss. In table 2, the deep neural networks of the normal classification model and the classification model of the present invention are both resnet18, the loss function of the normal classification model is a cross entropy loss function, and the loss function of the classification model of the present invention is a distortion correction cross entropy loss function.

TABLE 1

	Test set of distorted images	Undistorted image test set
			Common recognition model	87.7%	92.3%
Identification model of the invention	91.3%	93.3%

TABLE 2

	Test set of distorted images	Undistorted image test set
			General classification model	82.5%	94.2%
The invention classification model	92.6%	94.9%

The present invention is directed to object recognition of a distorted image represented by an image captured by a fisheye camera. Because the image distortion is a continuous process, and the distortion degree of the manually marked image is often greatly influenced by subjectivity, the method for automatically marking the distortion degree of the distorted image by utilizing the conventional common classification model is provided by the invention, and additional manpower is not needed in the process. Secondly, the existing recognition algorithm needs to correct images under a fisheye camera or a distorted visual angle, the distorted information of the images is not utilized to supervise and learn in the training process, or the distorted areas need to be forcibly divided, and each area trains a model. On the basis of the existing deep neural network, the image distortion information is fused to train the recognition model, so that the model can still obtain good recognition accuracy under the distorted image with rich changes, correction or other complex operations are not needed to be carried out on the image, different models are not needed to be trained in different regions, and the method has remarkable advantages in the aspects of instantaneity and implementation cost.

Claims

1. A distorted image target identification method uses an identification model to identify a target image in a distorted image; the method is characterized in that the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples.

2. A distorted image object identifying method as claimed in claim 1, wherein the basic loss function is contrast loss function contrast loss, and the distortion correction loss function simaritycontastloss is specified as follows:

3. A distorted image object identifying method as claimed in claim 2, characterized in that the similarity is euclidean distance or L1 distance.

4. A distorted image object recognition method as claimed in claim 1, wherein the base loss function is a cross entropy loss function crossentropy loss, and the distortion correction cross entropy loss function, similaritycross entropy loss, is specified as follows:

in the formula, N represents the number of samples;irepresents a sample number; c represents a category number;m represents the total number of categories;y _icis a class mark variable, if the class and sampleiIs 1 if the same, otherwise is 0;p _icrepresenting a sampleiBelong to the categorycLog is a logarithmic function,score _icis shown asiThe samples correspond to the second sample on the common classification modelcA classification confidence value for each class category.

5. A distorted image target recognition device comprises a recognition model for recognizing a target image in a distorted image; the method is characterized in that the identification model is a distortion correction identification model obtained by taking various types of target images collected from distorted images as training samples and training a deep neural network by using a distortion correction loss function; the distortion correction loss function is obtained by taking the classification confidence value of the corresponding class of the training sample on a common classification model as the distortion measurement parameter of the training sample and performing distortion correction on the basic loss function by using the distortion measurement parameter; the common classification model is obtained by training a deep neural network by taking each class of target images collected from undistorted images as training samples.

6. A distorted image object identifying device as claimed in claim 5, wherein the basic loss function is contrast loss function contrast loss, and the distortion correction loss function simaritycontrast loss is specified as follows:

7. A distorted image object identifying device as claimed in claim 6, characterized in that the similarity is euclidean distance or L1 distance.

8. A distorted image object identifying device as claimed in claim 5, wherein the base loss function is a cross entropy loss function crossentropy loss, and the distortion correcting cross entropy loss function, similaritycross entropy loss, is specified as follows: