CN114037886A - Image recognition method and device, electronic equipment and readable storage medium - Google Patents

Image recognition method and device, electronic equipment and readable storage medium

Info

Publication number
CN114037886A
CN114037886A (application CN202111300902.3A)
Authority
CN
China
Prior art keywords
image, training, recognized, obtaining, similarity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111300902.3A
Other languages
Chinese (zh)
Inventor
张伟东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111300902.3A
Publication of CN114037886A
Legal status: Pending

Classifications

    • G06F18/22 Pattern recognition; Analysing; Matching criteria, e.g. proximity measures (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing)
    • G06F18/214 Pattern recognition; Design or setup of recognition systems or techniques, extraction of features in feature space, blind source separation; Generating training patterns, bootstrap methods, e.g. bagging or boosting
    • G06T7/0002 Image analysis; Inspection of images, e.g. flaw detection (G06T Image data processing or generation, in general)
    • G06T2207/30168 Indexing scheme for image analysis or image enhancement; Subject of image, context of image processing; Image quality inspection

Abstract

The application provides an image recognition method, an image recognition apparatus, an electronic device and a readable storage medium. Feature information of an image to be recognized is obtained by using a recognition model obtained through training, and a quality score of the image to be recognized is obtained by using an evaluation model obtained through training. The similarity between the feature information of the image to be recognized and the feature information of each of a plurality of preset images is then calculated; a re-recognition result of the image to be recognized is obtained based on the obtained similarities, and an attribute recognition result of the image to be recognized is obtained based on the quality score of the image to be recognized and each similarity, where the attribute recognition result represents whether the image to be recognized contains the same target object as the preset image corresponding to each similarity. Because the image quality score is taken into consideration when determining the attribute recognition result, the attribute recognition result can be judged adaptively based on the image quality, and the accuracy and the recall rate of the attribute recognition result are improved.

Description

Image recognition method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to an image recognition method, an image recognition apparatus, an electronic device, and a readable storage medium.
Background
In fields such as intelligent transportation, there is a wide need for rapid retrieval and target attribute identification of objects such as non-motor vehicles, pedestrians and motor vehicles, which can assist traffic police in maintaining traffic order and help locate target persons quickly.
Currently, a common approach is to use a machine learning model for image recognition in order to determine whether a target object exists. When making this judgment, the image to be recognized is generally compared with preset known images, and a threshold is set for the decision. However, because image quality varies greatly between images, the fixed-threshold approach of the prior art may reduce the recall rate for high-quality images or the accuracy for low-quality images.
Disclosure of Invention
An object of the present application includes, for example, providing an image recognition method, apparatus, electronic device and readable storage medium, which can improve the accuracy and recall of image attribute recognition results.
The embodiment of the application can be realized as follows:
in a first aspect, the present application provides an image recognition method, including:
obtaining characteristic information of an image to be recognized by using a recognition model obtained by training;
obtaining the quality score of the image to be recognized by using the trained evaluation model;
calculating the similarity between the feature information of the image to be identified and the feature information of each preset image in a plurality of preset images;
obtaining a re-recognition result of the image to be recognized based on the obtained multiple similarities, and obtaining an attribute recognition result of the image to be recognized based on the quality score of the image to be recognized and each similarity, wherein the attribute recognition result represents whether the image to be recognized has a target object which is the same as a preset image corresponding to each similarity.
In an alternative embodiment, the method further includes a step of training the obtained recognition model in advance, and the step includes:
training a constructed first network model by using a plurality of collected training samples, and obtaining characteristic information of each training sample obtained by the first network model;
calculating the quality score of each training sample based on the characteristic information of a plurality of training samples;
and adjusting the training weight of each training sample according to the quality score of each training sample, and continuing training the first network model until a preset requirement is met, so as to obtain an identification model corresponding to the first network model.
In an alternative embodiment, the method further comprises the step of pre-training the evaluation model, which comprises:
after the first network model is trained and iterated, training a constructed second network model by using the plurality of training samples based on the obtained quality scores of the training samples;
after the training weight of each training sample is adjusted and the first network model is continuously trained to obtain an updated quality score, the second network model is trained based on the updated quality score until a preset requirement is met, and an evaluation model corresponding to the second network model is obtained.
In an optional embodiment, the step of calculating a quality score of each training sample based on feature information of a plurality of training samples includes:
aiming at each training sample, calculating to obtain an intra-class distance according to the characteristic information of the training sample and the training samples belonging to the same class, and calculating to obtain an inter-class distance according to the characteristic information of the training samples and the training samples belonging to different classes;
and obtaining the quality score of each training sample based on the intra-class distance and the inter-class distance of each training sample.
In an optional embodiment, before the step of calculating the similarity between the feature information of the image to be recognized and the feature information of each of the preset images in the plurality of preset images, the method further includes:
and performing dimension reduction processing on the feature information of the image to be identified to obtain feature information with set dimension.
In an optional implementation manner, the step of obtaining feature information of an image to be recognized by using the recognition model obtained by training includes:
determining the area where the target object is located in each preset image, and performing pixel shielding processing on the area, which does not contain the target object, of each preset image;
executing pixel shielding processing of the same area of each preset image on the image to be recognized;
and obtaining the characteristic information of the image to be recognized and each preset image after the pixel shielding processing by using the recognition model obtained by training.
In an optional implementation manner, the step of obtaining an attribute identification result of the image to be identified based on the quality score of the image to be identified and each of the similarities includes:
setting a discrimination threshold value based on the quality score of the image to be recognized;
and obtaining an attribute identification result of the image to be identified according to the discrimination threshold and each similarity.
In an optional implementation manner, the step of obtaining an attribute identification result of the image to be identified according to the discrimination threshold and each of the similarities includes:
comparing each similarity with the discrimination threshold, and if the similarity is smaller than the discrimination threshold, determining that the image to be recognized does not have a target object which is the same as a preset image corresponding to the similarity;
and if the similarity is greater than or equal to the discrimination threshold, determining that the image to be recognized has the same target object as the preset image corresponding to the similarity.
In a second aspect, the present application provides an image recognition apparatus, the apparatus comprising:
the information acquisition module is used for acquiring the characteristic information of the image to be recognized by using the recognition model obtained by training;
the score obtaining module is used for obtaining the quality score of the image to be recognized by utilizing the trained evaluation model;
the calculation module is used for calculating the similarity between the feature information of the image to be identified and the feature information of each preset image in a plurality of preset images;
and the result obtaining module is used for obtaining a re-identification result of the image to be identified based on the obtained multiple similarities, and obtaining an attribute identification result of the image to be identified based on the quality score of the image to be identified and each similarity, wherein the attribute identification result represents whether the image to be identified has a target object which is the same as a preset image corresponding to each similarity.
In a third aspect, the present application provides an electronic device, comprising: a memory for storing a computer program and a processor for executing the computer program to implement the image recognition method according to any one of the preceding embodiments.
In a fourth aspect, the present application provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the image recognition method according to any of the previous embodiments.
The beneficial effects of the embodiment of the application include, for example:
the application provides an image recognition method, an image recognition device, an electronic device and a readable storage medium, wherein feature information of an image to be recognized is obtained by using a recognition model obtained through training, and a quality score of the image to be recognized is obtained by using an evaluation model obtained through training. And then calculating the similarity between the feature information of the image to be recognized and the feature information of each preset image in the plurality of preset images, obtaining a re-recognition result of the image to be recognized based on the obtained plurality of similarities, and obtaining an attribute recognition result of the image to be recognized based on the quality score and each similarity of the image to be recognized, wherein the attribute recognition result can represent whether the image to be recognized has the same target object as the preset image corresponding to each similarity. According to the scheme, in the process of determining the attribute identification result, the image quality score is taken into consideration, the attribute identification result can be judged adaptively based on the image quality score, and the accuracy and the recall rate of the attribute identification result are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic view of an application scenario of an image recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of an image recognition method according to an embodiment of the present application;
fig. 3 is a flowchart of a model training method in the image recognition method according to the embodiment of the present application;
fig. 4 is a schematic structural diagram of a network model provided in an embodiment of the present application;
FIG. 5 is a flowchart of sub-steps included in step S102 of FIG. 3;
fig. 6 is another flowchart of a model training method in the image recognition method according to the embodiment of the present disclosure;
FIG. 7 is a flowchart of sub-steps included in step S201 of FIG. 2;
fig. 8 is a schematic view of an image after pixel occlusion processing according to an embodiment of the present application;
FIG. 9 is a flowchart of sub-steps involved in step S204 of FIG. 2;
fig. 10 is a block diagram of an electronic device according to an embodiment of the present application;
fig. 11 is a functional block diagram of an image recognition apparatus according to an embodiment of the present application.
Reference numerals: 110 - storage medium; 120 - processor; 130 - image recognition apparatus; 131 - information obtaining module; 132 - score obtaining module; 133 - calculation module; 134 - result obtaining module; 140 - communication interface.
Detailed Description
For detection applications based on image recognition, such as non-motor vehicles and pedestrians, many processing methods already exist. For example, one existing method is multi-scale pedestrian re-identification with multi-granularity deep feature fusion: a residual network is selected, global coarse-granularity fusion features, local coarse-granularity fusion features and local attention fine-granularity fusion features are mined, and all the features are fused for re-identification, which relieves the pressure that complex backgrounds or pose changes put on re-identification. However, the network in this method has many branches, training is difficult, and resource consumption is high when the application is deployed.
In addition, the prior art includes a pedestrian re-identification approach based on PTGAN (Person Transfer GAN), in which a PTGAN style and background migration algorithm is trained in advance so that, at re-identification time, the background of the target to be retrieved under the current camera is converted into the background under the target camera before re-identification is performed, alleviating the difficulty of cross-camera retrieval. However, the PTGAN network in this method is not flexible enough, and the complex backgrounds of real application scenes cannot be handled well.
The prior art also includes a deep-learning-based method that fuses pedestrian re-identification with attribute identification, mainly by combining pedestrian re-identification with pedestrian attribute (such as gender and age) identification in arbitrary combinations and judging whether two images show the same person according to both the re-identification and attribute identification results. However, this method requires a lot of manpower to label each pedestrian attribute, the network has many branches, and the training and deployment process is complicated.
Therefore, in the prior art, in order to improve the recall rate and accuracy of image recognition processing, most of the adopted processing modes have the defects of complex network models or large amount of manpower labeling.
In view of the above problems in the prior art, the present application provides an image identification scheme, which can adaptively determine an attribute identification result based on an image quality condition by taking into account an image quality score in an attribute identification result determination process, thereby improving accuracy and recall rate of the attribute identification result.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is noted that the terms "first", "second", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
Please refer to fig. 1, which is a schematic view of an application scenario of the image recognition method according to the embodiment of the present application, where the scenario includes a plurality of terminal devices, and the plurality of terminal devices may be in communication connection with the processing device to implement interaction of data and information.
Each terminal device may be a camera device, such as a camera device installed beside a road, and may be configured to capture images of pedestrians, vehicles, and the like on the road and beside the road, and send the captured images to the processing device.
The processing device may be a backend server, such as a backend server of a traffic administration department that provides a traffic administration platform. The processing device can analyze the images received from the terminal devices to realize target re-identification and attribute identification. Target re-identification refers to comparing an image to be retrieved against a large gallery of base images and returning the same or similar targets captured under the same camera or under different cameras. Attribute recognition refers to recognizing certain characteristics of objects in an image, such as whether a non-motor vehicle is present or whether a person in a vehicle is wearing white clothing.
The processing device may be a single server, or may be a server cluster formed by a plurality of servers, and this embodiment is not particularly limited.
It is understood that the scenario shown in fig. 1 is only one possible example, and in other possible embodiments, the scenario may include only a part of the components shown in fig. 1 or may also include other components.
Fig. 2 is a flowchart illustrating an image recognition method provided in an embodiment of the present application, where the image recognition method may be implemented by the processing device shown in fig. 1. It should be understood that, in other embodiments, the order of some steps in the image recognition method of the present embodiment may be interchanged according to actual needs, or some steps may be omitted or deleted. The detailed steps of the image recognition method are described below.
S201, obtaining the characteristic information of the image to be recognized by using the recognition model obtained by training.
And S202, obtaining the quality score of the image to be recognized by using the trained evaluation model.
S203, calculating the similarity between the feature information of the image to be identified and the feature information of each preset image in a plurality of preset images.
S204, obtaining re-identification results of the images to be identified based on the obtained multiple similarities, and obtaining attribute identification results of the images to be identified based on the quality scores of the images to be identified and the similarities, wherein the attribute identification results represent whether the images to be identified have the same target objects as preset images corresponding to the similarities.
In this embodiment, the recognition model and the evaluation model may be obtained in advance based on training of the training sample, where the recognition model obtained through training may be used to process the image, and the feature information of the image may be obtained through the recognition model. And the evaluation model can be used for evaluating the quality of the image, wherein the quality evaluation result of the image can embody the excellent degree of the image recognition effect.
On the basis of obtaining the recognition model and the evaluation model through pre-training, in an actual application stage, an image to be recognized, which needs to be recognized and processed, is obtained, wherein the image to be recognized can be an image acquired by any terminal device in communication connection with the processing device. The recognition processing of the image to be recognized may include recognizing the similarity between the image to be recognized and a plurality of preset images, and may also include recognizing whether the image to be recognized has a target object, where the target object may be, for example, a non-motor vehicle, a pedestrian with a white jacket, or the like. The setting of the target object may be determined as required.
The preset images may be images pre-stored in a database, and each preset image may be an object to be compared. The characteristic information of the image to be recognized can be obtained by utilizing the recognition model obtained through pre-training, and the quality score of the image to be recognized can be obtained through the evaluation model obtained through pre-training.
On this basis, the similarity between the image to be recognized and each preset image is calculated, which may be based on the feature information of the image to be recognized and the feature information of each preset image. For example, the Euclidean distance may be used to represent the similarity between images.
On the basis of obtaining the similarity between the image to be recognized and each preset image, the preset images can be sequenced from high to low in similarity, and the sequenced result can be used as a re-recognition result of the image to be recognized.
In addition, the attribute identification result of the image to be identified can be obtained based on the quality score and each similarity of the image to be identified. The quality score of the image to be recognized can reflect the image quality of the image to be recognized, and generally, the image with high quality is more easily recognized correctly and the image with low quality is more easily recognized wrongly in the implementation process. If the images of different quality conditions all adopt the same judgment standard indiscriminately, the recall rate of high-quality images may be reduced, and the accuracy rate of low-quality images may be reduced. Therefore, in the process of attribute identification, the quality score of the image to be identified is taken into consideration, and the attribute identification can be judged adaptively based on the high-low condition of the image to be identified.
The attribute identification result can represent whether the image to be identified has the same target object as each preset image. For example, the objective of the attribute recognition is to determine whether the image to be recognized includes a non-motor vehicle, and each of the plurality of preset images may include a non-motor vehicle image, or a part of the preset images includes a non-motor vehicle image, and the other part of the preset images does not include a non-motor vehicle image.
Based on the quality score of the image to be recognized and the similarity between the image to be recognized and each preset image, an attribute recognition result of the image to be recognized can be obtained, namely whether the image to be recognized contains the same target object as each preset image.
In this scheme, the image quality score is taken into consideration in the process of determining the attribute recognition result, so the attribute recognition result can be judged adaptively based on the quality score, and the accuracy and the recall rate of the attribute recognition result can be improved.
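For illustration only, the following is a minimal sketch of how steps S201 to S204 could be chained together. The model objects, the use of cosine similarity, and the linear quality-to-threshold rule are assumptions made for the example and are not prescribed by the scheme.

```python
import torch
import torch.nn.functional as F

def recognize(image, preset_images, recognition_model, evaluation_model,
              base_threshold=0.6):
    """Sketch of S201-S204; the models are assumed to return (1, D) features
    and a scalar quality score respectively."""
    with torch.no_grad():
        feat = recognition_model(image)                    # S201: feature information, shape (1, D)
        quality = evaluation_model(image).item()           # S202: quality score, e.g. in [0, 100]
        gallery = torch.cat([recognition_model(p) for p in preset_images])  # (N, D)

    # S203: similarity between the image to be recognized and each preset image
    sims = F.cosine_similarity(feat, gallery, dim=1)

    # S204: re-identification result = preset images ranked by similarity (high to low)
    ranking = torch.argsort(sims, descending=True).tolist()

    # S204: attribute result with a threshold adapted to the quality score
    # (higher quality -> lower threshold; this linear rule is only an example)
    threshold = base_threshold + 0.1 * (50.0 - quality) / 50.0
    attributes = (sims >= threshold).tolist()
    return ranking, attributes
```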
Referring to fig. 3, the process of obtaining the recognition model and the evaluation model by pre-training will be described first.
S101, training the constructed first network model by using a plurality of collected training samples, and obtaining the characteristic information of each training sample obtained by the first network model.
And S102, calculating the quality score of each training sample based on the characteristic information of a plurality of training samples.
S103, training the first network model after adjusting the training weight of each training sample according to the quality score of each training sample until a preset requirement is met, and obtaining an identification model corresponding to the first network model.
In this embodiment, images historically collected by terminal devices at different positions may be used as training samples. The training samples can be divided into a plurality of groups: the samples may first be clustered by a clustering algorithm, and intra-class and inter-class cleaning may then be performed with manual assistance. The same group may contain multiple training samples, which may be images of the same person and the same non-motor vehicle taken by different terminal devices. The training samples in the same group can be labeled with the same ID, so that samples with the same ID cover a variety of backgrounds, poses and the like, enriching the forms of the training samples.
Referring to fig. 4, in the present embodiment, the first network model may be a Deep Layer Aggregation (DLA) network model with Dla60X as the backbone, into which the parameter-free attention module SimAM is incorporated so that training focuses more on the important information of each training sample. In addition, an External Attention module is added to the first network model so that the model learns potential relations among different training samples.
A fully connected layer (FC) follows the backbone of the first network model, and the feature dimension fed into the fully connected layer is set to 1024. The optimizer may be SGD (Stochastic Gradient Descent), and the learning rate decay strategy uses fixed-step decay.
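The patent text names SimAM but does not reproduce its equations; the sketch below follows the commonly published parameter-free SimAM formulation and is shown only to indicate how such a module re-weights a convolutional feature map. It is not taken from this application.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weights each activation by an energy-based score."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                                   # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation per position
        v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # inverse energy
        return x * torch.sigmoid(e_inv)
```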
In this embodiment, when training the first network model with the training samples, the metric loss functions may be Triplet Loss and ArcFace Loss, and the combined loss function of the first network model obtained by combining them is as follows:
L = λ1 · TripletLoss + λ2 · ArcfaceLoss
where λ1 and λ2 are the weights of TripletLoss and ArcfaceLoss, respectively, during network training.
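A sketch of how this combined loss might be assembled is given below. The margin, scale and weight values are illustrative assumptions, and the ArcFace head follows the standard published formulation rather than anything specific to this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Standard ArcFace margin head followed by cross-entropy."""
    def __init__(self, feat_dim, num_ids, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_ids, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
        target_cos = torch.cos(torch.acos(cos) + self.m)     # add angular margin
        idx = torch.arange(feats.size(0))
        logits = cos.clone()
        logits[idx, labels] = target_cos[idx, labels]
        return F.cross_entropy(self.s * logits, labels)

triplet_loss = nn.TripletMarginLoss(margin=0.3)
arcface_loss = ArcFaceLoss(feat_dim=1024, num_ids=5000)

def combined_loss(anchor, positive, negative, feats, labels, lam1=1.0, lam2=1.0):
    # L = lambda1 * TripletLoss + lambda2 * ArcfaceLoss
    return lam1 * triplet_loss(anchor, positive, negative) + lam2 * arcface_loss(feats, labels)
```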
Before being input into the first network model, each training sample may be preprocessed, which may include random cropping, random blurring noise, random brightening and darkening, data enhancement and random data erasure enhancement; finally each sample is scaled to 192 × 288 (width × height) and input into the first network model for training.
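One possible torchvision realization of this preprocessing is sketched below; the specific parameter values (crop padding, jitter strength, blur kernel, erasing probability) are assumptions, since the text does not fix them.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((288, 192)),                  # (height, width): width 192, height 288
    transforms.RandomCrop((288, 192), padding=8),   # random cropping
    transforms.ColorJitter(brightness=0.3),         # random brightening and darkening
    transforms.GaussianBlur(kernel_size=3),         # random blurring noise
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),                # random data erasure enhancement
])
```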
The sample label of each training sample input into the first network model may be an ID to which each training sample belongs, and after the processing of the first network model, the model may finally output an output ID of each training sample, and may also obtain feature information of each training sample.
And judging the excellence of the training effect of the first network model, wherein the judgment can be carried out through the quality scores of the training samples. The quality score of each training sample can be calculated based on the feature information of the training sample obtained by the first network model.
In general, in implementation, an image with higher image quality is more easily recognized, and an image with lower image quality is more easily misrecognized. Therefore, in the process of model training, the learning strength of the image with lower image quality can be improved, so that the model can learn more feature information of the image with lower image quality, and the accuracy of the subsequent model in processing the image with lower image quality is improved.
Therefore, in this embodiment, after the quality score of each training sample is calculated based on the feature information obtained by the first network model, the training weight of each training sample may be adjusted according to the quality score of each training sample.
The training weight of the training sample may be adjusted in such a manner that the training weight of the training sample with a higher quality score may be appropriately decreased, and the training weight of the training sample with a lower quality score may be appropriately increased. For example, assuming the quality score of a training sample is q, its training weight may be adjusted to 1/q.
After the training weights of the training samples are adjusted, the first network model may be trained continuously based on the adjusted training weights. Under the condition that the preset requirements are met, the identification model obtained based on the first network model can be obtained.
For example, the preset requirement may be considered met when the loss function of the first network model has converged and no longer decreases, when the number of model iterations reaches a set maximum, or when the model training duration reaches a set maximum.
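A minimal sketch of the weight adjustment described above, using the 1/q example from the text, is shown below; applying the weights by scaling each sample's loss is an implementation assumption.

```python
import torch

def quality_to_weights(quality_scores, eps=1e-6):
    """Higher quality -> smaller training weight (the 1/q rule); eps avoids division by zero."""
    q = torch.as_tensor(quality_scores, dtype=torch.float32)
    return 1.0 / (q + eps)

# Inside the training loop, the weights could scale the per-sample losses, e.g.:
#   loss = (weights[batch_indices] * per_sample_losses).mean()
```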
Referring to fig. 5, in this embodiment, the step of calculating the quality score of the training sample based on the feature information of the training sample obtained by the first network model may be implemented by:
and S1021, aiming at each training sample, calculating to obtain an intra-class distance according to the characteristic information of the training sample and the training samples belonging to the same class, and calculating to obtain an inter-class distance according to the characteristic information of the training sample and the training samples belonging to different classes.
S1022, based on the intra-class distance and the inter-class distance of each training sample, the quality score of each training sample is obtained.
In this embodiment, when the quality of the image is determined to be high or low, if the distance between the image and the image belonging to the same class is shorter and the distance between the image and the image belonging to a different class is longer, it indicates that the quality of the image is higher. Therefore, the distance between the image and the image of the same type and the distance between the images of different types can be combined to obtain the quality score of the image.
To this end, the training samples are divided into a plurality of groups, each group sharing the same ID, i.e. the same group of training samples can be treated as the same category. For each training sample, the intra-class distance can be obtained based on the feature information of that sample and of the other training samples with the same ID. In addition, for that training sample, one training sample can be selected from each of the other IDs, and the distance between the training sample and each of these samples in feature space is calculated to obtain the inter-class distance.
The intra-class distance may be an average value of distances between the plurality of intra-class feature information obtained by calculation, and the inter-class distance may be an average value of distances between the plurality of inter-class feature information. The distance between the characteristic information in the classes and between the classes can be calculated by adopting a Wasserstein distance calculation formula.
Assuming that X, Y and F denote the training sample set, the sample labels of the training samples and the extracted feature information respectively, the data set in triplet form is:
D = {(x_i, y_i, f(x_i))}, i = 1, ..., n
where n represents the number of training samples, training samples belonging to the same ID form positive sample pairs, and training samples belonging to different IDs form negative sample pairs. For each training sample, the intra-class and inter-class similarity distributions may be formed as follows:
S_intra(x_i) = { <f(x_i), f(x_j)> : y_j = y_i, j ≠ i }
S_inter(x_i) = { <f(x_i), f(x_j)> : y_j ≠ y_i }
wherein < f (x)i),f(xj) The cosine distance representing the feature information of two training samples. Calculating the similarity distribution Wasserstein distance between the classes as follows:
Figure BDA0003338328890000133
Further, in order to convert the intra-class and inter-class distances into a standard, uniform quality score, the quality scores of all training samples may be normalized to the [0, 100] interval, for example by min-max normalization:
q_i = 100 · (W_i - W_min) / (W_max - W_min)
where:
W_min = min_j W_j
and:
W_max = max_j W_j
in this way, the quality score of the training samples can be obtained based on the intra-class distance and the inter-class distance of each training sample.
In this embodiment, the quality score of a training sample can be calculated in the above manner, using intra-class and inter-class distances, during the model training stage, but this manner cannot be used to obtain the quality score of an image in the actual recognition stage. Therefore, an evaluation model may be obtained by pre-training, and the quality score of an image in the actual recognition stage may be obtained by using the evaluation model.
Referring to fig. 6, in the present embodiment, the evaluation model can be obtained by pre-training in the following manner:
and S104, after the first network model is trained and iterated, training the constructed second network model by using the plurality of training samples based on the obtained quality scores of the training samples.
S105, after the training weight of each training sample is adjusted and the first network model is continuously trained to obtain an updated quality score, training the second network model based on the updated quality score until a preset requirement is met, and obtaining an evaluation model corresponding to the second network model.
Referring to fig. 4, in the embodiment, training of the evaluation model may be performed while training the recognition model, and the constructed second network model may be obtained by multiplexing the architecture of the first network model. Moreover, the second network model can be initialized by using the model parameters of the first network model, so that the knowledge migration effect is achieved, and the matching degree of the evaluation result of the second network model and the re-identification of the first network model is improved.
After each iteration of the first network model, calculating the quality score of the training sample based on the characteristic information of the training sample obtained by the first network model, wherein the quality score can be used as a sample label of the training sample input into the second network model.
The training samples input to the second network model may be scaled to 192 × 288 (width × height). Compared with the first network model, the second network model may add a dropout layer before the last fully connected layer to prevent over-fitting. The loss function of the second network model can be the SmoothL1 loss, defined as:
SmoothL1(x) = 0.5 · x², if |x| < 1; |x| - 0.5, otherwise
where x is the difference between the predicted quality score and the label.
and training the second network model by using the quality score obtained based on the characteristic information obtained by the first network model as a label. After the training weight of the training sample is adjusted by the first network model for each iterative training, the quality score calculated based on the characteristic information obtained by the iterative training can be used as a new sample label of the training sample input in the next iterative training of the second network model.
If the training of the second network model meets the preset requirement, for example, the loss function of the second network model is converged and is not reduced, or the training times of the second network model reach the set maximum times, or the training duration reaches the set maximum duration, it can be determined that the training meets the preset requirement. An evaluation model trained from the second network model may be obtained.
Therefore, the effect of mutually promoting feature learning of the first network model and the second network model can be achieved in an alternating iterative training mode, and the effect of the first network model and the second network model is improved.
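The alternating scheme could be organized roughly as below. Here train_one_epoch, compute_quality_scores, and the dataset's set_sample_weights hook are hypothetical helpers standing in for the steps already described; they are not APIs of any particular library.

```python
import torch
import torch.nn as nn

smooth_l1 = nn.SmoothL1Loss()

def train_alternating(first_model, second_model, loader, opt1, opt2, epochs):
    """Each iteration of the recognition model refreshes the quality-score labels
    against which the evaluation model is regressed."""
    for _ in range(epochs):
        feats, ids = train_one_epoch(first_model, loader, opt1)          # hypothetical helper
        q = torch.as_tensor(compute_quality_scores(feats, ids),          # e.g. the Wasserstein-based score
                            dtype=torch.float32)
        loader.dataset.set_sample_weights(1.0 / (q + 1e-6))              # hypothetical dataset hook
        for images, sample_idx in loader:                                # regress the quality scores
            pred = second_model(images).squeeze(1)
            loss = smooth_l1(pred, q[sample_idx])
            opt2.zero_grad(); loss.backward(); opt2.step()
```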
In this embodiment, the recognition model and the evaluation model can be obtained by pre-training in the above manner, and the obtained recognition model and evaluation model can be used for processing the image to be recognized in the actual recognition stage.
In this embodiment, in the actual recognition stage, after obtaining the feature information of the image to be recognized based on the recognition model and obtaining the quality score of the image to be recognized based on the evaluation model, the similarity between the feature information of the image to be recognized and each preset image may be calculated.
In order to remove noise characteristics and reduce processing load, principal component dimension reduction operation can be performed before similarity is calculated. In detail, the feature information of the image to be recognized may be subjected to dimension reduction processing to obtain feature information of a set dimension.
After being processed by the recognition model, the image to be recognized yields 1024-dimensional feature information; the set dimension may be 512, i.e. the 1024-dimensional feature information is reduced to 512 dimensions through PCA.
During dimension reduction, dimension reduction can be realized by using a mean value mean and a dimension reduction transformation matrix A, wherein the mean value mean and the dimension reduction transformation matrix A can be obtained by using a training sample and a trained recognition model in advance. In detail, after the recognition model is trained, each training sample is traversed through the recognition model to extract feature information, and the mean value mean and the dimensionality reduction transformation matrix A are calculated based on the extracted feature information.
Based on the mean, the dimensionality reduction transformation matrix A and the 1024-dimensional feature information, 512-dimensional feature information can be obtained through the following formula:
X_512 = (X_1024 - mean) · A^T
therefore, the noise characteristics can be removed and the recognition effect can be improved by performing principal component dimension reduction processing on the image to be recognized.
In this embodiment, the attribute recognition of the image to be recognized mainly targets a specific object, such as a non-motor vehicle, a motor vehicle, or a rider seen from behind. Such target objects generally do not occupy the whole image and are usually located in a relatively fixed area of the image.
It can be seen that the target object is often confined to a partial region of the image; performing recognition on all regions of the image therefore consumes more time and resources without improving the recognition effect.
In view of the above, in the present embodiment, referring to fig. 7, when extracting the feature information of the image to be recognized, the following may be implemented:
s2011, determining a region where the target object is located in each preset image, and performing pixel occlusion processing on the region, which does not include the target object, of each preset image.
S2012, the image to be identified is executed with the pixel shielding processing of the same area with each preset image.
And S2013, obtaining the image to be recognized after the pixel shielding processing and the characteristic information of each preset image by using the recognition model obtained through training.
In this embodiment, in the actual recognition stage, pixel occlusion processing may be performed on each preset image and the image to be recognized, so that the recognition model may focus more on extracting feature information related to the target object.
The area where the target object is located in the preset image may be determined first, for example, if the targeted target object is a non-motor vehicle, the area where the target object is located is a lower portion of the image, and if the targeted target object is a riding person, the area where the target object is located is an upper portion of the image. In addition, pixel occlusion processing is performed on a region not including the target object in each preset image, and for example, if the target object is a non-motor vehicle, pixel occlusion processing may be performed on an upper portion of the image. The upper part of each preset image may be masked with gray pixels.
Likewise, the same pixel occlusion processing as that of each preset image may be performed on the image to be recognized. When the target object is a non-motor vehicle, then the pixel occlusion processed image may be as shown in FIG. 8.
And extracting the characteristic information of the image to be recognized and each preset image after pixel shielding processing by using a recognition model obtained by pre-training. Therefore, when the identification model extracts the feature information, the identification model can pay more attention to the processing of the feature information related to the target object, and the accuracy of feature extraction is improved.
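A sketch of the occlusion step on a NumPy image array is shown below; splitting the image in half and the gray value 128 are assumptions taken from the examples in the text.

```python
import numpy as np

def occlude(image, target_in="lower", gray=128):
    """image: (H, W, C) uint8 array. Masks the half that does not contain the target object."""
    masked = image.copy()
    h = image.shape[0]
    if target_in == "lower":          # e.g. non-motor vehicles: mask the upper part
        masked[: h // 2] = gray
    else:                             # e.g. riders: mask the lower part
        masked[h // 2 :] = gray
    return masked
```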
On the basis, the attribute recognition result of the image to be recognized can be obtained based on the quality score of the image to be recognized obtained by the evaluation model and the similarity between the obtained image to be recognized and each preset image.
In this embodiment, the quality score of the image to be recognized is taken into consideration, and the judgment standard of the attribute recognition can be adaptively adjusted based on the quality of the image to be recognized.
Optionally, referring to fig. 9, in the step of obtaining the attribute identification result based on the quality score and the similarity, the following steps may be implemented:
s2041, setting a discrimination threshold value based on the quality score of the image to be recognized.
And S2042, obtaining an attribute identification result of the image to be identified according to the discrimination threshold and each similarity.
Generally, images with higher image quality are more likely to be correctly recognized, and images with lower image quality are more likely to be erroneously recognized. Therefore, if the same determination criteria are set for images of different image qualities, the recall rate of the image with higher image quality may be reduced, and the accuracy rate of the image with lower image quality may be reduced.
Therefore, in this embodiment, the discrimination threshold may be set based on the quality score of the image to be recognized, for example, the discrimination threshold may be set lower when the quality score is higher, and the discrimination threshold may be set higher when the quality score is lower.
On the basis, the attribute recognition result of the image to be recognized is obtained based on the set discrimination threshold and each similarity, that is, whether the image to be recognized has the target object which is the same as each preset image or not is judged.
In this embodiment, the discrimination threshold is set based on the quality score of the image, and the discrimination standard can be adaptively adjusted based on the image quality condition, so that the recall rate of the high-quality image and the discrimination accuracy of the low-quality image can be improved.
In detail, when determining the attribute identification result, each similarity may be compared with a discrimination threshold, and if the similarity is smaller than the discrimination threshold, it is determined that the image to be identified does not have the same target object as the preset image corresponding to the similarity. And if the similarity is greater than or equal to the discrimination threshold, determining that the image to be recognized has the same target object as the preset image corresponding to the similarity.
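A sketch of this adaptive discrimination is given below; the linear mapping from quality score to threshold and the bounds t_low/t_high are assumptions, since the text only states the direction (higher quality, lower threshold).

```python
def attribute_results(similarities, quality, t_low=0.5, t_high=0.7, q_max=100.0):
    """Returns, per preset image, whether the image to be recognized is judged to
    contain the same target object as that preset image."""
    threshold = t_high - (t_high - t_low) * (quality / q_max)   # high quality -> low threshold
    return [sim >= threshold for sim in similarities]
```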
According to the image recognition scheme provided by the embodiment, the image quality recognition task is introduced into the target re-recognition task, so that the discrimination standard of attribute recognition can be adaptively adjusted according to images of different quality conditions, and the recognition recall rate of high-quality images and the recognition accuracy of low-quality images are improved.
In the model training stage, in order to improve the learning strength of the model on low-quality images, quality scores are obtained based on the obtained characteristic information of the training samples after each iteration of the model, and the training weights of the training samples are adjusted based on the quality scores, so that the training effect of the recognition model is continuously improved.
On the basis, an evaluation model capable of evaluating the image quality is introduced, and the evaluation model is trained by calculating the quality score through the characteristic information extracted by the recognition model. And the effect of mutual promotion and promotion is achieved through the alternate iterative training of the recognition model and the evaluation model. And the trained evaluation model can be used for subsequent evaluation judgment of image quality.
And PCA dimension reduction processing is carried out on the feature information extracted by the recognition model, so that the extracted noise features can be removed, and the recognition effect is improved.
Referring to fig. 10, a schematic diagram of exemplary components of an electronic device according to an embodiment of the present application is provided, where the electronic device may be the processing device shown in fig. 1. The electronic device may include a storage medium 110, a processor 120, an image recognition apparatus 130, and a communication interface 140. In this embodiment, the storage medium 110 and the processor 120 are both located in the electronic device and are separately disposed. However, it should be understood that the storage medium 110 may be separate from the electronic device and may be accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may be integrated into the processor 120, for example, may be a cache and/or general purpose registers.
The image recognition device 130 may be understood as the electronic device or the processor 120 of the electronic device, or may be understood as a software functional module that is independent of the electronic device or the processor 120 and implements the image recognition method under the control of the electronic device.
As shown in fig. 11, the image recognition apparatus 130 may include an information obtaining module 131, a score obtaining module 132, a calculating module 133, and a result obtaining module 134. The functions of the functional modules of the image recognition apparatus 130 are described in detail below.
The information obtaining module 131 is configured to obtain feature information of an image to be recognized by using the recognition model obtained through training;
it is understood that the information obtaining module 131 may be configured to perform the step S201, and for detailed implementation of the information obtaining module 131, reference may be made to the content related to the step S201.
A score obtaining module 132, configured to obtain a quality score of the image to be recognized by using the trained evaluation model;
it is understood that the score obtaining module 132 may be configured to perform the step S202, and for a detailed implementation of the score obtaining module 132, reference may be made to the content related to the step S202.
A calculating module 133, configured to calculate a similarity between the feature information of the image to be identified and feature information of each of a plurality of preset images;
it is understood that the calculating module 133 can be used to execute the step S203, and for the detailed implementation of the calculating module 133, reference can be made to the contents related to the step S203.
The result obtaining module 134 is configured to obtain a re-recognition result of the image to be recognized based on the obtained multiple similarities, and obtain an attribute recognition result of the image to be recognized based on the quality score of the image to be recognized and each of the similarities, where the attribute recognition result represents whether the image to be recognized has a target object that is the same as a preset image corresponding to each of the similarities.
It is understood that the result obtaining module 134 may be configured to perform the step S204, and for a detailed implementation of the result obtaining module 134, reference may be made to the content related to the step S204.
In a possible implementation, the image recognition device 130 further includes a training module, and the training module is configured to:
training a constructed first network model by using a plurality of collected training samples, and obtaining characteristic information of each training sample obtained by the first network model;
calculating the quality score of each training sample based on the characteristic information of a plurality of training samples;
and adjusting the training weight of each training sample according to the quality score of each training sample, and continuing training the first network model until a preset requirement is met, so as to obtain an identification model corresponding to the first network model.
In a possible implementation, the training module may be further configured to:
after the first network model is trained and iterated, training a constructed second network model by using the plurality of training samples based on the obtained quality scores of the training samples;
after the training weight of each training sample is adjusted and the first network model is continuously trained to obtain an updated quality score, the second network model is trained based on the updated quality score until a preset requirement is met, and an evaluation model corresponding to the second network model is obtained.
In a possible implementation, the training module may be specifically configured to:
aiming at each training sample, calculating to obtain an intra-class distance according to the characteristic information of the training sample and the training samples belonging to the same class, and calculating to obtain an inter-class distance according to the characteristic information of the training samples and the training samples belonging to different classes;
and obtaining the quality score of each training sample based on the intra-class distance and the inter-class distance of each training sample.
In a possible implementation, the image recognition device 130 may further include a dimension reduction module, and the dimension reduction module may be configured to:
and performing dimension reduction processing on the feature information of the image to be identified to obtain feature information with set dimension.
In a possible implementation manner, the calculating module 133 may specifically be configured to:
determining the area where the target object is located in each preset image, and performing pixel shielding processing on the area, which does not contain the target object, of each preset image;
executing pixel shielding processing of the same area of each preset image on the image to be recognized;
and obtaining the characteristic information of the image to be recognized and each preset image after the pixel shielding processing by using the recognition model obtained by training.
In a possible implementation manner, the result obtaining module may be specifically configured to:
setting a discrimination threshold based on the quality score of the image to be recognized;
and obtaining the attribute recognition result of the image to be recognized according to the discrimination threshold and each similarity.
In a possible implementation manner, the result obtaining module may be specifically configured to:
comparing each similarity with the discrimination threshold, and if the similarity is smaller than the discrimination threshold, determining that the image to be recognized does not contain the same target object as the preset image corresponding to the similarity;
and if the similarity is greater than or equal to the discrimination threshold, determining that the image to be recognized contains the same target object as the preset image corresponding to the similarity.
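A minimal sketch of the adaptive discrimination, assuming the quality score lies in [0, 1] and that the threshold depends on it linearly; the specific mapping is an illustrative assumption, since the text only requires that the threshold be set from the quality score.

    def attribute_recognition(similarities, quality_score, base_threshold=0.5, scale=0.2):
        """Relax the discrimination threshold for low-quality images, tighten it for
        high-quality ones, then compare every similarity against it."""
        threshold = base_threshold + scale * (quality_score - 0.5)
        return [sim >= threshold for sim in similarities]   # True: same target object present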
For the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the related descriptions in the above method embodiments; they are not described in detail here.
Further, an embodiment of the present application also provides a computer-readable storage medium storing machine-executable instructions which, when executed, implement the image recognition method provided by the foregoing embodiments.
Specifically, the computer-readable storage medium may be a general storage medium, such as a removable disk or a hard disk, and the computer program stored on it, when executed, performs the above image recognition method. For the processes involved when the executable instructions in the computer-readable storage medium are executed, reference may be made to the related descriptions in the above method embodiments, which are not repeated here.
In summary, according to the image recognition method, the image recognition device, the electronic device, and the readable storage medium provided by the embodiments of the present application, the feature information of the image to be recognized is obtained by using the recognition model obtained through training, and the quality score of the image to be recognized is obtained by using the evaluation model obtained through training. The similarity between the feature information of the image to be recognized and the feature information of each of the plurality of preset images is then calculated; a re-recognition result of the image to be recognized is obtained based on the obtained similarities, and an attribute recognition result of the image to be recognized is obtained based on the quality score of the image to be recognized and each similarity, where the attribute recognition result represents whether the image to be recognized contains the same target object as the preset image corresponding to each similarity. Since the image quality score is taken into account when determining the attribute recognition result, the discrimination can be adapted to the quality of the image, which improves both the accuracy and the recall rate of the attribute recognition result.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. An image recognition method, characterized in that the method comprises:
obtaining characteristic information of an image to be recognized by using a recognition model obtained by training;
obtaining the quality score of the image to be recognized by using the trained evaluation model;
calculating the similarity between the feature information of the image to be recognized and the feature information of each preset image in a plurality of preset images;
obtaining a re-recognition result of the image to be recognized based on the obtained multiple similarities, and obtaining an attribute recognition result of the image to be recognized based on the quality score of the image to be recognized and each similarity, wherein the attribute recognition result represents whether the image to be recognized contains the same target object as the preset image corresponding to each similarity.
2. The image recognition method of claim 1, further comprising the step of training the recognition model in advance, the step comprising:
training a constructed first network model with a plurality of collected training samples, and obtaining the feature information of each training sample output by the first network model;
calculating the quality score of each training sample based on the feature information of the plurality of training samples;
and adjusting the training weight of each training sample according to its quality score, and continuing to train the first network model until a preset requirement is met, so as to obtain the recognition model corresponding to the first network model.
3. The image recognition method of claim 2, further comprising the step of pre-training an evaluation model, the step comprising:
after a training iteration of the first network model is completed, training a constructed second network model with the plurality of training samples based on the quality scores obtained for the training samples;
after the training weight of each training sample is adjusted and the first network model is trained further to obtain updated quality scores, continuing to train the second network model based on the updated quality scores until a preset requirement is met, so as to obtain the evaluation model corresponding to the second network model.
4. The image recognition method according to claim 2, wherein the step of calculating the quality score of each of the training samples based on the feature information of the plurality of training samples includes:
for each training sample, calculating an intra-class distance from the feature information of the training sample and of the training samples belonging to the same class, and calculating an inter-class distance from the feature information of the training sample and of the training samples belonging to different classes;
and obtaining the quality score of each training sample based on the intra-class distance and the inter-class distance of each training sample.
5. The image recognition method according to claim 1, wherein before the step of calculating the similarity between the feature information of the image to be recognized and the feature information of each of a plurality of preset images, the method further comprises:
and performing dimension reduction processing on the feature information of the image to be recognized to obtain feature information of a set dimension.
6. The image recognition method according to claim 1, wherein the step of obtaining feature information of the image to be recognized by using the trained recognition model includes:
determining the area where the target object is located in each preset image, and performing pixel shielding processing on the area of each preset image that does not contain the target object;
applying, to the image to be recognized, the same pixel shielding processing that was applied to each preset image;
and obtaining the feature information of the image to be recognized and of each preset image after the pixel shielding processing by using the recognition model obtained by training.
7. The image recognition method according to claim 1, wherein the step of obtaining the attribute recognition result of the image to be recognized based on the quality score of the image to be recognized and each of the similarities comprises:
setting a discrimination threshold value based on the quality score of the image to be recognized;
and obtaining the attribute recognition result of the image to be recognized according to the discrimination threshold and each similarity.
8. The image recognition method according to claim 7, wherein the step of obtaining the attribute recognition result of the image to be recognized according to the discrimination threshold and each of the similarities comprises:
comparing each similarity with the discrimination threshold, and if the similarity is smaller than the discrimination threshold, determining that the image to be recognized does not contain the same target object as the preset image corresponding to the similarity;
and if the similarity is greater than or equal to the discrimination threshold, determining that the image to be recognized contains the same target object as the preset image corresponding to the similarity.
9. An image recognition apparatus, characterized in that the apparatus comprises:
the information acquisition module is used for acquiring the characteristic information of the image to be recognized by using the recognition model obtained by training;
the score obtaining module is used for obtaining the quality score of the image to be recognized by utilizing the trained evaluation model;
the calculation module is used for calculating the similarity between the feature information of the image to be recognized and the feature information of each preset image in a plurality of preset images;
and the result obtaining module is used for obtaining a re-recognition result of the image to be recognized based on the obtained multiple similarities, and obtaining an attribute recognition result of the image to be recognized based on the quality score of the image to be recognized and each similarity, wherein the attribute recognition result represents whether the image to be recognized contains the same target object as the preset image corresponding to each similarity.
10. An electronic device, comprising: a memory for storing a computer program and a processor for executing the computer program to implement the image recognition method of any one of claims 1 to 8.
11. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the image recognition method according to any one of claims 1 to 8.
CN202111300902.3A 2021-11-04 2021-11-04 Image recognition method and device, electronic equipment and readable storage medium Pending CN114037886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300902.3A CN114037886A (en) 2021-11-04 2021-11-04 Image recognition method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111300902.3A CN114037886A (en) 2021-11-04 2021-11-04 Image recognition method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114037886A (en) 2022-02-11

Family

ID=80142791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111300902.3A Pending CN114037886A (en) 2021-11-04 2021-11-04 Image recognition method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114037886A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620098A (en) * 2022-12-20 2023-01-17 中电信数字城市科技有限公司 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment
CN115620098B (en) * 2022-12-20 2023-03-10 中电信数字城市科技有限公司 Evaluation method and system of cross-camera pedestrian tracking algorithm and electronic equipment
CN116416440A (en) * 2023-01-13 2023-07-11 北京百度网讯科技有限公司 Target recognition method, model training method, device, medium and electronic equipment
CN116416440B (en) * 2023-01-13 2024-02-06 北京百度网讯科技有限公司 Target recognition method, model training method, device, medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110619369B (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN105144239B (en) Image processing apparatus, image processing method
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN104915643A (en) Deep-learning-based pedestrian re-identification method
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
WO2021042505A1 (en) Note generation method and apparatus based on character recognition technology, and computer device
CN114037886A (en) Image recognition method and device, electronic equipment and readable storage medium
CN112766218B (en) Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network
CN111046971A (en) Image recognition method, device, equipment and computer readable storage medium
CN108921172B (en) Image processing device and method based on support vector machine
CN111723852A (en) Robust training method for target detection network
CN115393606A (en) Method and system for image recognition
Zhao et al. A robust color-independent text detection method from complex videos
CN114092818B (en) Semantic segmentation method and device, electronic equipment and storage medium
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN111553202B (en) Training method, detection method and device for neural network for living body detection
CN112613341A (en) Training method and device, fingerprint identification method and device, and electronic device
Hsu et al. Facial expression recognition using Hough forest
CN113515633B (en) Screen browsing scene classification method based on computer vision
CN113903015B (en) Lane line identification method and device
CN111832626B (en) Image recognition classification method, device and computer readable storage medium
CN113255665B (en) Target text extraction method and system
CN113537315B (en) Easily-distinguished image selection method based on clustering information entropy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination