CN114239753A - Migratable image identification method and device


Info

Publication number
CN114239753A
Authority
CN
China
Legal status
Granted
Application number
CN202210164555.4A
Other languages
Chinese (zh)
Other versions
CN114239753B (en)
Inventor
张凯
王帆
韩忠义
房体品
Current Assignee
Shandong Liju Robot Technology Co ltd
Original Assignee
Shandong Liju Robot Technology Co ltd
Application filed by Shandong Liju Robot Technology Co ltd filed Critical Shandong Liju Robot Technology Co ltd
Priority to CN202210164555.4A
Publication of CN114239753A
Application granted
Publication of CN114239753B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention relates to a transferable image recognition method and device in the technical field of image recognition. The method comprises the following steps: determining the image type of the input to an image recognition model; when the input image is a labeled source domain image, passing it through a feature extractor and a class predictor and determining a cross entropy loss; when the input image is an unlabeled target domain image, passing it through the feature extractor and a domain discriminator while also passing it through the feature extractor and the class predictor; determining an adversarial loss according to the output of the domain discriminator and the similarity between the target domain image and the center point of each source domain class; determining an information maximization loss according to the output of the class predictor; and optimizing the image recognition model according to the cross entropy loss, the adversarial loss, and the information maximization loss. Through this technical scheme, the performance of target image recognition is effectively improved, the labeling required for target image recognition is effectively reduced, and manpower and material resources are greatly saved.

Description

Migratable image identification method and device
Technical Field
The invention relates to the technical field of image recognition, in particular to a transferable image recognition method and device.
Background
Transferable image recognition refers to a technique in which labeled images, whose distribution is similar to but not identical with that of the current unlabeled images, are used to guide the accurate recognition of those unlabeled images. In the big data era, analyzing the value information implicit in data to guide people's life and production has become a beneficial development trend. In a real scene, however, it is very easy to collect a large amount of unlabeled data, while accurate manual annotation for some tasks, such as large-scale sensor images, is very time-consuming and labor-intensive. Under this constraint, the current image recognition task can be guided by existing labeled images, exploiting the similarity between their distribution and that of the images to be recognized. For example, suppose sensors A and B capture images for the same task and the images of sensor A (generally called source domain images) have already been labeled. Since the image types captured by sensors A and B are the same, sensor B can perform efficient image recognition by virtue of the data already labeled for sensor A, without extensive labeling of the data obtained by sensor B. However, because of differences in internal structure and the like between sensors A and B, the image data collected by the two devices differ in distribution. How to accurately recognize the images collected by sensor B (generally called target domain images) in the presence of such a distribution difference is therefore a difficulty in the current transferable image recognition problem.
The traditional method is as follows: the data collected by a sensor are accurately labeled and a model is retrained for the image recognition task. This process, however, wastes expensive manpower, and accurately hand-labeling all collected data is utterly unrealistic in a big data setting.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a transferable image recognition method and device.
According to a first aspect of the embodiments of the present invention, there is provided a migratable image recognition method, including:
determining an image type of an input image recognition model, wherein the image type comprises a labeled source domain image and an unlabeled target domain image, and the image recognition model comprises a feature extractor, a category predictor and a domain discriminator;
when the input image is a labeled source domain image, enabling the labeled source domain image to pass through the feature extractor and the category predictor, and determining cross entropy loss;
when the input image is an unlabeled target domain image, enabling the target domain image to pass through the feature extractor and the domain discriminator and simultaneously pass through the feature extractor and the class predictor;
determining an adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the center point of each source domain class;
determining information maximization loss according to an output result of the category predictor;
optimizing the image recognition model according to the cross entropy loss, the adversarial loss, and the information maximization loss.
In one embodiment, preferably, the method further comprises:
acquiring a target image to be identified;
and identifying the target image according to the image identification model so as to determine the category of the target image.
In one embodiment, the cross entropy loss is preferably calculated using the following first calculation formula:

L_CE(D_s) = − E_{(x_s, y_s) ∈ D_s} Σ_{k=1}^{K} 1[k = y_s] log σ_k(H(G(x_s)))

where D_s represents all source domain images, L_CE(D_s) represents the cross entropy loss over all source domain images, E represents the expectation, x_s represents the features of a source domain image, y_s represents the label class of the source domain image, 1[k = y_s] represents the indicator function, σ denotes the softmax function, log denotes the logarithm function, σ_k(H(G(x_s))) denotes the kth component of the softmax of the class predictor output, and K represents the total number of image classes.
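As a concrete illustration, the first calculation formula can be sketched in NumPy. The `softmax` helper, the example logits and labels, and the small epsilon added for numerical stability are assumptions of this sketch, not details given in the patent:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis: the sigma in the formula.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, labels):
    # First formula: the indicator 1[k = y_s] selects the true-class
    # probability, so the inner sum reduces to -log sigma_{y_s}(H(G(x_s))).
    probs = softmax(logits)
    n = logits.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

# Three hypothetical source images, K = 4 classes, correct predictions.
logits = np.array([[4.0, 0.1, 0.2, 0.3],
                   [0.1, 5.0, 0.2, 0.1],
                   [0.3, 0.2, 0.1, 6.0]])
labels = np.array([0, 1, 3])
loss = cross_entropy_loss(logits, labels)  # small, since predictions match labels
```

Averaging over the batch plays the role of the expectation E over D_s.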
In one embodiment, preferably, determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the center point of each source domain class includes:

determining an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

L_d_initial(D_i) = d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t)))

where L_d_initial(D_i) represents the initial adversarial loss of the ith target domain image, D_i represents the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output of the ith target domain image after passing through the feature extractor and then the domain discriminator (equivalent to a binary classification problem), and d_i is a binary label indicating whether the ith target domain image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;
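A minimal NumPy sketch of the second calculation formula follows; the discriminator outputs and domain labels are made-up values, and clipping to avoid log(0) is an implementation assumption:

```python
import numpy as np

def initial_adversarial_loss(disc_outputs, domain_labels):
    # Second formula: d_i * log D(G(x_t)) + (1 - d_i) * log(1 - D(G(x_t))),
    # averaged over the batch. disc_outputs are D(G(x)) values in (0, 1).
    p = np.clip(disc_outputs, 1e-12, 1 - 1e-12)
    d = domain_labels
    return float(np.mean(d * np.log(p) + (1 - d) * np.log(1 - p)))

# Two source-labeled (d = 1) and two target-labeled (d = 0) images.
p = np.array([0.9, 0.8, 0.2, 0.1])
d = np.array([1, 1, 0, 0])
loss = initial_adversarial_loss(p, d)  # a log-likelihood, hence <= 0
```

Maximizing this quantity trains the discriminator to separate the two domains, which is the binary classification problem mentioned above.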
determining the cluster center of each class of images from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

c_k = ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] G(x_s) ) / ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] )

where c_k represents the cluster center of the kth class of images, x_s represents the features of source domain image S, y_s represents the label class of source domain image S, D_s represents all source domain images, 1[y_s = k] represents the indicator function, and G(x_s) represents the feature of source domain image S output by the feature extractor;
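The third calculation formula amounts to a per-class mean of source features. A sketch with hypothetical 2-D features (every class is assumed to have at least one sample):

```python
import numpy as np

def class_centers(features, labels, num_classes):
    # Third formula: c_k is the mean of G(x_s) over source images whose
    # label satisfies the indicator 1[y_s = k].
    centers = np.zeros((num_classes, features.shape[1]))
    for k in range(num_classes):
        centers[k] = features[labels == k].mean(axis=0)
    return centers

feats = np.array([[1.0, 0.0], [3.0, 0.0],   # class 0
                  [0.0, 2.0], [0.0, 4.0]])  # class 1
labels = np.array([0, 0, 1, 1])
centers = class_centers(feats, labels, num_classes=2)
```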
calculating the similarity between each target domain image and the cluster center closest to it, and taking this similarity as the weight of that image's initial adversarial loss, wherein the weight is calculated using the following fourth calculation formula:

w_t = max_k D_f(c_k, G(x_t))

where w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, D_f denotes cosine similarity, c_k represents the cluster center of the kth class of images, and x_t represents the features of the ith target domain image;
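The fourth calculation formula can be read as the cosine similarity between a target feature and its most similar source class center. A sketch with hypothetical centers:

```python
import numpy as np

def cosine_similarity(a, b):
    # D_f in the formula: cosine of the angle between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def target_weight(feature, centers):
    # Fourth formula: similarity to the nearest (most similar) cluster center.
    return max(cosine_similarity(feature, c) for c in centers)

centers = np.array([[1.0, 0.0], [0.0, 1.0]])
w_near = target_weight(np.array([0.9, 0.1]), centers)  # close to class-0 center
w_far = target_weight(np.array([1.0, 1.0]), centers)   # equidistant from both
```

Target images that resemble some source class thus receive a larger weight in the adversarial term.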
calculating the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

L_d(D_i) = w_t · [ d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t))) ]

where L_d(D_i) represents the adversarial loss of the ith target domain image, w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output result of the ith target domain image after passing through the feature extractor and the domain discriminator, and d_i is the binary label of the ith target domain image.
In one embodiment, preferably, determining the information maximization loss according to the output result of the class predictor comprises:
calculating the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;
calculating the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;
wherein the entropy minimization loss is calculated using the following sixth calculation formula:

L_ent(D_t) = − E_{x_t ∈ D_t} Σ_{k=1}^{K} σ_k(H(G(x_t))) log σ_k(H(G(x_t)))

where L_ent(D_t) represents the entropy minimization loss, D_t represents all target domain images, σ denotes the softmax function, H(G(x_t)) represents the output result of the target domain image after passing through the feature extractor and the label predictor, K represents the total number of image classes, E represents the expectation, and x_t represents a target domain image;
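The sixth calculation formula is the expected Shannon entropy of the predictor's softmax output; a NumPy sketch (the example logits are invented for illustration):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    # Sixth formula: -E sum_k sigma_k log sigma_k over target images;
    # small when every prediction is confident (near one-hot).
    p = softmax(logits)
    return float(-np.mean(np.sum(p * np.log(p + 1e-12), axis=1)))

confident = np.array([[8.0, 0.0, 0.0]])  # near one-hot prediction
uniform = np.array([[0.0, 0.0, 0.0]])    # maximally uncertain prediction
```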
calculating the class average entropy maximization loss using the following seventh calculation formula:

L_div(D_t) = Σ_{k=1}^{K} p̄_k log p̄_k

where L_div(D_t) represents the class average entropy maximization loss, and p̄_k represents the average softmax probability of class k over all target domain samples;
wherein the information maximization loss is calculated using the following eighth calculation formula:

L_IM = L_ent + L_div

where L_IM represents the information maximization loss, L_ent(D_t) represents the entropy minimization loss, and L_div(D_t) represents the class average entropy maximization loss.
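The seventh and eighth calculation formulas can be sketched together. Minimizing L_div (the negative entropy of the batch-average prediction) pushes predictions to cover all K classes, while L_ent pushes each individual prediction to be confident; the example logits are invented:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    # Sixth formula: expected per-sample prediction entropy.
    p = softmax(logits)
    return float(-np.mean(np.sum(p * np.log(p + 1e-12), axis=1)))

def diversity_loss(logits):
    # Seventh formula: sum_k p_bar_k log p_bar_k, the negative entropy
    # of the average softmax vector over the batch.
    p_bar = softmax(logits).mean(axis=0)
    return float(np.sum(p_bar * np.log(p_bar + 1e-12)))

def information_maximization_loss(logits):
    # Eighth formula: L_IM = L_ent + L_div.
    return entropy_loss(logits) + diversity_loss(logits)

# Confident AND diverse predictions give a small L_IM...
spread = np.array([[9.0, 0.0], [0.0, 9.0]])
l_im = information_maximization_loss(spread)
# ...whereas collapsing every sample onto one class is penalized.
collapsed = np.array([[9.0, 0.0], [9.0, 0.0]])
```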
In one embodiment, preferably, optimizing the image recognition model according to the cross entropy loss, the adversarial loss, and the information maximization loss includes:

determining the final model loss from the cross entropy loss, the adversarial loss, and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:

L = L_CE(D_s) − L_d(D_t) + β L_IM

where L represents the final model loss, L_CE(D_s) represents the cross entropy loss, L_d(D_t) represents the adversarial loss, L_IM represents the information maximization loss, and β represents a balancing parameter.
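The ninth calculation formula combines the three losses; the numeric values below are placeholders for illustration only (note that the adversarial log-likelihood term is typically non-positive):

```python
def final_loss(l_ce, l_d, l_im, beta=1.0):
    # Ninth formula: L = L_CE(D_s) - L_d(D_t) + beta * L_IM.
    # The minus sign makes the feature extractor play against the
    # domain discriminator; beta balances the information term.
    return l_ce - l_d + beta * l_im

loss = final_loss(l_ce=0.4, l_d=-0.3, l_im=-0.2, beta=0.5)
```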
According to a second aspect of the embodiments of the present invention, there is provided a migratable image recognition apparatus, including:

a first determining module, configured to determine the image type of the input to an image recognition model, wherein the image type includes labeled source domain images and unlabeled target domain images, and the image recognition model includes a feature extractor, a class predictor, and a domain discriminator;

a first processing module, configured to, when the input image is a labeled source domain image, pass the labeled source domain image through the feature extractor and the class predictor and determine a cross entropy loss;

a second processing module, configured to, when the input image is an unlabeled target domain image, pass the target domain image through the feature extractor and the domain discriminator while also passing it through the feature extractor and the class predictor;

a second determining module, configured to determine an adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the center point of each source domain class;

a third determining module, configured to determine an information maximization loss according to the output result of the class predictor;

an optimization module, configured to optimize the image recognition model according to the cross entropy loss, the adversarial loss, and the information maximization loss.
In one embodiment, preferably, the apparatus further comprises:
the acquisition module is used for acquiring a target image to be identified;
and the identification module is used for identifying the target image according to the image identification model so as to determine the category of the target image.
In one embodiment, the cross entropy loss is preferably calculated using the following first calculation formula:

L_CE(D_s) = − E_{(x_s, y_s) ∈ D_s} Σ_{k=1}^{K} 1[k = y_s] log σ_k(H(G(x_s)))

where D_s represents all source domain images, L_CE(D_s) represents the cross entropy loss over all source domain images, E represents the expectation, x_s represents the features of a source domain image, y_s represents the label class of the source domain image, 1[k = y_s] represents the indicator function, σ denotes the softmax function, log denotes the logarithm function, σ_k(H(G(x_s))) denotes the kth component of the softmax of the class predictor output, and K represents the total number of image classes.
In one embodiment, preferably, the second determining module is configured to:

determine an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

L_d_initial(D_i) = d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t)))

where L_d_initial(D_i) represents the initial adversarial loss of the ith target domain image, D_i represents the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output of the ith target domain image after passing through the feature extractor and then the domain discriminator (equivalent to a binary classification problem), and d_i is a binary label indicating whether the ith target domain image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;

determine the cluster center of each class of images from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

c_k = ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] G(x_s) ) / ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] )

where c_k represents the cluster center of the kth class of images, x_s represents the features of source domain image S, y_s represents the label class of source domain image S, D_s represents all source domain images, 1[y_s = k] represents the indicator function, and G(x_s) represents the feature of source domain image S output by the feature extractor;

calculate the similarity between each target domain image and the cluster center closest to it, and take this similarity as the weight of that image's initial adversarial loss, wherein the weight is calculated using the following fourth calculation formula:

w_t = max_k D_f(c_k, G(x_t))

where w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, D_f denotes cosine similarity, c_k represents the cluster center of the kth class of images, and x_t represents the features of the ith target domain image;

calculate the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

L_d(D_i) = w_t · [ d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t))) ]

where L_d(D_i) represents the adversarial loss of the ith target domain image, w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output result of the ith target domain image after passing through the feature extractor and the domain discriminator, and d_i is the binary label of the ith target domain image;
the third determining module is configured to:

calculate the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;

calculate the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;

wherein the entropy minimization loss is calculated using the following sixth calculation formula:

L_ent(D_t) = − E_{x_t ∈ D_t} Σ_{k=1}^{K} σ_k(H(G(x_t))) log σ_k(H(G(x_t)))

where L_ent(D_t) represents the entropy minimization loss, D_t represents all target domain images, σ denotes the softmax function, H(G(x_t)) represents the output result of the target domain image after passing through the feature extractor and the label predictor, K represents the total number of image classes, E represents the expectation, and x_t represents a target domain image;

the class average entropy maximization loss is calculated using the following seventh calculation formula:

L_div(D_t) = Σ_{k=1}^{K} p̄_k log p̄_k

where L_div(D_t) represents the class average entropy maximization loss, and p̄_k represents the average softmax probability of class k over all target domain samples;

wherein the information maximization loss is calculated using the following eighth calculation formula:

L_IM = L_ent + L_div

where L_IM represents the information maximization loss, L_ent(D_t) represents the entropy minimization loss, and L_div(D_t) represents the class average entropy maximization loss;
the optimization module is configured to:

determine the final model loss from the cross entropy loss, the adversarial loss, and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:

L = L_CE(D_s) − L_d(D_t) + β L_IM

where L represents the final model loss, L_CE(D_s) represents the cross entropy loss, L_d(D_t) represents the adversarial loss, L_IM represents the information maximization loss, and β represents a balancing parameter.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
compared with the prior art, the technical scheme of the invention not only utilizes the weighted countermeasure loss to optimize the feature extractor module, but also utilizes the cross entropy loss and the information maximization loss to optimize the label predictor module, thereby effectively improving the performance of target image identification, effectively reducing the labels for the target image identification and greatly reducing manpower and material resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a migratable image recognition method in accordance with an exemplary embodiment.
Fig. 2 is a schematic diagram illustrating a migratable image recognition method in accordance with an exemplary embodiment.
FIG. 3 is a flow chart illustrating another migratable image recognition method in accordance with an exemplary embodiment.
Fig. 4 is a block diagram illustrating a migratable image recognition device in accordance with an exemplary embodiment.
Fig. 5 is a block diagram illustrating a migratable image recognition device in accordance with an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flow chart illustrating a migratable image recognition method according to an exemplary embodiment. As shown in FIG. 1, the method includes:
step S101, determining an image type of an input image recognition model, wherein the image type comprises a labeled source domain image and a non-labeled target domain image, and the image recognition model comprises a feature extractor, a category predictor and a domain discriminator as shown in FIG. 2;
step S102, when the input image is a source domain image with a label, enabling the source domain image with the label to pass through the feature extractor and the category predictor, and determining cross entropy loss;
step S103, when the input image is a label-free target domain image, enabling the target domain image to pass through the feature extractor and the domain discriminator and simultaneously pass through the feature extractor and the class predictor;
step S104, determining an adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the center point of each source domain class;
step S105, determining information maximization loss according to an output result of the category predictor;
step S106, optimizing the image recognition model according to the cross entropy loss, the adversarial loss, and the information maximization loss.
Compared with the prior art, the technical scheme of the invention not only optimizes the feature extractor module with the weighted adversarial loss, but also optimizes the label predictor module with the cross entropy loss and the information maximization loss, thereby effectively improving the performance of target image recognition, substantially reducing the labeling required for target image recognition, and greatly saving manpower and material resources.
FIG. 3 is a flow chart illustrating another migratable image recognition method in accordance with an exemplary embodiment.
As shown in fig. 3, in one embodiment, preferably, the method further comprises:
step S301, acquiring a target image to be identified;
step S302, the target image is identified according to the image identification model so as to determine the category of the target image.
In one embodiment, the cross entropy loss is preferably calculated using the following first calculation formula:

L_CE(D_s) = − E_{(x_s, y_s) ∈ D_s} Σ_{k=1}^{K} 1[k = y_s] log σ_k(H(G(x_s)))

where D_s represents all source domain images, L_CE(D_s) represents the cross entropy loss over all source domain images, E represents the expectation, x_s represents the features of a source domain image, y_s represents the label class of the source domain image, 1[k = y_s] represents the indicator function, σ denotes the softmax function, log denotes the logarithm function, σ_k(H(G(x_s))) denotes the kth component of the softmax of the class predictor output, and K represents the total number of image classes.
In one embodiment, preferably, determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the center point of each source domain class includes:

determining an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

L_d_initial(D_i) = d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t)))

where L_d_initial(D_i) represents the initial adversarial loss of the ith target domain image, D_i represents the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output of the ith target domain image after passing through the feature extractor and then the domain discriminator (equivalent to a binary classification problem), and d_i is a binary label indicating whether the ith target domain image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;

determining the cluster center of each class of images from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

c_k = ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] G(x_s) ) / ( Σ_{(x_s, y_s) ∈ D_s} 1[y_s = k] )

where c_k represents the cluster center of the kth class of images, x_s represents the features of source domain image S, y_s represents the label class of source domain image S, D_s represents all source domain images, 1[y_s = k] represents the indicator function, and G(x_s) represents the feature of source domain image S output by the feature extractor;

calculating the similarity between each target domain image and the cluster center closest to it, and taking this similarity as the weight of that image's initial adversarial loss, wherein the weight is calculated using the following fourth calculation formula:

w_t = max_k D_f(c_k, G(x_t))

where w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, D_f denotes cosine similarity, c_k represents the cluster center of the kth class of images, and x_t represents the features of the ith target domain image;

calculating the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

L_d(D_i) = w_t · [ d_i log D(G(x_t)) + (1 − d_i) log(1 − D(G(x_t))) ]

where L_d(D_i) represents the adversarial loss of the ith target domain image, w_t represents the weight corresponding to the initial adversarial loss of the ith target domain image, x_t represents the features of the ith target domain image, D(G(x_t)) represents the output result of the ith target domain image after passing through the feature extractor and the domain discriminator, and d_i is the binary label of the ith target domain image.
In one embodiment, preferably, determining the information maximization loss according to the output result of the class predictor comprises:
calculating the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;
calculating the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;
wherein the entropy minimization loss is calculated using the following sixth calculation formula:

$$L_{ent}(D_t) = -\mathbb{E}_{x_t \in D_t}\sum_{k=1}^{K}\sigma_k\big(H(G(x_t))\big)\log\sigma_k\big(H(G(x_t))\big)$$

wherein L_ent(D_t) denotes the entropy minimization loss, D_t denotes all target domain images, σ denotes the softmax function, H(G(x_t)) denotes the output of the target domain image after passing through the feature extractor and the class predictor, K denotes the total number of image classes, E denotes expectation, and x_t denotes a target domain image;
calculating the class average entropy maximization loss using the following seventh calculation formula:

$$L_{div}(D_t) = \sum_{k=1}^{K}\hat{p}_k\log\hat{p}_k$$

wherein L_div(D_t) denotes the class average entropy maximization loss and

$$\hat{p}_k = \mathbb{E}_{x_t \in D_t}\big[\sigma_k\big(H(G(x_t))\big)\big]$$

denotes the k-th element of the average softmax probability vector over all target domain samples;
wherein the information maximization loss is calculated by adopting the following eighth calculation formula:
L IM = L ent + L div
wherein L_IM denotes the information maximization loss, L_ent(D_t) denotes the entropy minimization loss, and L_div(D_t) denotes the class average entropy maximization loss.
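The information maximization loss above (per-sample entropy minimization plus the class-diversity term on the mean prediction) can be sketched as follows. `info_max_loss` and the epsilon smoothing are assumptions of this illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def info_max_loss(logits, eps=1e-8):
    """Eighth calculation formula: L_IM = L_ent + L_div."""
    p = softmax(logits)                                  # sigma(H(G(x_t))) per sample
    l_ent = -(p * np.log(p + eps)).sum(axis=1).mean()    # sixth formula: mean entropy
    p_bar = p.mean(axis=0)                               # average prediction over D_t
    l_div = (p_bar * np.log(p_bar + eps)).sum()          # seventh formula
    return l_ent + l_div
```

Minimizing L_IM pushes each target prediction toward one-hot (low per-sample entropy) while keeping the average prediction spread across classes (high marginal entropy).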
In one embodiment, preferably, optimizing the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss comprises:
determining a final model loss according to the cross entropy loss, the adversarial loss and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:
L = L CE (D s ) - L d (D t ) + βL IM
wherein L denotes the final model loss, L_CE(D_s) denotes the cross entropy loss, L_d(D_t) denotes the adversarial loss, L_IM denotes the information maximization loss, and β denotes the balancing parameter.
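The ninth calculation formula is simple arithmetic over the three scalar losses; a minimal sketch, where the default value of the balancing parameter β is a hypothetical choice of this illustration:

```python
def total_loss(l_ce, l_d, l_im, beta=0.1):
    """Ninth calculation formula: L = L_CE(D_s) - L_d(D_t) + beta * L_IM.
    The adversarial term enters with a minus sign because the domain
    discriminator maximizes it while the rest of the model minimizes L."""
    return l_ce - l_d + beta * l_im
```
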
Fig. 4 is a block diagram illustrating a migratable image recognition device in accordance with an exemplary embodiment.
As shown in fig. 4, according to a second aspect of the embodiments of the present invention, there is provided a migratable image recognition apparatus including:
a first determining module 41, configured to determine an image type of an input image recognition model, where the image type includes a labeled source domain image and an unlabeled target domain image, and the image recognition model includes a feature extractor, a class predictor, and a domain discriminator;
a first processing module 42, configured to, when the input image is a labeled source domain image, pass the labeled source domain image through the feature extractor and the class predictor, and determine a cross entropy loss;
a second processing module 43, configured to, when the input image is an unlabeled target domain image, make the target domain image pass through the feature extractor and the domain discriminator, and pass through the feature extractor and the class predictor at the same time;
a second determining module 44, configured to determine the adversarial loss according to an output result of the domain discriminator and the similarity between the target domain image and the cluster center of each source domain class;
a third determining module 45, configured to determine information maximization loss according to an output result of the class predictor;
an optimization module 46, configured to optimize the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss.
Fig. 5 is a block diagram illustrating a migratable image recognition device in accordance with an exemplary embodiment.
As shown in fig. 5, in one embodiment, preferably, the apparatus further comprises:
an obtaining module 51, configured to obtain a target image to be identified;
and the identifying module 52 is configured to identify the target image according to the image identification model to determine the category of the target image.
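The recognition step performed by the identifying module reduces to an argmax over the class predictor's output. As a toy illustration only, with a linear layer standing in for the trained feature extractor G and class predictor H (all names here are hypothetical):

```python
import numpy as np

def classify(feature, weights, bias):
    """Return the predicted class index for an already-extracted image
    feature: argmax over the class predictor's logits."""
    logits = feature @ weights + bias
    return int(np.argmax(logits))
```
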
In one embodiment, the cross entropy loss is preferably calculated using the following first calculation formula:

$$L_{CE}(D_s) = -\mathbb{E}_{(x_s,y_s)\in D_s}\sum_{k=1}^{K}\mathbb{1}(k = y_s)\log\sigma_k\big(H(G(x_s))\big)$$

wherein D_s denotes all source domain images, L_CE(D_s) denotes the cross entropy loss over all source domain images, E denotes expectation, x_s denotes a source domain image, y_s denotes the label class of the source domain image, 1(k = y_s) denotes the indicator function, σ denotes the softmax function with σ_k its k-th element, log denotes the logarithm function, and K denotes the total number of image classes.
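A numerically stable NumPy sketch of the first formula (log-softmax plus indicator selection of the labeled class); `source_ce_loss` is an assumed name, not the patent's implementation:

```python
import numpy as np

def source_ce_loss(logits, labels):
    """First calculation formula: mean of -log softmax_{y_s}(logits)
    over the labeled source batch."""
    z = logits - logits.max(axis=1, keepdims=True)        # stability shift
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()  # pick class y_s
```
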
In one embodiment, preferably, the second determining module is configured to:
determining an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

$$L_{d\_initial}(D_i) = d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)$$

wherein L_d_initial(D_i) denotes the initial adversarial loss of the i-th target domain image, D_i denotes the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and then the domain discriminator (the discriminator output is equivalent to a binary classification), and d_i denotes the binary domain label of the i-th target domain image, indicating whether the image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;
determining the cluster center of each class of image from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

$$c_k = \frac{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)\,G(x_s)}{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)}$$

wherein c_k denotes the cluster center of the k-th class of images, x_s denotes a source domain image, y_s denotes the label class of the source domain image, D_s denotes all source domain images, 1(y_s = k) denotes the indicator function, and G(x_s) denotes the feature of the source domain image output by the feature extractor;
calculating the similarity between each target domain image and its nearest cluster center, and taking the similarity as the weight of the initial adversarial loss of the target domain image, wherein the weight is calculated using the following fourth calculation formula:

$$w_t = \max_{k}\, D_f\big(G(x_t), c_k\big)$$

wherein w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, D_f denotes cosine similarity, c_k denotes the cluster center of the k-th class, and x_t denotes the i-th target domain image;
calculating the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

$$L_d(D_i) = w_t\big[d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)\big]$$

wherein L_d(D_i) denotes the adversarial loss of the i-th target domain image, w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and the domain discriminator, and d_i denotes the binary domain label of the i-th target domain image;
the third determining module is to:
calculating the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;
calculating the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;
wherein the entropy minimization loss is calculated using the following sixth calculation formula:

$$L_{ent}(D_t) = -\mathbb{E}_{x_t \in D_t}\sum_{k=1}^{K}\sigma_k\big(H(G(x_t))\big)\log\sigma_k\big(H(G(x_t))\big)$$

wherein L_ent(D_t) denotes the entropy minimization loss, D_t denotes all target domain images, σ denotes the softmax function, H(G(x_t)) denotes the output of the target domain image after passing through the feature extractor and the class predictor, K denotes the total number of image classes, E denotes expectation, and x_t denotes a target domain image;
calculating the class average entropy maximization loss using the following seventh calculation formula:

$$L_{div}(D_t) = \sum_{k=1}^{K}\hat{p}_k\log\hat{p}_k$$

wherein L_div(D_t) denotes the class average entropy maximization loss and

$$\hat{p}_k = \mathbb{E}_{x_t \in D_t}\big[\sigma_k\big(H(G(x_t))\big)\big]$$

denotes the k-th element of the average softmax probability vector over all target domain samples;
wherein the information maximization loss is calculated by adopting the following eighth calculation formula:
L IM = L ent + L div
wherein L_IM denotes the information maximization loss, L_ent(D_t) denotes the entropy minimization loss, and L_div(D_t) denotes the class average entropy maximization loss;
the optimization module is configured to:
determining a final model loss according to the cross entropy loss, the adversarial loss and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:
L = L CE (D s ) - L d (D t ) + βL IM
wherein L denotes the final model loss, L_CE(D_s) denotes the cross entropy loss, L_d(D_t) denotes the adversarial loss, L_IM denotes the information maximization loss, and β denotes the balancing parameter.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is provided a migratable image recognition system, the system including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining an image type of an input image recognition model, wherein the image type comprises a labeled source domain image and an unlabeled target domain image, and the image recognition model comprises a feature extractor, a category predictor and a domain discriminator;
when the input image is a labeled source domain image, enabling the labeled source domain image to pass through the feature extractor and the category predictor, and determining cross entropy loss;
when the input image is an unlabeled target domain image, enabling the target domain image to pass through the feature extractor and the domain discriminator and simultaneously pass through the feature extractor and the class predictor;
determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the cluster center of each source domain class;
determining information maximization loss according to an output result of the category predictor;
optimizing the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss.
It is further understood that the term "plurality" means two or more, and other terms are analogous. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like are used to describe various information and that such information should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," and the like are fully interchangeable. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
It is further to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A migratable image recognition method, the method comprising:
determining an image type of an input image recognition model, wherein the image type comprises a labeled source domain image and an unlabeled target domain image, and the image recognition model comprises a feature extractor, a category predictor and a domain discriminator;
when the input image is a labeled source domain image, enabling the labeled source domain image to pass through the feature extractor and the category predictor, and determining cross entropy loss;
when the input image is an unlabeled target domain image, enabling the target domain image to pass through the feature extractor and the domain discriminator and simultaneously pass through the feature extractor and the class predictor;
determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the cluster center of each source domain class;
determining information maximization loss according to an output result of the category predictor;
optimizing the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss.
2. The method of claim 1, further comprising:
acquiring a target image to be identified;
and identifying the target image according to the image identification model so as to determine the category of the target image.
3. The method of claim 1, wherein the cross entropy loss is calculated using the following first calculation formula:

$$L_{CE}(D_s) = -\mathbb{E}_{(x_s,y_s)\in D_s}\sum_{k=1}^{K}\mathbb{1}(k = y_s)\log\sigma_k\big(H(G(x_s))\big)$$

wherein D_s denotes all source domain images, L_CE(D_s) denotes the cross entropy loss over all source domain images, E denotes expectation, x_s denotes a source domain image, y_s denotes the label class of the source domain image, 1(k = y_s) denotes the indicator function, σ denotes the softmax function with σ_k its k-th element, log denotes the logarithm function, and K denotes the total number of image classes.
4. The method of claim 1, wherein determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the cluster center of each source domain class comprises:
determining an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

$$L_{d\_initial}(D_i) = d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)$$

wherein L_d_initial(D_i) denotes the initial adversarial loss of the i-th target domain image, D_i denotes the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and then the domain discriminator (the discriminator output is equivalent to a binary classification), and d_i denotes the binary domain label of the i-th target domain image, indicating whether the image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;
determining the cluster center of each class of image from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

$$c_k = \frac{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)\,G(x_s)}{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)}$$

wherein c_k denotes the cluster center of the k-th class of images, x_s denotes a source domain image, y_s denotes the label class of the source domain image, D_s denotes all source domain images, 1(y_s = k) denotes the indicator function, and G(x_s) denotes the feature of the source domain image output by the feature extractor;
calculating the similarity between each target domain image and its nearest cluster center, and taking the similarity as the weight of the initial adversarial loss of the target domain image, wherein the weight is calculated using the following fourth calculation formula:

$$w_t = \max_{k}\, D_f\big(G(x_t), c_k\big)$$

wherein w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, D_f denotes cosine similarity, c_k denotes the cluster center of the k-th class, and x_t denotes the i-th target domain image;
calculating the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

$$L_d(D_i) = w_t\big[d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)\big]$$

wherein L_d(D_i) denotes the adversarial loss of the i-th target domain image, w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and the domain discriminator, and d_i denotes the binary domain label of the i-th target domain image.
5. The method of claim 1, wherein determining the information maximization loss according to the output result of the class predictor comprises:
calculating the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;
calculating the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;
wherein the entropy minimization loss is calculated using the following sixth calculation formula:

$$L_{ent}(D_t) = -\mathbb{E}_{x_t \in D_t}\sum_{k=1}^{K}\sigma_k\big(H(G(x_t))\big)\log\sigma_k\big(H(G(x_t))\big)$$

wherein L_ent(D_t) denotes the entropy minimization loss, D_t denotes all target domain images, σ denotes the softmax function, H(G(x_t)) denotes the output of the target domain image after passing through the feature extractor and the class predictor, K denotes the total number of image classes, E denotes expectation, and x_t denotes a target domain image;
calculating the class average entropy maximization loss using the following seventh calculation formula:

$$L_{div}(D_t) = \sum_{k=1}^{K}\hat{p}_k\log\hat{p}_k$$

wherein L_div(D_t) denotes the class average entropy maximization loss and

$$\hat{p}_k = \mathbb{E}_{x_t \in D_t}\big[\sigma_k\big(H(G(x_t))\big)\big]$$

denotes the k-th element of the average softmax probability vector over all target domain samples;
wherein the information maximization loss is calculated by adopting the following eighth calculation formula:
L IM = L ent + L div
wherein L_IM denotes the information maximization loss, L_ent(D_t) denotes the entropy minimization loss, and L_div(D_t) denotes the class average entropy maximization loss.
6. The method of claim 1, wherein optimizing the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss comprises:
determining a final model loss according to the cross entropy loss, the adversarial loss and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:
L = L CE (D s ) - L d (D t ) + βL IM
wherein L denotes the final model loss, L_CE(D_s) denotes the cross entropy loss, L_d(D_t) denotes the adversarial loss, L_IM denotes the information maximization loss, and β denotes the balancing parameter.
7. A migratable image recognition apparatus, comprising:
the image recognition system comprises a first determination module, a second determination module and a third determination module, wherein the first determination module is used for determining the image type of an input image recognition model, the image type comprises a labeled source domain image and an unlabeled target domain image, and the image recognition model comprises a feature extractor, a category predictor and a domain discriminator;
the first processing module is used for enabling the source domain image with the label to pass through the feature extractor and the class predictor and determining cross entropy loss when the input image is the source domain image with the label;
the second processing module is used for enabling the target domain image to pass through the feature extractor and the domain discriminator and simultaneously pass through the feature extractor and the class predictor when the input image is the target domain image without a label;
the second determining module is used for determining the adversarial loss according to the output result of the domain discriminator and the similarity between the target domain image and the cluster center of each source domain class;
the third determining module is used for determining information maximization loss according to the output result of the category predictor;
an optimization module to optimize the image recognition model according to the cross entropy loss, the adversarial loss and the information maximization loss.
8. The apparatus of claim 7, further comprising:
the acquisition module is used for acquiring a target image to be identified;
and the identification module is used for identifying the target image according to the image identification model so as to determine the category of the target image.
9. The apparatus of claim 7, wherein the cross entropy loss is calculated using the following first calculation formula:

$$L_{CE}(D_s) = -\mathbb{E}_{(x_s,y_s)\in D_s}\sum_{k=1}^{K}\mathbb{1}(k = y_s)\log\sigma_k\big(H(G(x_s))\big)$$

wherein D_s denotes all source domain images, L_CE(D_s) denotes the cross entropy loss over all source domain images, E denotes expectation, x_s denotes a source domain image, y_s denotes the label class of the source domain image, 1(k = y_s) denotes the indicator function, σ denotes the softmax function with σ_k its k-th element, log denotes the logarithm function, and K denotes the total number of image classes.
10. The apparatus of claim 7, wherein the second determining module is configured to:
determining an initial adversarial loss from the output result of the domain discriminator, wherein the initial adversarial loss is calculated using the following second calculation formula:

$$L_{d\_initial}(D_i) = d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)$$

wherein L_d_initial(D_i) denotes the initial adversarial loss of the i-th target domain image, D_i denotes the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and then the domain discriminator (the discriminator output is equivalent to a binary classification), and d_i denotes the binary domain label of the i-th target domain image, indicating whether the image belongs to the source domain or the target domain; maximizing L_d_initial(D_i) enables the domain discriminator to perform feature-level alignment;
determining the cluster center of each class of image from the features of all source domain images output by the feature extractor, wherein the cluster center is calculated using the following third calculation formula:

$$c_k = \frac{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)\,G(x_s)}{\sum_{(x_s,y_s)\in D_s}\mathbb{1}(y_s = k)}$$

wherein c_k denotes the cluster center of the k-th class of images, x_s denotes a source domain image, y_s denotes the label class of the source domain image, D_s denotes all source domain images, 1(y_s = k) denotes the indicator function, and G(x_s) denotes the feature of the source domain image output by the feature extractor;
calculating the similarity between each target domain image and its nearest cluster center, and taking the similarity as the weight of the initial adversarial loss of the target domain image, wherein the weight is calculated using the following fourth calculation formula:

$$w_t = \max_{k}\, D_f\big(G(x_t), c_k\big)$$

wherein w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, D_f denotes cosine similarity, c_k denotes the cluster center of the k-th class, and x_t denotes the i-th target domain image;
calculating the adversarial loss corresponding to the target domain image from the initial adversarial loss and its corresponding weight, wherein the adversarial loss is calculated using the following fifth calculation formula:

$$L_d(D_i) = w_t\big[d_i\log D\big(G(x_t)\big) + (1-d_i)\log\big(1 - D(G(x_t))\big)\big]$$

wherein L_d(D_i) denotes the adversarial loss of the i-th target domain image, w_t denotes the weight corresponding to the initial adversarial loss of the i-th target domain image, x_t denotes the i-th target domain image, D(G(x_t)) denotes the output of the i-th target domain image after passing through the feature extractor and the domain discriminator, and d_i denotes the binary domain label of the i-th target domain image;
the third determining module is to:
calculating the entropy minimization loss and the class average entropy maximization loss of the target domain image according to the output result of the class predictor;
calculating the information maximization loss according to the entropy minimization loss and the class average entropy maximization loss;
wherein the entropy minimization loss is calculated using the following sixth calculation formula:

$$L_{ent}(D_t) = -\mathbb{E}_{x_t \in D_t}\sum_{k=1}^{K}\sigma_k\big(H(G(x_t))\big)\log\sigma_k\big(H(G(x_t))\big)$$

wherein L_ent(D_t) denotes the entropy minimization loss, D_t denotes all target domain images, σ denotes the softmax function, H(G(x_t)) denotes the output of the target domain image after passing through the feature extractor and the class predictor, K denotes the total number of image classes, E denotes expectation, and x_t denotes a target domain image;
calculating the class average entropy maximization loss using the following seventh calculation formula:

$$L_{div}(D_t) = \sum_{k=1}^{K}\hat{p}_k\log\hat{p}_k$$

wherein L_div(D_t) denotes the class average entropy maximization loss and

$$\hat{p}_k = \mathbb{E}_{x_t \in D_t}\big[\sigma_k\big(H(G(x_t))\big)\big]$$

denotes the k-th element of the average softmax probability vector over all target domain samples;
wherein the information maximization loss is calculated by adopting the following eighth calculation formula:
L IM = L ent + L div
wherein L_IM denotes the information maximization loss, L_ent(D_t) denotes the entropy minimization loss, and L_div(D_t) denotes the class average entropy maximization loss;
the optimization module is configured to:
determining a final model loss according to the cross entropy loss, the adversarial loss and the information maximization loss, wherein the final model loss is calculated using the following ninth calculation formula:
L = L CE (D s ) - L d (D t ) + βL IM
wherein L denotes the final model loss, L_CE(D_s) denotes the cross entropy loss, L_d(D_t) denotes the adversarial loss, L_IM denotes the information maximization loss, and β denotes the balancing parameter.
CN202210164555.4A 2022-02-23 2022-02-23 Migratable image identification method and device Active CN114239753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210164555.4A CN114239753B (en) 2022-02-23 2022-02-23 Migratable image identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210164555.4A CN114239753B (en) 2022-02-23 2022-02-23 Migratable image identification method and device

Publications (2)

Publication Number Publication Date
CN114239753A true CN114239753A (en) 2022-03-25
CN114239753B CN114239753B (en) 2022-07-22

Family

ID=80747785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210164555.4A Active CN114239753B (en) 2022-02-23 2022-02-23 Migratable image identification method and device

Country Status (1)

Country Link
CN (1) CN114239753B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200082221A1 * 2018-09-06 2020-03-12 Nec Laboratories America, Inc. Domain adaptation for instance detection and segmentation
CN109740682A * 2019-01-08 2019-05-10 Nanjing University Image recognition method based on domain transformation and generative models
CN110322446A * 2019-07-01 2019-10-11 Huazhong University of Science and Technology Domain-adaptive semantic segmentation method based on similarity space alignment
CN113128287A * 2019-12-31 2021-07-16 Dark Matter AI Technology (Guangzhou) Co., Ltd. Method and system for training a cross-domain facial expression recognition model and recognizing facial expressions
US10839269B1 * 2020-03-20 2020-11-17 King Abdulaziz University System for fast and accurate visual domain adaptation
US20210390355A1 * 2020-06-13 2021-12-16 Zhejiang University Image classification method based on reliable weighted optimal transport (RWOT)
CN111814871A * 2020-06-13 2020-10-23 Zhejiang University Image classification method based on reliable weighted optimal transport
CN113159199A * 2021-04-27 2021-07-23 Guangdong University of Technology Cross-domain image classification method based on structural feature enhancement and class-center matching
CN113435509A * 2021-06-28 2021-09-24 Shandong Liju Robot Technology Co., Ltd. Few-shot scene classification and recognition method and system based on meta-learning
CN113469186A * 2021-06-30 2021-10-01 Huaqiao University Cross-domain transfer image segmentation method based on a small number of point labels
CN113688867A * 2021-07-20 2021-11-23 Guangdong University of Technology Cross-domain image classification method
CN113792609A * 2021-08-19 2021-12-14 Beijing Institute of Technology Method for detecting image anomalies in autonomous driving by aligning multispectral cross-domain features
CN113435546A * 2021-08-26 2021-09-24 Guangdong Zhongju Artificial Intelligence Technology Co., Ltd. Transferable image recognition method and system based on differentiated confidence levels

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JOGENDRA NATH KUNDU ET AL: "Universal Source-Free Domain Adaptation", 《CVPR 2020》 *
YEN-CHANG HSU ET AL: "Learning to Cluster in Order to Transfer Across Domains and Tasks", 《arXiv》 *
ZHOU QIANG: "Research on Transfer Learning Algorithms for Image Classification", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
YANG XIAO ET AL: "Domain-adaptation-based semantic segmentation of coal mine environment monitoring images", 《Journal of China Coal Society》 *

Also Published As

Publication number Publication date
CN114239753B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image instance segmentation method based on progressive adversarial learning
CN111680614B (en) Abnormal behavior detection method based on video monitoring
CN113435546B (en) Migratable image recognition method and system based on differentiation confidence level
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN110728694B (en) Long-time visual target tracking method based on continuous learning
CN110458022B (en) Autonomous learning target detection method based on domain adaptation
CN112561960B (en) Multi-target tracking repositioning method based on track similarity measurement learning
CN113283282A (en) Weak supervision time sequence action detection method based on time domain semantic features
CN115861738A (en) Category semantic information guided remote sensing target detection active sampling method
CN111461121A (en) Electric meter number identification method based on YOLOv3 network
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
CN110909645B (en) Crowd counting method based on semi-supervised manifold embedding
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
CN113673631B (en) Abnormal image detection method and device
CN114239753B (en) Migratable image identification method and device
CN114492657A (en) Plant disease classification method and device, electronic equipment and storage medium
CN111696070A (en) Multispectral image fusion power internet of things fault point detection method based on deep learning
CN113449676B (en) Pedestrian re-identification method based on two-way interaction-based disentanglement learning
CN115565008A (en) Transferable image recognition detection system, method and computer readable storage medium
CN116403074B (en) Semi-automatic image labeling method and device based on active labeling
Meena Deshpande License Plate Detection and Recognition using YOLO v4
CN115131588B (en) Image robust clustering method based on fuzzy clustering
CN116843368B (en) Marketing data processing method based on ARMA model
CN112990145B (en) Group-sparse-based age estimation method and electronic equipment
CN110781942B (en) Semi-supervised image classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 1409, Floor 14, Building 1, High tech Zone Entrepreneurship Center, No. 177, Gaoxin 6th Road, Rizhao, Shandong 276801

Patentee after: Shandong Liju Robot Technology Co.,Ltd.

Address before: 276808 No.99, Yuquan 2nd Road, antonwei street, Lanshan District, Rizhao City, Shandong Province

Patentee before: Shandong Liju Robot Technology Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Zhao Yue

Inventor after: Zhang Kai

Inventor after: Wang Fan

Inventor after: Han Zhongyi

Inventor after: Fang Tipin

Inventor before: Zhang Kai

Inventor before: Wang Fan

Inventor before: Han Zhongyi

Inventor before: Fang Tipin
