CN114283471A

CN114283471A - Multi-modal sequencing optimization method for heterogeneous face image re-recognition

Info

Publication number: CN114283471A
Application number: CN202111542116.4A
Authority: CN
Inventors: 韩镇; 胡辉; 温佳兴; 王中元
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2022-04-05
Anticipated expiration: 2041-12-16
Also published as: CN114283471B

Abstract

The invention discloses a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps: carrying out modal conversion on the face image; different types of query modality combinations are formed by the face images before and after modality conversion, forward query and reverse query are carried out on the different query modality combinations, and corresponding query sequencing results and query credibility are obtained; respectively calculating a first multi-mode fusion similarity distance and a second multi-mode fusion similarity distance; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; according to the third multi-mode fusion similarity distance, the final face query similarity ranking result is obtained, the face image complementarity between different modes can be effectively utilized, and the query ranking result with the correct face ranking closer to the front is obtained through the multi-mode fusion similarity distance, so that the accuracy of the heterogeneous face re-identification is improved.

Description

Multi-modal sequencing optimization method for heterogeneous face image re-recognition

Technical Field

The invention relates to the technical field of digital images, in particular to a multi-modal sequencing optimization method for heterogeneous face image re-recognition.

Background

Face re-recognition is one of leading research directions in the field of video monitoring, and the technology is different from the face recognition in that: in the face recognition in video monitoring, a low-quality face shot by a certain camera is generally inquired in a large-scale high-quality face library so as to confirm the identity of the face shot by the certain camera, namely who the person is; the face re-recognition is to query a low-quality face shot by a certain camera in a low-quality face set shot by another camera, so as to confirm the identity relationship between faces shot by different cameras, that is, whether a person under one camera and a person under another camera are the same person, but the person is unclear. With the increasing diversification of monitoring device types, faces shot by different cameras may be heterogeneous, such as visible light faces and infrared faces, so that the accuracy of heterogeneous face re-recognition is significantly reduced. Therefore, how to improve the accuracy of the heterogeneous face re-recognition becomes a new problem to be solved in the field of video monitoring.

Disclosure of Invention

The invention provides a multi-modal sequencing optimization method for heterogeneous face re-recognition, aiming at solving the problem of low accuracy of heterogeneous face re-recognition in the prior art, and aiming at optimizing and adjusting the sequencing of a face set to be inquired so that the sequencing of a correct face is earlier, thereby improving the accuracy of heterogeneous face re-recognition.

The technical problem of the invention is mainly solved by the following technical scheme:

the method discloses a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps:

s1: performing image translation on two heterogeneous face images under two cameras to respectively obtain face images after modal conversion corresponding to the two heterogeneous face images;

s2: the face images before and after mode conversion form different types of query mode combinations, forward query is carried out on the different query mode combinations to obtain corresponding forward query sequencing results and forward query credibility, the forward query represents that the face under a first camera is used as a query image, and the face under a second camera is used as a queried image;

s3: carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility, wherein the reverse query represents that the face under the second camera is used as a query image, and the face under the first camera is used as a queried image;

s4: calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations, and utilizing the complementarity of face images among different modes in a linear fusion mode;

s5: calculating a second multi-mode fusion similarity distance according to a forward query sorting result and a reverse query sorting result of different query mode combinations, and utilizing the complementarity of the face images among different modes in a nonlinear fusion mode;

s6: performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance;

s7: and performing image re-identification according to the third multi-modal fusion similarity distance to obtain a final face query similarity sequencing result, and obtaining an image re-identification result based on the final face query similarity sequencing result.

In one embodiment, step S1 includes:

s1.1: dividing two heterogeneous face data sets under two cameras into training data sets

And

and test data set X^AAnd Y^BWherein X and Y respectively represent two cameras, A and B respectively represent two different image modalities obtained by shooting under the two cameras,

m is the ID number of the face image X under the X camera, and M is the number of the face image data sets under the X camera;

n is the ID number of the face image Y under the Y camera, and N is the number of the face image data sets under the Y camera;

s1.2: using training data sets

And

training an image translation network;

s1.3: test data set X^AAnd Y^BInputting a trained image translation network to obtain a corresponding data set X after mode conversion^BAnd Y^AWherein

In one embodiment, step S2 includes:

s2.1: defining the forward query as that the face under the X camera is used as a query image, the face under the Y camera is used as a queried image, Q is equal to {1,2,3} represents the type of the combination of query modes, and for the forward query, Q is equal to 1 and represents that the query image comes from X^AThe image to be inquired comes from Y^B(ii) a Q2 indicates that the query image is from X^AThe image to be inquired comes from Y^A(ii) a Q-3 indicates that the query image is from X^BIs examinedQuery image from Y^BWhen the forward query mode combination type is Q, taking the face image with the ID number m under the X camera as a query image, and taking all N face images under the Y camera as queried images;

s2.2: inputting N +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors, sequentially calculating feature similarity distances between the feature vectors of 1 query image and the feature vectors of N queried images to obtain N cosine similarity distances, and sequencing the N queried images from large to small according to the similarity to obtain a forward query sequencing result:

wherein the content of the first and second substances,

the ID number corresponding to the j-th ranking of the inquired image under the Y camera in the similarity sorting is represented, and j belongs to {1,2, 3.. and N };

s2.3: defining the top k images in the obtained forward query ranking result Rank (Q, m) as forward query top-k neighbors, and expressing the top k images as the forward query top-k neighbors

Wherein k is an adjustable parameter, the value range of k is 1 to MIN (M, N), and the forward query bidirectional top-k neighbor is obtained based on the forward query top-k neighbor:

wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,

representing that the image of the sequence number corresponding to the top-k neighbor of the forward query is used as a new query image, and the sequence number of the top k in the similarity ranking of all query data set images;

s2.4: and (3) obtaining the forward query credibility based on forward query bidirectional top-k neighbor R (Q, m, k):

wherein, W (Q, m, k) represents the reliability of forward query, | R (Q, m, k) | represents the image quantity of forward query bidirectional top-k neighbor R (Q, m, k).

In one embodiment, step S3 includes:

s3.1: defining the reverse query as that the face under the Y camera is used as a query image, the face under the X camera is used as a queried image, Q is belonged to {1,2,3} represents the type of the combination of query modes, and for the reverse query, Q is equal to 1 to represent that the query image comes from Y^BThe image to be queried comes from X^A(ii) a Q2 indicates that the query image is from Y^AThe image to be queried comes from X^A(ii) a Q-3 indicates that the query image is from Y^BThe image to be queried comes from X^B(ii) a When the reverse query mode combination type is Q, taking the face image with the ID number n under the Y camera as a query image, and taking all M face images under the X camera as queried images;

s3.2: inputting the M +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors; then calculating cosine similarity distances between the feature vectors of the 1 query image and the feature vectors of the M queried images in sequence to obtain M cosine similarity distances; and finally, sequencing the M queried images according to the similarity from large to small to obtain a reverse query sequencing result:

wherein

The method is used for representing the ID number corresponding to the ith ranking of the inquired image under the Y camera in the similarity ranking, i belongs to {1,2,3；

S3.3: defining the top k images ranked in the reverse query ranking result Rank (Q, n) obtained in the step S3.2 as reverse query top-k neighbors, and expressing the images as the reverse query top-k neighbors

Obtaining reverse query bidirectional top-k neighbors based on the reverse query top-k neighbors:

wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,

taking an image with a corresponding sequence number adjacent to the reverse query top-k as a new query image, and ranking the sequence numbers of the top k in the similarity ranking of all query data set images;

s3.4: and (4) obtaining reverse query credibility based on the reverse query bidirectional top-k neighbor R (Q, n, k) obtained in the step (S3.3):

wherein, W (Q, n, k) represents the reliability of the reverse query, | R (Q, n, k) | represents the image quantity of the reverse query bidirectional top-k neighbor R (Q, n, k).

In one embodiment, the first multi-modal fusion similarity distance in S4 is calculated as:

wherein

Represents satisfaction

When j isThe value of (1), namely the sequence number of the face image with the ID number of n in Rank (Q, m);

represents satisfaction

The value of time i, namely the sequence number of the face image with the ID number m in Rank (Q, n), D₁(m, n) represents a first multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.

In one embodiment, the second multi-modal fusion similarity distance in S5 is calculated as:

wherein the function of F is to Q1, 2,3 respectively

Set of constructs and

the constituent set, the elements of the r-th order in ascending order, i.e. for three

And three

Respectively sorting from small to large, taking the value of the sorted r, wherein r is an adjustable parameter and has the same value range as Q, and D₂(m, n) represents a second multi-modal fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.

In one embodiment, the calculation formula of the third multi-modal fusion similarity distance in step S6 is as follows:

D₃＝λ*D₁+(1–λ)*D₂

wherein, λ is weight coefficient, the value range is 0 to 1, D₁For the first multimodal fusion of similarity distances, D₂For a second multimodal fusion of similarity distances, D₃The third multi-modal fusion similarity distance.

In one embodiment, step S7 includes:

sequentially calculating a third multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the N face images with the ID numbers N under the Y camera being 1 and 2 … N;

sequencing the obtained N similarity distances from small to large to obtain a final face query similarity sequencing result of the face images with the ID numbers of m under the X camera and all the N face images under the Y camera, and taking the final face query similarity sequencing result as an image re-identification result

One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:

the invention provides a multimode sequencing optimization method facing to heterogeneous face image re-recognition, which is characterized in that two heterogeneous face images under two cameras are subjected to image translation to obtain face images after corresponding modal conversion; different types of query modality combinations are formed by the face images before and after modality conversion, forward query is carried out on the different query modality combinations, and a corresponding forward query sequencing result and forward query credibility are obtained; carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility; calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations; calculating a second multi-mode fusion similarity distance according to the forward query sorting result and the reverse query sorting result of different query mode combinations; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; and obtaining a final face query similarity ranking result according to the third multi-mode fusion similarity distance. Because the face images before and after mode conversion form different types of query mode combinations, the method can effectively utilize the complementarity of the face images among different modes, consider the importance of strengthening the original sequencing of a certain mode, and obtain a query sequencing result of which the correct face sequencing is more advanced by fusing the similarity distance in a multi-mode manner, thereby improving the accuracy of heterogeneous face re-identification.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a multimodal ranking optimization method for heterogeneous face image re-recognition according to the present invention.

Detailed Description

The invention provides a multimode sequencing optimization method for heterogeneous face image re-recognition, so that the technical effect of improving the recognition accuracy of face image re-recognition is achieved.

In order to achieve the technical effects, the invention has the main conception that:

performing image translation on two heterogeneous face images under two cameras to obtain a face image after corresponding mode conversion; different types of query modality combinations are formed by the face images before and after modality conversion, forward query is carried out on the different query modality combinations, and a corresponding forward query sequencing result and forward query credibility are obtained; carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility; calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations; calculating a second multi-mode fusion similarity distance according to the forward query sorting result and the reverse query sorting result of different query mode combinations; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; and obtaining a final face query similarity ranking result according to the third multi-mode fusion similarity distance.

According to the method, different types of query modality combinations are formed by the face images before and after modality conversion, the complementarity of the face images among different modalities can be effectively utilized, and the query ranking result of which the correct face ranking is more advanced is obtained through multi-modality fusion of similarity distances, so that the accuracy of heterogeneous face re-identification is improved.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

The embodiment provides a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps:

s5: calculating a second multi-mode fusion similarity distance according to a forward query sorting result and a reverse query sorting result of different query mode combinations, and utilizing the complementarity of the face images among different modes in a linear fusion mode;

In one embodiment, S1 specifically includes:

And

n is the ID number of the face image Y under the Y camera, and N is the Y shotNumber of under-head face image datasets;

s1.2: using training data sets

And

training an image translation network;

In the specific implementation process, the two heterogeneous face images are a visible light low-resolution face image and a near-infrared low-resolution face image which are respectively shot by the two cameras. In this embodiment, two heterogeneous face data sets are divided into training data sets

And

and test data set X^AAnd Y^BWhere X and Y represent two cameras, respectively, and a and B represent two different image modalities, respectively: the visible light and the near-infrared light,

m is the ID number of the visible light face image X under the X camera, M is the number of the visible light face image data sets under the X camera, and the value of M is 60 in the embodiment;

n is the ID number of the near-infrared face image Y under the Y camera, N is the number of the near-infrared face image data sets under the Y camera, and the value of N is 60 in this embodiment.

In this embodiment, the training data set is first employed

And

and training the image translation network to realize the conversion function of visible light and near infrared modes. Then, test data set X^AAnd Y^BInputting a trained image translation network to obtain a corresponding data set X after mode conversion^BAnd Y^AThat is, visible face image X under X camera^AConversion into near-infrared face image X^BNear-infrared human face image Y under Y camera^BConverting into visible light human face image Y^AWherein

The adopted image translation network can be an existing image translation network, and a CycleGan network is adopted in the embodiment.

In one embodiment, S2 specifically includes:

s2.1: defining the forward query as that the face under the X camera is used as a query image, the face under the Y camera is used as a queried image, Q is equal to {1,2,3} represents the type of the combination of query modes, and for the forward query, Q is equal to 1 and represents that the query image comes from X^AThe image to be inquired comes from Y^B(ii) a Q2 indicates that the query image is from X^AThe image to be inquired comes from Y^A(ii) a Q-3 indicates that the query image is from X^BThe image to be inquired comes from Y^BWhen the forward query mode combination type is Q, taking the face image with the ID number m under the X camera as a query image, and taking all N face images under the Y camera as queried images;

wherein the content of the first and second substances,

wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,

representing that the image of the corresponding sequence number of the top-k neighbor of the forward query is taken as a new query image in the similarity ranking of all the query data set imagesThe serial number of the top k;

Specifically, the top-k neighbors of the forward query represent the query image m in the query dataset in the forward query, and the IDs of the top k queried images in the similarity ranking of all queried datasets_YThe number (ID number of the image of the queried data set under the Y camera) is a forward query one-way top-k neighbor. Forward query bidirectional top-k neighbors are the first k queried image IDs found when top-k is forward queried_YNumber, with them as new query image, looking up in reverse all query datasets for the top k IDs in similarity order_XNumber (query dataset image under X camera ID number), if at this time m is just the first k IDs_XThe values are as follows. Then the ID will be_YAnd adding the query image m into forward query bidirectional top-k neighbors.

It is shown that when the query combination is Q, the query image m is ranked with the jth ID in the similarity ranking of the queried data set_YNumber (n). For example:

is the query image m with the highest value in the similarity ranking_YNumber (n). Therefore, the temperature of the molten metal is controlled,

represents the ID_YNumber image as new query image, top k ID in similarity ranking of all query dataset images_XNumber (n).

In a specific implementation process, a value of k may be selected according to an actual situation, where the value of k is 10 in this embodiment, and the adopted face recognition feature extraction network is an existing face recognition feature extraction network, and the embodiment adopts an ArcFace network.

In one embodiment, step S3 includes:

wherein

Representing the ID number corresponding to the ith ranking of the inquired image under the Y camera in the similarity ranking, wherein i belongs to {1,2, 3.. and M };

wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,

where | R (Q, n, k) | represents the number of images in the reverse query bi-directional top-k neighbor R (Q, n, k).

Specifically, the top-k neighbor of the reverse query represents the query image n in the query dataset in the reverse query, and the IDs of the top k queried images in the similarity ranking of all queried datasets_xNumber (queried dataset image ID number under X camera). Which is a reverse query one-way top-k neighbor. Forward query bidirectional top-k neighbors are the first k queried image IDs found when top-k is forward queried_xNumber, with them as new query image, against finding the top k IDs of similarity ordering of all query datasets_YNumber (query dataset image under Y camera ID number), if at this time m is just the first k IDs_YThe values are as follows. Then the ID will be_XAnd adding the query image m into forward query bidirectional top-k neighbors.

Showing that when the query combination is Q, the image n is queried, and the ID of the ith is ranked in the similarity ordering of the queried data set_XNumber (n). Such as:

that is, the query image n is the query image with the highest value in the similarity ranking_XNumber (n). Therefore, the temperature of the molten metal is controlled,

represents the ID_XNumber image as new query image, top k ID in similarity ranking of all query dataset images_YNumber (n).

In a specific implementation process, the adopted face recognition feature extraction network is the existing face recognition feature extraction network, and the embodiment adopts an ArcFace network.

wherein

Represents satisfaction

The value of the time j is the sequence number of the face image with the ID number n in Rank (Q, m);

represents satisfaction

The value of time i, namely the sequence number of the face image with the ID number m in Rank (Q, n), D₁(m, n) represents the first multi-mode fusion between the face image with the ID number m under the X camera and the face image with the ID number n under the Y cameraThe similarity distance.

wherein the function of F is to Q1, 2,3 respectively

Set of constructs and

And three

In the specific implementation process, the value range of the adjustable parameter r is the same as that of Q, and in this embodiment, r takes a value of 2.

D₃＝λ*D₁+(1–λ)*D₂

The weight coefficient can be selected according to the actual situation, and λ is 0.2 in this embodiment.

Regarding the three multi-modal fusion similarity distances involved in the invention, the first multi-modal fusion similarity distance is a linear similarity distance fusion result, and the complementarity of all multi-modal original orderings is fully considered. The second multi-modal fusion similarity distance is a non-linear similarity distance fusion result, and the importance of the original ordering of a certain modality is strengthened while the complementarity among the multiple modalities is considered. The two multi-modal fusion similarity distances have mutual advantages, so that the final multi-modal fusion similarity distance obtained through weighting can improve the re-recognition accuracy.

The related contents of the image translation CycleGan network and the face recognition feature extraction network ArcFace are listed in the documents listed in the actual examination reference.

In one embodiment, step S7 includes:

and sequencing the obtained N similarity distances from small to large to obtain a final face query similarity sequencing result of all N face images under the X camera and the Y camera under the face image query with the ID number of m under the X camera, and taking the final face query similarity sequencing result as an image re-identification result.

The invention provides a multi-modal sequencing optimization algorithm for heterogeneous face image re-recognition, which utilizes an image translation network to perform modal conversion on original heterogeneous face images under two cameras, forms different types of query modal combinations with the face images before and after the modal conversion, can effectively utilize the complementarity of the face images among different modalities, and obtains a query sequencing result of which the correct face sequencing is more forward by fusing similarity distances in a multi-modal manner, thereby improving the accuracy of the heterogeneous face re-recognition.

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A multi-modal sequencing optimization method for heterogeneous face image re-recognition is characterized by comprising the following steps:

2. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S1 includes:

And

s1.2: using training data sets

And

training an image translation network;

3. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S2 includes:

wherein the content of the first and second substances,

wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,

4. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S3 includes:

wherein

wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,

taking an image of the corresponding sequence number of the top-k neighbor which is inquired reversely asNew query images, the sequence number of the top k in the similarity ranking of all query data set images;

5. The multi-modal ranking optimization method for the re-recognition of the heterogeneous face images as claimed in claim 1, wherein the first multi-modal fusion similarity distance in S4 is calculated as:

wherein

Represents satisfaction

represents satisfaction

6. The multi-modal ranking optimization method for the re-recognition of the heterogeneous face images as claimed in claim 1, wherein the second multi-modal fusion similarity distance in S5 is calculated by the following formula:

wherein the function of F is to Q1, 2,3 respectively

Set of constructs and

And three

7. The multi-modal ranking optimization method for the face image recognitions of the heterogeneous object as claimed in claim 1, wherein the third multi-modal fusion similarity distance in step S6 is calculated by the following formula:

D₃＝λ*D₁+(1–λ)*D₂

8. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S7 includes: