CN114283471A - Multi-modal sequencing optimization method for heterogeneous face image re-recognition - Google Patents

Multi-modal sequencing optimization method for heterogeneous face image re-recognition Download PDF

Info

Publication number
CN114283471A
CN114283471A CN202111542116.4A CN202111542116A CN114283471A CN 114283471 A CN114283471 A CN 114283471A CN 202111542116 A CN202111542116 A CN 202111542116A CN 114283471 A CN114283471 A CN 114283471A
Authority
CN
China
Prior art keywords
query
image
face
camera
under
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111542116.4A
Other languages
Chinese (zh)
Other versions
CN114283471B (en
Inventor
韩镇
胡辉
温佳兴
王中元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202111542116.4A priority Critical patent/CN114283471B/en
Publication of CN114283471A publication Critical patent/CN114283471A/en
Application granted granted Critical
Publication of CN114283471B publication Critical patent/CN114283471B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps: carrying out modal conversion on the face image; different types of query modality combinations are formed by the face images before and after modality conversion, forward query and reverse query are carried out on the different query modality combinations, and corresponding query sequencing results and query credibility are obtained; respectively calculating a first multi-mode fusion similarity distance and a second multi-mode fusion similarity distance; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; according to the third multi-mode fusion similarity distance, the final face query similarity ranking result is obtained, the face image complementarity between different modes can be effectively utilized, and the query ranking result with the correct face ranking closer to the front is obtained through the multi-mode fusion similarity distance, so that the accuracy of the heterogeneous face re-identification is improved.

Description

Multi-modal sequencing optimization method for heterogeneous face image re-recognition
Technical Field
The invention relates to the technical field of digital images, in particular to a multi-modal sequencing optimization method for heterogeneous face image re-recognition.
Background
Face re-recognition is one of leading research directions in the field of video monitoring, and the technology is different from the face recognition in that: in the face recognition in video monitoring, a low-quality face shot by a certain camera is generally inquired in a large-scale high-quality face library so as to confirm the identity of the face shot by the certain camera, namely who the person is; the face re-recognition is to query a low-quality face shot by a certain camera in a low-quality face set shot by another camera, so as to confirm the identity relationship between faces shot by different cameras, that is, whether a person under one camera and a person under another camera are the same person, but the person is unclear. With the increasing diversification of monitoring device types, faces shot by different cameras may be heterogeneous, such as visible light faces and infrared faces, so that the accuracy of heterogeneous face re-recognition is significantly reduced. Therefore, how to improve the accuracy of the heterogeneous face re-recognition becomes a new problem to be solved in the field of video monitoring.
Disclosure of Invention
The invention provides a multi-modal sequencing optimization method for heterogeneous face re-recognition, aiming at solving the problem of low accuracy of heterogeneous face re-recognition in the prior art, and aiming at optimizing and adjusting the sequencing of a face set to be inquired so that the sequencing of a correct face is earlier, thereby improving the accuracy of heterogeneous face re-recognition.
The technical problem of the invention is mainly solved by the following technical scheme:
the method discloses a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps:
s1: performing image translation on two heterogeneous face images under two cameras to respectively obtain face images after modal conversion corresponding to the two heterogeneous face images;
s2: the face images before and after mode conversion form different types of query mode combinations, forward query is carried out on the different query mode combinations to obtain corresponding forward query sequencing results and forward query credibility, the forward query represents that the face under a first camera is used as a query image, and the face under a second camera is used as a queried image;
s3: carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility, wherein the reverse query represents that the face under the second camera is used as a query image, and the face under the first camera is used as a queried image;
s4: calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations, and utilizing the complementarity of face images among different modes in a linear fusion mode;
s5: calculating a second multi-mode fusion similarity distance according to a forward query sorting result and a reverse query sorting result of different query mode combinations, and utilizing the complementarity of the face images among different modes in a nonlinear fusion mode;
s6: performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance;
s7: and performing image re-identification according to the third multi-modal fusion similarity distance to obtain a final face query similarity sequencing result, and obtaining an image re-identification result based on the final face query similarity sequencing result.
In one embodiment, step S1 includes:
s1.1: dividing two heterogeneous face data sets under two cameras into training data sets
Figure BDA0003414617720000021
And
Figure BDA0003414617720000022
and test data set XAAnd YBWherein X and Y respectively represent two cameras, A and B respectively represent two different image modalities obtained by shooting under the two cameras,
Figure BDA0003414617720000023
m is the ID number of the face image X under the X camera, and M is the number of the face image data sets under the X camera;
Figure BDA0003414617720000024
n is the ID number of the face image Y under the Y camera, and N is the number of the face image data sets under the Y camera;
s1.2: using training data sets
Figure BDA0003414617720000025
And
Figure BDA0003414617720000026
training an image translation network;
s1.3: test data set XAAnd YBInputting a trained image translation network to obtain a corresponding data set X after mode conversionBAnd YAWherein
Figure BDA0003414617720000027
Figure BDA0003414617720000028
In one embodiment, step S2 includes:
s2.1: defining the forward query as that the face under the X camera is used as a query image, the face under the Y camera is used as a queried image, Q is equal to {1,2,3} represents the type of the combination of query modes, and for the forward query, Q is equal to 1 and represents that the query image comes from XAThe image to be inquired comes from YB(ii) a Q2 indicates that the query image is from XAThe image to be inquired comes from YA(ii) a Q-3 indicates that the query image is from XBIs examinedQuery image from YBWhen the forward query mode combination type is Q, taking the face image with the ID number m under the X camera as a query image, and taking all N face images under the Y camera as queried images;
s2.2: inputting N +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors, sequentially calculating feature similarity distances between the feature vectors of 1 query image and the feature vectors of N queried images to obtain N cosine similarity distances, and sequencing the N queried images from large to small according to the similarity to obtain a forward query sequencing result:
Figure BDA0003414617720000031
wherein the content of the first and second substances,
Figure BDA0003414617720000032
the ID number corresponding to the j-th ranking of the inquired image under the Y camera in the similarity sorting is represented, and j belongs to {1,2, 3.. and N };
s2.3: defining the top k images in the obtained forward query ranking result Rank (Q, m) as forward query top-k neighbors, and expressing the top k images as the forward query top-k neighbors
Figure BDA0003414617720000033
Wherein k is an adjustable parameter, the value range of k is 1 to MIN (M, N), and the forward query bidirectional top-k neighbor is obtained based on the forward query top-k neighbor:
Figure BDA0003414617720000034
wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,
Figure BDA0003414617720000035
representing that the image of the sequence number corresponding to the top-k neighbor of the forward query is used as a new query image, and the sequence number of the top k in the similarity ranking of all query data set images;
s2.4: and (3) obtaining the forward query credibility based on forward query bidirectional top-k neighbor R (Q, m, k):
Figure BDA0003414617720000036
wherein, W (Q, m, k) represents the reliability of forward query, | R (Q, m, k) | represents the image quantity of forward query bidirectional top-k neighbor R (Q, m, k).
In one embodiment, step S3 includes:
s3.1: defining the reverse query as that the face under the Y camera is used as a query image, the face under the X camera is used as a queried image, Q is belonged to {1,2,3} represents the type of the combination of query modes, and for the reverse query, Q is equal to 1 to represent that the query image comes from YBThe image to be queried comes from XA(ii) a Q2 indicates that the query image is from YAThe image to be queried comes from XA(ii) a Q-3 indicates that the query image is from YBThe image to be queried comes from XB(ii) a When the reverse query mode combination type is Q, taking the face image with the ID number n under the Y camera as a query image, and taking all M face images under the X camera as queried images;
s3.2: inputting the M +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors; then calculating cosine similarity distances between the feature vectors of the 1 query image and the feature vectors of the M queried images in sequence to obtain M cosine similarity distances; and finally, sequencing the M queried images according to the similarity from large to small to obtain a reverse query sequencing result:
Figure BDA0003414617720000041
wherein
Figure BDA0003414617720000042
The method is used for representing the ID number corresponding to the ith ranking of the inquired image under the Y camera in the similarity ranking, i belongs to {1,2,3;
S3.3: defining the top k images ranked in the reverse query ranking result Rank (Q, n) obtained in the step S3.2 as reverse query top-k neighbors, and expressing the images as the reverse query top-k neighbors
Figure BDA0003414617720000043
Obtaining reverse query bidirectional top-k neighbors based on the reverse query top-k neighbors:
Figure BDA0003414617720000044
wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,
Figure BDA0003414617720000045
taking an image with a corresponding sequence number adjacent to the reverse query top-k as a new query image, and ranking the sequence numbers of the top k in the similarity ranking of all query data set images;
s3.4: and (4) obtaining reverse query credibility based on the reverse query bidirectional top-k neighbor R (Q, n, k) obtained in the step (S3.3):
Figure BDA0003414617720000046
wherein, W (Q, n, k) represents the reliability of the reverse query, | R (Q, n, k) | represents the image quantity of the reverse query bidirectional top-k neighbor R (Q, n, k).
In one embodiment, the first multi-modal fusion similarity distance in S4 is calculated as:
Figure BDA0003414617720000047
wherein
Figure BDA0003414617720000048
Represents satisfaction
Figure BDA0003414617720000049
When j isThe value of (1), namely the sequence number of the face image with the ID number of n in Rank (Q, m);
Figure BDA00034146177200000410
represents satisfaction
Figure BDA00034146177200000411
The value of time i, namely the sequence number of the face image with the ID number m in Rank (Q, n), D1(m, n) represents a first multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.
In one embodiment, the second multi-modal fusion similarity distance in S5 is calculated as:
Figure BDA0003414617720000051
wherein the function of F is to Q1, 2,3 respectively
Figure BDA0003414617720000052
Set of constructs and
Figure BDA0003414617720000053
the constituent set, the elements of the r-th order in ascending order, i.e. for three
Figure BDA0003414617720000054
And three
Figure BDA0003414617720000055
Respectively sorting from small to large, taking the value of the sorted r, wherein r is an adjustable parameter and has the same value range as Q, and D2(m, n) represents a second multi-modal fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.
In one embodiment, the calculation formula of the third multi-modal fusion similarity distance in step S6 is as follows:
D3=λ*D1+(1–λ)*D2
wherein, λ is weight coefficient, the value range is 0 to 1, D1For the first multimodal fusion of similarity distances, D2For a second multimodal fusion of similarity distances, D3The third multi-modal fusion similarity distance.
In one embodiment, step S7 includes:
sequentially calculating a third multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the N face images with the ID numbers N under the Y camera being 1 and 2 … N;
sequencing the obtained N similarity distances from small to large to obtain a final face query similarity sequencing result of the face images with the ID numbers of m under the X camera and all the N face images under the Y camera, and taking the final face query similarity sequencing result as an image re-identification result
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a multimode sequencing optimization method facing to heterogeneous face image re-recognition, which is characterized in that two heterogeneous face images under two cameras are subjected to image translation to obtain face images after corresponding modal conversion; different types of query modality combinations are formed by the face images before and after modality conversion, forward query is carried out on the different query modality combinations, and a corresponding forward query sequencing result and forward query credibility are obtained; carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility; calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations; calculating a second multi-mode fusion similarity distance according to the forward query sorting result and the reverse query sorting result of different query mode combinations; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; and obtaining a final face query similarity ranking result according to the third multi-mode fusion similarity distance. Because the face images before and after mode conversion form different types of query mode combinations, the method can effectively utilize the complementarity of the face images among different modes, consider the importance of strengthening the original sequencing of a certain mode, and obtain a query sequencing result of which the correct face sequencing is more advanced by fusing the similarity distance in a multi-mode manner, thereby improving the accuracy of heterogeneous face re-identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a multimodal ranking optimization method for heterogeneous face image re-recognition according to the present invention.
Detailed Description
The invention provides a multimode sequencing optimization method for heterogeneous face image re-recognition, so that the technical effect of improving the recognition accuracy of face image re-recognition is achieved.
In order to achieve the technical effects, the invention has the main conception that:
performing image translation on two heterogeneous face images under two cameras to obtain a face image after corresponding mode conversion; different types of query modality combinations are formed by the face images before and after modality conversion, forward query is carried out on the different query modality combinations, and a corresponding forward query sequencing result and forward query credibility are obtained; carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility; calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations; calculating a second multi-mode fusion similarity distance according to the forward query sorting result and the reverse query sorting result of different query mode combinations; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; and obtaining a final face query similarity ranking result according to the third multi-mode fusion similarity distance.
According to the method, different types of query modality combinations are formed by the face images before and after modality conversion, the complementarity of the face images among different modalities can be effectively utilized, and the query ranking result of which the correct face ranking is more advanced is obtained through multi-modality fusion of similarity distances, so that the accuracy of heterogeneous face re-identification is improved.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment provides a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps:
s1: performing image translation on two heterogeneous face images under two cameras to respectively obtain face images after modal conversion corresponding to the two heterogeneous face images;
s2: the face images before and after mode conversion form different types of query mode combinations, forward query is carried out on the different query mode combinations to obtain corresponding forward query sequencing results and forward query credibility, the forward query represents that the face under a first camera is used as a query image, and the face under a second camera is used as a queried image;
s3: carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility, wherein the reverse query represents that the face under the second camera is used as a query image, and the face under the first camera is used as a queried image;
s4: calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations, and utilizing the complementarity of face images among different modes in a linear fusion mode;
s5: calculating a second multi-mode fusion similarity distance according to a forward query sorting result and a reverse query sorting result of different query mode combinations, and utilizing the complementarity of the face images among different modes in a linear fusion mode;
s6: performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance;
s7: and performing image re-identification according to the third multi-modal fusion similarity distance to obtain a final face query similarity sequencing result, and obtaining an image re-identification result based on the final face query similarity sequencing result.
In one embodiment, S1 specifically includes:
s1.1: dividing two heterogeneous face data sets under two cameras into training data sets
Figure BDA0003414617720000071
And
Figure BDA0003414617720000072
and test data set XAAnd YBWherein X and Y respectively represent two cameras, A and B respectively represent two different image modalities obtained by shooting under the two cameras,
Figure BDA0003414617720000081
m is the ID number of the face image X under the X camera, and M is the number of the face image data sets under the X camera;
Figure BDA0003414617720000082
n is the ID number of the face image Y under the Y camera, and N is the Y shotNumber of under-head face image datasets;
s1.2: using training data sets
Figure BDA0003414617720000083
And
Figure BDA0003414617720000084
training an image translation network;
s1.3: test data set XAAnd YBInputting a trained image translation network to obtain a corresponding data set X after mode conversionBAnd YAWherein
Figure BDA0003414617720000085
Figure BDA0003414617720000086
In the specific implementation process, the two heterogeneous face images are a visible light low-resolution face image and a near-infrared low-resolution face image which are respectively shot by the two cameras. In this embodiment, two heterogeneous face data sets are divided into training data sets
Figure BDA0003414617720000087
And
Figure BDA0003414617720000088
and test data set XAAnd YBWhere X and Y represent two cameras, respectively, and a and B represent two different image modalities, respectively: the visible light and the near-infrared light,
Figure BDA0003414617720000089
Figure BDA00034146177200000810
m is the ID number of the visible light face image X under the X camera, M is the number of the visible light face image data sets under the X camera, and the value of M is 60 in the embodiment;
Figure BDA00034146177200000811
Figure BDA00034146177200000812
n is the ID number of the near-infrared face image Y under the Y camera, N is the number of the near-infrared face image data sets under the Y camera, and the value of N is 60 in this embodiment.
In this embodiment, the training data set is first employed
Figure BDA00034146177200000813
And
Figure BDA00034146177200000814
and training the image translation network to realize the conversion function of visible light and near infrared modes. Then, test data set XAAnd YBInputting a trained image translation network to obtain a corresponding data set X after mode conversionBAnd YAThat is, visible face image X under X cameraAConversion into near-infrared face image XBNear-infrared human face image Y under Y cameraBConverting into visible light human face image YAWherein
Figure BDA00034146177200000815
Figure BDA00034146177200000816
The adopted image translation network can be an existing image translation network, and a CycleGan network is adopted in the embodiment.
In one embodiment, S2 specifically includes:
s2.1: defining the forward query as that the face under the X camera is used as a query image, the face under the Y camera is used as a queried image, Q is equal to {1,2,3} represents the type of the combination of query modes, and for the forward query, Q is equal to 1 and represents that the query image comes from XAThe image to be inquired comes from YB(ii) a Q2 indicates that the query image is from XAThe image to be inquired comes from YA(ii) a Q-3 indicates that the query image is from XBThe image to be inquired comes from YBWhen the forward query mode combination type is Q, taking the face image with the ID number m under the X camera as a query image, and taking all N face images under the Y camera as queried images;
s2.2: inputting N +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors, sequentially calculating feature similarity distances between the feature vectors of 1 query image and the feature vectors of N queried images to obtain N cosine similarity distances, and sequencing the N queried images from large to small according to the similarity to obtain a forward query sequencing result:
Figure BDA0003414617720000091
wherein the content of the first and second substances,
Figure BDA0003414617720000092
the ID number corresponding to the j-th ranking of the inquired image under the Y camera in the similarity sorting is represented, and j belongs to {1,2, 3.. and N };
s2.3: defining the top k images in the obtained forward query ranking result Rank (Q, m) as forward query top-k neighbors, and expressing the top k images as the forward query top-k neighbors
Figure BDA0003414617720000093
Wherein k is an adjustable parameter, the value range of k is 1 to MIN (M, N), and the forward query bidirectional top-k neighbor is obtained based on the forward query top-k neighbor:
Figure BDA0003414617720000094
wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,
Figure BDA0003414617720000095
representing that the image of the corresponding sequence number of the top-k neighbor of the forward query is taken as a new query image in the similarity ranking of all the query data set imagesThe serial number of the top k;
s2.4: and (3) obtaining the forward query credibility based on forward query bidirectional top-k neighbor R (Q, m, k):
Figure BDA0003414617720000096
wherein, W (Q, m, k) represents the reliability of forward query, | R (Q, m, k) | represents the image quantity of forward query bidirectional top-k neighbor R (Q, m, k).
Specifically, the top-k neighbors of the forward query represent the query image m in the query dataset in the forward query, and the IDs of the top k queried images in the similarity ranking of all queried datasetsYThe number (ID number of the image of the queried data set under the Y camera) is a forward query one-way top-k neighbor. Forward query bidirectional top-k neighbors are the first k queried image IDs found when top-k is forward queriedYNumber, with them as new query image, looking up in reverse all query datasets for the top k IDs in similarity orderXNumber (query dataset image under X camera ID number), if at this time m is just the first k IDsXThe values are as follows. Then the ID will beYAnd adding the query image m into forward query bidirectional top-k neighbors.
Figure BDA0003414617720000097
It is shown that when the query combination is Q, the query image m is ranked with the jth ID in the similarity ranking of the queried data setYNumber (n). For example:
Figure BDA0003414617720000101
is the query image m with the highest value in the similarity rankingYNumber (n). Therefore, the temperature of the molten metal is controlled,
Figure BDA0003414617720000102
represents the IDYNumber image as new query image, top k ID in similarity ranking of all query dataset imagesXNumber (n).
In a specific implementation process, a value of k may be selected according to an actual situation, where the value of k is 10 in this embodiment, and the adopted face recognition feature extraction network is an existing face recognition feature extraction network, and the embodiment adopts an ArcFace network.
In one embodiment, step S3 includes:
s3.1: defining the reverse query as that the face under the Y camera is used as a query image, the face under the X camera is used as a queried image, Q is belonged to {1,2,3} represents the type of the combination of query modes, and for the reverse query, Q is equal to 1 to represent that the query image comes from YBThe image to be queried comes from XA(ii) a Q2 indicates that the query image is from YAThe image to be queried comes from XA(ii) a Q-3 indicates that the query image is from YBThe image to be queried comes from XB(ii) a When the reverse query mode combination type is Q, taking the face image with the ID number n under the Y camera as a query image, and taking all M face images under the X camera as queried images;
s3.2: inputting the M +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors; then calculating cosine similarity distances between the feature vectors of the 1 query image and the feature vectors of the M queried images in sequence to obtain M cosine similarity distances; and finally, sequencing the M queried images according to the similarity from large to small to obtain a reverse query sequencing result:
Figure BDA0003414617720000103
wherein
Figure BDA0003414617720000104
Representing the ID number corresponding to the ith ranking of the inquired image under the Y camera in the similarity ranking, wherein i belongs to {1,2, 3.. and M };
s3.3: defining the top k images ranked in the reverse query ranking result Rank (Q, n) obtained in the step S3.2 as reverse query top-k neighbors, and expressing the images as the reverse query top-k neighbors
Figure BDA0003414617720000105
Obtaining reverse query bidirectional top-k neighbors based on the reverse query top-k neighbors:
Figure BDA0003414617720000106
wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,
Figure BDA0003414617720000107
taking an image with a corresponding sequence number adjacent to the reverse query top-k as a new query image, and ranking the sequence numbers of the top k in the similarity ranking of all query data set images;
s3.4: and (4) obtaining reverse query credibility based on the reverse query bidirectional top-k neighbor R (Q, n, k) obtained in the step (S3.3):
Figure BDA0003414617720000111
where | R (Q, n, k) | represents the number of images in the reverse query bi-directional top-k neighbor R (Q, n, k).
Specifically, the top-k neighbor of the reverse query represents the query image n in the query dataset in the reverse query, and the IDs of the top k queried images in the similarity ranking of all queried datasetsxNumber (queried dataset image ID number under X camera). Which is a reverse query one-way top-k neighbor. Forward query bidirectional top-k neighbors are the first k queried image IDs found when top-k is forward queriedxNumber, with them as new query image, against finding the top k IDs of similarity ordering of all query datasetsYNumber (query dataset image under Y camera ID number), if at this time m is just the first k IDsYThe values are as follows. Then the ID will beXAnd adding the query image m into forward query bidirectional top-k neighbors.
Figure BDA0003414617720000112
Showing that when the query combination is Q, the image n is queried, and the ID of the ith is ranked in the similarity ordering of the queried data setXNumber (n). Such as:
Figure BDA0003414617720000113
that is, the query image n is the query image with the highest value in the similarity rankingXNumber (n). Therefore, the temperature of the molten metal is controlled,
Figure BDA0003414617720000114
represents the IDXNumber image as new query image, top k ID in similarity ranking of all query dataset imagesYNumber (n).
In a specific implementation process, the adopted face recognition feature extraction network is the existing face recognition feature extraction network, and the embodiment adopts an ArcFace network.
In one embodiment, the first multi-modal fusion similarity distance in S4 is calculated as:
Figure BDA0003414617720000115
wherein
Figure BDA0003414617720000116
Represents satisfaction
Figure BDA0003414617720000117
The value of the time j is the sequence number of the face image with the ID number n in Rank (Q, m);
Figure BDA0003414617720000118
represents satisfaction
Figure BDA0003414617720000119
The value of time i, namely the sequence number of the face image with the ID number m in Rank (Q, n), D1(m, n) represents the first multi-mode fusion between the face image with the ID number m under the X camera and the face image with the ID number n under the Y cameraThe similarity distance.
In one embodiment, the second multi-modal fusion similarity distance in S5 is calculated as:
Figure BDA00034146177200001110
wherein the function of F is to Q1, 2,3 respectively
Figure BDA00034146177200001111
Set of constructs and
Figure BDA00034146177200001112
the constituent set, the elements of the r-th order in ascending order, i.e. for three
Figure BDA0003414617720000121
And three
Figure BDA0003414617720000122
Respectively sorting from small to large, taking the value of the sorted r, wherein r is an adjustable parameter and has the same value range as Q, and D2(m, n) represents a second multi-modal fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.
In the specific implementation process, the value range of the adjustable parameter r is the same as that of Q, and in this embodiment, r takes a value of 2.
In one embodiment, the calculation formula of the third multi-modal fusion similarity distance in step S6 is as follows:
D3=λ*D1+(1–λ)*D2
wherein, λ is weight coefficient, the value range is 0 to 1, D1For the first multimodal fusion of similarity distances, D2For a second multimodal fusion of similarity distances, D3The third multi-modal fusion similarity distance.
The weight coefficient can be selected according to the actual situation, and λ is 0.2 in this embodiment.
Regarding the three multi-modal fusion similarity distances involved in the invention, the first multi-modal fusion similarity distance is a linear similarity distance fusion result, and the complementarity of all multi-modal original orderings is fully considered. The second multi-modal fusion similarity distance is a non-linear similarity distance fusion result, and the importance of the original ordering of a certain modality is strengthened while the complementarity among the multiple modalities is considered. The two multi-modal fusion similarity distances have mutual advantages, so that the final multi-modal fusion similarity distance obtained through weighting can improve the re-recognition accuracy.
The related contents of the image translation CycleGan network and the face recognition feature extraction network ArcFace are listed in the documents listed in the actual examination reference.
In one embodiment, step S7 includes:
sequentially calculating a third multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the N face images with the ID numbers N under the Y camera being 1 and 2 … N;
and sequencing the obtained N similarity distances from small to large to obtain a final face query similarity sequencing result of all N face images under the X camera and the Y camera under the face image query with the ID number of m under the X camera, and taking the final face query similarity sequencing result as an image re-identification result.
The invention provides a multi-modal sequencing optimization algorithm for heterogeneous face image re-recognition, which utilizes an image translation network to perform modal conversion on original heterogeneous face images under two cameras, forms different types of query modal combinations with the face images before and after the modal conversion, can effectively utilize the complementarity of the face images among different modalities, and obtains a query sequencing result of which the correct face sequencing is more forward by fusing similarity distances in a multi-modal manner, thereby improving the accuracy of the heterogeneous face re-recognition.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A multi-modal sequencing optimization method for heterogeneous face image re-recognition is characterized by comprising the following steps:
s1: performing image translation on two heterogeneous face images under two cameras to respectively obtain face images after modal conversion corresponding to the two heterogeneous face images;
s2: the face images before and after mode conversion form different types of query mode combinations, forward query is carried out on the different query mode combinations to obtain corresponding forward query sequencing results and forward query credibility, the forward query represents that the face under a first camera is used as a query image, and the face under a second camera is used as a queried image;
s3: carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility, wherein the reverse query represents that the face under the second camera is used as a query image, and the face under the first camera is used as a queried image;
s4: calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations, and utilizing the complementarity of face images among different modes in a linear fusion mode;
s5: calculating a second multi-mode fusion similarity distance according to a forward query sorting result and a reverse query sorting result of different query mode combinations, and utilizing the complementarity of the face images among different modes in a nonlinear fusion mode;
s6: performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance;
s7: and performing image re-identification according to the third multi-modal fusion similarity distance to obtain a final face query similarity sequencing result, and obtaining an image re-identification result based on the final face query similarity sequencing result.
2. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S1 includes:
s1.1: dividing two heterogeneous face data sets under two cameras into training data sets
Figure FDA0003414617710000011
And
Figure FDA0003414617710000012
and test data set XAAnd YBWherein X and Y respectively represent two cameras, A and B respectively represent two different image modalities obtained by shooting under the two cameras,
Figure FDA0003414617710000013
m is the ID number of the face image X under the X camera, and M is the number of the face image data sets under the X camera;
Figure FDA0003414617710000014
n is the ID number of the face image Y under the Y camera, and N is the number of the face image data sets under the Y camera;
s1.2: using training data sets
Figure FDA0003414617710000021
And
Figure FDA0003414617710000022
training an image translation network;
s1.3: test data set XAAnd YBInputting a trained image translation network to obtain a corresponding data set X after mode conversionBAnd YAWherein
Figure FDA0003414617710000023
Figure FDA0003414617710000024
3. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S2 includes:
s2.1: defining the forward query as that the face under the X camera is used as a query image, the face under the Y camera is used as a queried image, Q is equal to {1,2,3} represents the type of the combination of query modes, and for the forward query, Q is equal to 1 and represents that the query image comes from XAThe image to be inquired comes from YB(ii) a Q2 indicates that the query image is from XAThe image to be inquired comes from YA(ii) a Q-3 indicates that the query image is from XBThe image to be inquired comes from YBWhen the forward query mode combination type is Q, taking the face image with the ID number m under the X camera as a query image, and taking all N face images under the Y camera as queried images;
s2.2: inputting N +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors, sequentially calculating feature similarity distances between the feature vectors of 1 query image and the feature vectors of N queried images to obtain N cosine similarity distances, and sequencing the N queried images from large to small according to the similarity to obtain a forward query sequencing result:
Figure FDA0003414617710000025
wherein the content of the first and second substances,
Figure FDA0003414617710000026
the ID number corresponding to the j-th ranking of the inquired image under the Y camera in the similarity sorting is represented, and j belongs to {1,2, 3.. and N };
s2.3: defining the top k images in the obtained forward query ranking result Rank (Q, m) as forward query top-k neighbors, and expressing the top k images as the forward query top-k neighbors
Figure FDA0003414617710000027
Wherein k is an adjustable parameter, the value range of k is 1 to MIN (M, N), and the forward query bidirectional top-k neighbor is obtained based on the forward query top-k neighbor:
Figure FDA0003414617710000028
wherein R (Q, m, k) represents a forward query bi-directional top-k neighbor,
Figure FDA0003414617710000029
representing that the image of the sequence number corresponding to the top-k neighbor of the forward query is used as a new query image, and the sequence number of the top k in the similarity ranking of all query data set images;
s2.4: and (3) obtaining the forward query credibility based on forward query bidirectional top-k neighbor R (Q, m, k):
Figure FDA0003414617710000031
wherein, W (Q, m, k) represents the reliability of forward query, | R (Q, m, k) | represents the image quantity of forward query bidirectional top-k neighbor R (Q, m, k).
4. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S3 includes:
s3.1: defining the reverse query as that the face under the Y camera is used as a query image, the face under the X camera is used as a queried image, Q is belonged to {1,2,3} represents the type of the combination of query modes, and for the reverse query, Q is equal to 1 to represent that the query image comes from YBThe image to be queried comes from XA(ii) a Q2 indicates that the query image is from YAThe image to be queried comes from XA(ii) a Q-3 indicates that the query image is from YBThe image to be queried comes from XB(ii) a When the reverse query mode combination type is Q, taking the face image with the ID number n under the Y camera as a query image, and taking all M face images under the X camera as queried images;
s3.2: inputting the M +1 face images into a face recognition feature extraction network to obtain respective corresponding face feature vectors; then calculating cosine similarity distances between the feature vectors of the 1 query image and the feature vectors of the M queried images in sequence to obtain M cosine similarity distances; and finally, sequencing the M queried images according to the similarity from large to small to obtain a reverse query sequencing result:
Figure FDA0003414617710000032
wherein
Figure FDA0003414617710000033
Representing the ID number corresponding to the ith ranking of the inquired image under the Y camera in the similarity ranking, wherein i belongs to {1,2, 3.. and M };
s3.3: defining the top k images ranked in the reverse query ranking result Rank (Q, n) obtained in the step S3.2 as reverse query top-k neighbors, and expressing the images as the reverse query top-k neighbors
Figure FDA0003414617710000034
Obtaining reverse query bidirectional top-k neighbors based on the reverse query top-k neighbors:
Figure FDA0003414617710000035
wherein R (Q, n, k) represents a reverse query bi-directional top-k neighbor,
Figure FDA0003414617710000036
taking an image of the corresponding sequence number of the top-k neighbor which is inquired reversely asNew query images, the sequence number of the top k in the similarity ranking of all query data set images;
s3.4: and (4) obtaining reverse query credibility based on the reverse query bidirectional top-k neighbor R (Q, n, k) obtained in the step (S3.3):
Figure FDA0003414617710000041
wherein, W (Q, n, k) represents the reliability of the reverse query, | R (Q, n, k) | represents the image quantity of the reverse query bidirectional top-k neighbor R (Q, n, k).
5. The multi-modal ranking optimization method for the re-recognition of the heterogeneous face images as claimed in claim 1, wherein the first multi-modal fusion similarity distance in S4 is calculated as:
Figure FDA0003414617710000042
wherein
Figure FDA0003414617710000043
Represents satisfaction
Figure FDA0003414617710000044
The value of the time j is the sequence number of the face image with the ID number n in Rank (Q, m);
Figure FDA0003414617710000045
represents satisfaction
Figure FDA0003414617710000046
The value of time i, namely the sequence number of the face image with the ID number m in Rank (Q, n), D1(m, n) represents a first multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.
6. The multi-modal ranking optimization method for the re-recognition of the heterogeneous face images as claimed in claim 1, wherein the second multi-modal fusion similarity distance in S5 is calculated by the following formula:
Figure FDA0003414617710000047
wherein the function of F is to Q1, 2,3 respectively
Figure FDA0003414617710000048
Set of constructs and
Figure FDA0003414617710000049
the constituent set, the elements of the r-th order in ascending order, i.e. for three
Figure FDA00034146177100000410
And three
Figure FDA00034146177100000411
Respectively sorting from small to large, taking the value of the sorted r, wherein r is an adjustable parameter and has the same value range as Q, and D2(m, n) represents a second multi-modal fusion similarity distance between the face image with the ID number m under the X camera and the face image with the ID number n under the Y camera.
7. The multi-modal ranking optimization method for the face image recognitions of the heterogeneous object as claimed in claim 1, wherein the third multi-modal fusion similarity distance in step S6 is calculated by the following formula:
D3=λ*D1+(1–λ)*D2
wherein, λ is weight coefficient, the value range is 0 to 1, D1For the first multimodal fusion of similarity distances, D2For a second multimodal fusion of similarity distances, D3The third multi-modal fusion similarity distance.
8. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S7 includes:
sequentially calculating a third multi-mode fusion similarity distance between the face image with the ID number m under the X camera and the N face images with the ID numbers N under the Y camera being 1 and 2 … N;
and sequencing the obtained N similarity distances from small to large to obtain a final face query similarity sequencing result of all N face images under the X camera and the Y camera under the face image query with the ID number of m under the X camera, and taking the final face query similarity sequencing result as an image re-identification result.
CN202111542116.4A 2021-12-16 2021-12-16 Multi-mode ordering optimization method for heterogeneous face image re-recognition Active CN114283471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111542116.4A CN114283471B (en) 2021-12-16 2021-12-16 Multi-mode ordering optimization method for heterogeneous face image re-recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111542116.4A CN114283471B (en) 2021-12-16 2021-12-16 Multi-mode ordering optimization method for heterogeneous face image re-recognition

Publications (2)

Publication Number Publication Date
CN114283471A true CN114283471A (en) 2022-04-05
CN114283471B CN114283471B (en) 2024-04-02

Family

ID=80872469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111542116.4A Active CN114283471B (en) 2021-12-16 2021-12-16 Multi-mode ordering optimization method for heterogeneous face image re-recognition

Country Status (1)

Country Link
CN (1) CN114283471B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690556A (en) * 2022-11-08 2023-02-03 河北北方学院附属第一医院 Image recognition method and system based on multi-modal iconography characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN111126249A (en) * 2019-12-20 2020-05-08 深圳久凌软件技术有限公司 Pedestrian re-identification method and device combining big data and Bayes
CN112926557A (en) * 2021-05-11 2021-06-08 北京的卢深视科技有限公司 Method for training multi-mode face recognition model and multi-mode face recognition method
US20210224313A1 (en) * 2019-03-11 2021-07-22 Boe Technology Group Co., Ltd. Reverse image search method, apparatus and application system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
US20210224313A1 (en) * 2019-03-11 2021-07-22 Boe Technology Group Co., Ltd. Reverse image search method, apparatus and application system
CN111126249A (en) * 2019-12-20 2020-05-08 深圳久凌软件技术有限公司 Pedestrian re-identification method and device combining big data and Bayes
CN112926557A (en) * 2021-05-11 2021-06-08 北京的卢深视科技有限公司 Method for training multi-mode face recognition model and multi-mode face recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王锟朋;高兴宇;: "基于附加间隔Softmax特征的人脸聚类算法", 计算机应用与软件, no. 02, 12 February 2020 (2020-02-12), pages 117 - 123 *
胡辉等: "Mutimodal Ranking Optimization for Heterogeneous Face Re-identification", SLSEVIER, 31 December 2022 (2022-12-31), pages 1 - 7 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690556A (en) * 2022-11-08 2023-02-03 河北北方学院附属第一医院 Image recognition method and system based on multi-modal iconography characteristics
CN115690556B (en) * 2022-11-08 2023-06-27 河北北方学院附属第一医院 Image recognition method and system based on multi-mode imaging features

Also Published As

Publication number Publication date
CN114283471B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN110188227B (en) Hash image retrieval method based on deep learning and low-rank matrix optimization
KR102385463B1 (en) Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium
CN112651262B (en) Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment
CN110717099B (en) Method and terminal for recommending film
WO2011089872A1 (en) Image management device, image management method, program, recording medium, and integrated circuit
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN111597298A (en) Cross-modal retrieval method and device based on deep confrontation discrete hash learning
CN107590505B (en) Learning method combining low-rank representation and sparse regression
CN113239801B (en) Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN112507853B (en) Cross-modal pedestrian re-recognition method based on mutual attention mechanism
CN114461839B (en) Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment
CN109766455B (en) Identified full-similarity preserved Hash cross-modal retrieval method
CN113918764B (en) Movie recommendation system based on cross-modal fusion
CN111985520A (en) Multi-mode classification method based on graph convolution neural network
CN112926675B (en) Depth incomplete multi-view multi-label classification method under double visual angle and label missing
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
CN109472282B (en) Depth image hashing method based on few training samples
CN116610778A (en) Bidirectional image-text matching method based on cross-modal global and local attention mechanism
CN114283471A (en) Multi-modal sequencing optimization method for heterogeneous face image re-recognition
CN114998777A (en) Training method and device for cross-modal video retrieval model
CN116612324A (en) Small sample image classification method and device based on semantic self-adaptive fusion mechanism
CN111026910A (en) Video recommendation method and device, electronic equipment and computer-readable storage medium
CN117351518A (en) Method and system for identifying unsupervised cross-modal pedestrian based on level difference
CN116956128A (en) Hypergraph-based multi-mode multi-label classification method and system
US20220165055A1 (en) Information processing apparatus, information processing method, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant