CN114283471A - Multi-modal sequencing optimization method for heterogeneous face image re-recognition - Google Patents
- Publication number
- CN114283471A (application CN202111542116.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
- Collating Specific Patterns (AREA)
Abstract
The invention discloses a multi-modal sequencing optimization method for heterogeneous face image re-recognition. The method comprises: performing modality conversion on the face images; forming different types of query modality combinations from the face images before and after modality conversion, and performing forward and reverse queries on each combination to obtain the corresponding query ranking results and query credibility; calculating a first multi-modal fusion similarity distance and a second multi-modal fusion similarity distance; performing weighted fusion of the first and second distances to obtain a third multi-modal fusion similarity distance; and obtaining the final face query similarity ranking result from the third distance. The method effectively exploits the complementarity between face images of different modalities, and the multi-modal fusion similarity distance yields a query ranking in which the correct face is placed nearer the top, thereby improving the accuracy of heterogeneous face re-recognition.
Description
Technical Field
The invention relates to the technical field of digital images, in particular to a multi-modal sequencing optimization method for heterogeneous face image re-recognition.
Background
Face re-recognition is one of the leading research directions in the field of video surveillance. It differs from face recognition as follows. In face recognition for video surveillance, a low-quality face captured by a camera is generally queried against a large-scale, high-quality face library to confirm the identity of the captured face, that is, who the person is. In face re-recognition, a low-quality face captured by one camera is queried against a set of low-quality faces captured by another camera to confirm the identity relationship between faces captured by different cameras, that is, whether the person under one camera and the person under another camera are the same person, even though who that person is remains unknown. As surveillance devices become increasingly diverse, the faces captured by different cameras may be heterogeneous, for example visible-light faces and infrared faces, which significantly reduces the accuracy of face re-recognition. How to improve the accuracy of heterogeneous face re-recognition has therefore become a new problem to be solved in the field of video surveillance.
Disclosure of Invention
The invention provides a multi-modal sequencing optimization method for heterogeneous face image re-recognition. It addresses the low accuracy of heterogeneous face re-recognition in the prior art by optimizing and adjusting the ranking of the face set to be queried so that the correct face is ranked nearer the top, thereby improving the accuracy of heterogeneous face re-recognition.
The technical problem of the invention is mainly solved by the following technical scheme:
A multi-modal sequencing optimization method for heterogeneous face image re-recognition, comprising the following steps:
S1: performing image translation on the two heterogeneous face images under two cameras to obtain, for each, the corresponding face image after modality conversion;
S2: forming different types of query modality combinations from the face images before and after modality conversion, and performing a forward query on each combination to obtain the corresponding forward query ranking result and forward query credibility, where a forward query uses the face under the first camera as the query image and the faces under the second camera as the queried images;
S3: performing a reverse query on each query modality combination to obtain the corresponding reverse query ranking result and reverse query credibility, where a reverse query uses the face under the second camera as the query image and the faces under the first camera as the queried images;
S4: calculating a first multi-modal fusion similarity distance from the forward query ranking results, forward query credibility, reverse query ranking results and reverse query credibility of the different query modality combinations, thereby exploiting the complementarity of face images across modalities by linear fusion;
S5: calculating a second multi-modal fusion similarity distance from the forward query ranking results and reverse query ranking results of the different query modality combinations, thereby exploiting the complementarity of face images across modalities by nonlinear fusion;
S6: performing weighted fusion of the first and second multi-modal fusion similarity distances to obtain a third multi-modal fusion similarity distance;
S7: performing image re-recognition according to the third multi-modal fusion similarity distance to obtain the final face query similarity ranking result, and obtaining the image re-recognition result from that ranking result.
In one embodiment, step S1 includes:
S1.1: dividing the two heterogeneous face data sets under the two cameras into training data sets and the test data sets X_A and Y_B, where X and Y denote the two cameras and A and B denote the two different image modalities captured under them; m is the ID number of a face image x under the X camera and M is the number of face images in the data set under the X camera; n is the ID number of a face image y under the Y camera and N is the number of face images in the data set under the Y camera;
S1.2: training an image translation network with the training data sets;
S1.3: inputting the test data sets X_A and Y_B into the trained image translation network to obtain the corresponding modality-converted data sets X_B and Y_A.
In one embodiment, step S2 includes:
S2.1: defining the forward query as using a face under the X camera as the query image and the faces under the Y camera as the queried images, with Q ∈ {1, 2, 3} denoting the type of query modality combination. For the forward query, Q = 1 means the query image comes from X_A and the queried images come from Y_B; Q = 2 means the query image comes from X_A and the queried images come from Y_A; Q = 3 means the query image comes from X_B and the queried images come from Y_B. When the forward query modality combination type is Q, the face image with ID number m under the X camera is taken as the query image and all N face images under the Y camera are taken as the queried images;
S2.2: inputting the N + 1 face images into a face recognition feature extraction network to obtain their face feature vectors, calculating in turn the cosine similarity distance between the feature vector of the query image and the feature vector of each of the N queried images to obtain N cosine similarity distances, and sorting the N queried images in descending order of similarity to obtain the forward query ranking result Rank(Q, m), whose j-th entry, j ∈ {1, 2, 3, ..., N}, is the ID number of the queried image under the Y camera ranked j-th in the similarity ranking;
S2.3: defining the top k images in the forward query ranking result Rank(Q, m) as the forward query top-k neighbors, where k is an adjustable parameter ranging from 1 to MIN(M, N), and obtaining the forward query bidirectional top-k neighbors R(Q, m, k) from the forward query top-k neighbors: each top-k neighbor is used in turn as a new query image against all images of the query data set, and it is kept in R(Q, m, k) only if m is among the top k ID numbers of the resulting similarity ranking;
S2.4: obtaining the forward query credibility W(Q, m, k) from the forward query bidirectional top-k neighbors R(Q, m, k), where |R(Q, m, k)| denotes the number of images in R(Q, m, k).
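The ranking in step S2.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `cosine_rank` and `top_k` are hypothetical, the toy 2-D vectors stand in for features that a face recognition network would produce, and indices are 0-based rather than the 1-based ID numbers used in the text.

```python
import numpy as np

def cosine_rank(query_vec, gallery_vecs):
    """Return gallery indices ordered from most to least similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity between the query and each gallery image
    return np.argsort(-sims, kind="stable")  # largest similarity first

def top_k(rank, k):
    """Return the first k entries of a ranking (the one-way top-k neighbors)."""
    return rank[:k]

# Toy feature vectors (assumed for illustration only).
query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]])

rank = cosine_rank(query, gallery)
neighbors = top_k(rank, 2)
```

The same function serves the reverse query if the roles of the two cameras are swapped, since only the query vector and the gallery matrix change.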
In one embodiment, step S3 includes:
S3.1: defining the reverse query as using a face under the Y camera as the query image and the faces under the X camera as the queried images, with Q ∈ {1, 2, 3} denoting the type of query modality combination. For the reverse query, Q = 1 means the query image comes from Y_B and the queried images come from X_A; Q = 2 means the query image comes from Y_A and the queried images come from X_A; Q = 3 means the query image comes from Y_B and the queried images come from X_B. When the reverse query modality combination type is Q, the face image with ID number n under the Y camera is taken as the query image and all M face images under the X camera are taken as the queried images;
S3.2: inputting the M + 1 face images into a face recognition feature extraction network to obtain their face feature vectors, calculating in turn the cosine similarity distance between the feature vector of the query image and the feature vector of each of the M queried images to obtain M cosine similarity distances, and sorting the M queried images in descending order of similarity to obtain the reverse query ranking result Rank(Q, n), whose i-th entry, i ∈ {1, 2, 3, ..., M}, is the ID number of the queried image under the X camera ranked i-th in the similarity ranking;
S3.3: defining the top k images in the reverse query ranking result Rank(Q, n) obtained in step S3.2 as the reverse query top-k neighbors, and obtaining the reverse query bidirectional top-k neighbors R(Q, n, k) from the reverse query top-k neighbors: each top-k neighbor is used in turn as a new query image against all images of the query data set, and it is kept in R(Q, n, k) only if n is among the top k ID numbers of the resulting similarity ranking;
S3.4: obtaining the reverse query credibility W(Q, n, k) from the reverse query bidirectional top-k neighbors R(Q, n, k) obtained in step S3.3, where |R(Q, n, k)| denotes the number of images in R(Q, n, k).
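The three query modality combinations listed for the forward and reverse queries can be written down as a small lookup table. The sketch below is only a bookkeeping aid; the dictionary names, the string labels such as "X_A", and the helper `modality_pair` are assumptions made for illustration.

```python
# (query set, queried set) for each combination type Q, as listed in S2.1 and S3.1.
FORWARD = {1: ("X_A", "Y_B"), 2: ("X_A", "Y_A"), 3: ("X_B", "Y_B")}
REVERSE = {1: ("Y_B", "X_A"), 2: ("Y_A", "X_A"), 3: ("Y_B", "X_B")}

def modality_pair(direction, q):
    """Return the (query set, queried set) labels for a direction and type Q."""
    table = FORWARD if direction == "forward" else REVERSE
    return table[q]

# For every Q the reverse query simply swaps the roles of the two sets.
swapped = all(REVERSE[q] == FORWARD[q][::-1] for q in (1, 2, 3))
```

Read this way, Q = 1 compares the two original modalities directly, Q = 2 compares both cameras in modality A, and Q = 3 compares both cameras in modality B.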
In one embodiment, the first multi-modal fusion similarity distance D1(m, n) in S4, between the face image with ID number m under the X camera and the face image with ID number n under the Y camera, is calculated from the following quantities for each query modality combination Q: the value of j at which the j-th entry of Rank(Q, m) equals n, namely the sequence number of the face image with ID number n in Rank(Q, m); the value of i at which the i-th entry of Rank(Q, n) equals m, namely the sequence number of the face image with ID number m in Rank(Q, n); and the forward and reverse query credibility.
In one embodiment, the second multi-modal fusion similarity distance D2(m, n) in S5, between the face image with ID number m under the X camera and the face image with ID number n under the Y camera, is calculated with a function F. For Q = 1, 2, 3, the three sequence numbers of n in Rank(Q, m) form one set and the three sequence numbers of m in Rank(Q, n) form another; F sorts each set in ascending order and takes its r-th element, where r is an adjustable parameter with the same value range as Q.
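The role of F can be illustrated with a short order-statistic helper. This is a sketch under stated assumptions: the name `order_statistic` and the sample positions are hypothetical, and the sketch stops at F itself because the way the two F values are combined into D2 is not recoverable from the text above.

```python
def order_statistic(values, r):
    """Return the r-th smallest element of values (r is 1-based)."""
    if not 1 <= r <= len(values):
        raise ValueError("r must be between 1 and len(values)")
    return sorted(values)[r - 1]

# Hypothetical positions of image n in Rank(Q, m) for Q = 1, 2, 3.
positions_of_n = {1: 7, 2: 2, 3: 4}

# With r = 2 and three combinations, F returns the median position.
f_forward = order_statistic(list(positions_of_n.values()), 2)
```

Choosing r = 1 would trust the most optimistic modality combination and r = 3 the most pessimistic one, which is why the selection is nonlinear in the three positions.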
In one embodiment, the calculation formula of the third multi-modal fusion similarity distance in step S6 is as follows:
D3 = λ * D1 + (1 - λ) * D2
where λ is a weight coefficient ranging from 0 to 1, D1 is the first multi-modal fusion similarity distance, D2 is the second multi-modal fusion similarity distance, and D3 is the third multi-modal fusion similarity distance.
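The weighted fusion of step S6 is a convex combination of the two distances. A minimal sketch, assuming D1 and D2 are available as arrays of equal shape (for example M x N matrices indexed by m and n); the function name `fuse` and the sample values are illustrative only.

```python
import numpy as np

def fuse(d1, d2, lam):
    """Return lam * d1 + (1 - lam) * d2 for a weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    return lam * d1 + (1.0 - lam) * d2

# Made-up distances for one query image and two queried images.
d3 = fuse([[1.0, 2.0]], [[3.0, 4.0]], 0.5)
```

Setting lam to 1 reduces D3 to the linear fusion distance D1, and setting it to 0 reduces D3 to the nonlinear fusion distance D2.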
In one embodiment, step S7 includes:
calculating in turn the third multi-modal fusion similarity distance between the face image with ID number m under the X camera and each of the N face images with ID numbers n = 1, 2, ..., N under the Y camera;
sorting the N similarity distances in ascending order to obtain the final face query similarity ranking result between the face image with ID number m under the X camera and all N face images under the Y camera, and taking that ranking result as the image re-recognition result.
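Step S7 amounts to an ascending sort of one row of D3. The following sketch assumes the N distances for a fixed query image m are held in a 1-D array; the function name `final_ranking` and the sample distances are hypothetical.

```python
import numpy as np

def final_ranking(d3_row):
    """Return the Y-camera ID numbers (1..N) ordered by ascending D3."""
    order = np.argsort(np.asarray(d3_row, dtype=float), kind="stable")
    return (order + 1).tolist()  # convert 0-based indices to 1-based ID numbers

# Made-up third fusion distances between query m and queried images n = 1, 2, 3.
ranking = final_ranking([0.8, 0.1, 0.5])
```

Because D3 is a distance, the smallest value corresponds to the best match, so the first ID number in the returned list is the top re-recognition candidate.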
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The invention provides a multi-modal sequencing optimization method for heterogeneous face image re-recognition. Image translation is performed on the two heterogeneous face images under two cameras to obtain the corresponding modality-converted face images. The face images before and after modality conversion form different types of query modality combinations; a forward query on each combination yields the forward query ranking result and forward query credibility, and a reverse query yields the reverse query ranking result and reverse query credibility. A first multi-modal fusion similarity distance is calculated from the forward and reverse query ranking results and credibility of the different combinations, and a second multi-modal fusion similarity distance is calculated from the forward and reverse query ranking results. The two distances are fused by weighting into a third multi-modal fusion similarity distance, from which the final face query similarity ranking result is obtained. Because the face images before and after modality conversion form different types of query modality combinations, the method effectively exploits the complementarity of face images across modalities, takes into account the importance of reinforcing the original ranking of a given modality, and uses the multi-modal fusion similarity distance to obtain a query ranking in which the correct face is ranked nearer the top, thereby improving the accuracy of heterogeneous face re-recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a multimodal ranking optimization method for heterogeneous face image re-recognition according to the present invention.
Detailed Description
The invention provides a multimode sequencing optimization method for heterogeneous face image re-recognition, so that the technical effect of improving the recognition accuracy of face image re-recognition is achieved.
In order to achieve the technical effects, the invention has the main conception that:
performing image translation on two heterogeneous face images under two cameras to obtain a face image after corresponding mode conversion; different types of query modality combinations are formed by the face images before and after modality conversion, forward query is carried out on the different query modality combinations, and a corresponding forward query sequencing result and forward query credibility are obtained; carrying out reverse query on different query mode combinations to obtain corresponding reverse query sequencing results and reverse query credibility; calculating a first multi-mode fusion similarity distance according to a forward query sorting result, a forward query reliability, a reverse query sorting result and a reverse query reliability of different query mode combinations; calculating a second multi-mode fusion similarity distance according to the forward query sorting result and the reverse query sorting result of different query mode combinations; performing weighted fusion on the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance; and obtaining a final face query similarity ranking result according to the third multi-mode fusion similarity distance.
According to the method, different types of query modality combinations are formed by the face images before and after modality conversion, the complementarity of the face images among different modalities can be effectively utilized, and the query ranking result of which the correct face ranking is more advanced is obtained through multi-modality fusion of similarity distances, so that the accuracy of heterogeneous face re-identification is improved.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment provides a multi-modal sequencing optimization method for heterogeneous face image re-recognition, which comprises the following steps:
S1: performing image translation on the two heterogeneous face images under two cameras to obtain, for each, the corresponding face image after modality conversion;
S2: forming different types of query modality combinations from the face images before and after modality conversion, and performing a forward query on each combination to obtain the corresponding forward query ranking result and forward query credibility, where a forward query uses the face under the first camera as the query image and the faces under the second camera as the queried images;
S3: performing a reverse query on each query modality combination to obtain the corresponding reverse query ranking result and reverse query credibility, where a reverse query uses the face under the second camera as the query image and the faces under the first camera as the queried images;
S4: calculating a first multi-modal fusion similarity distance from the forward query ranking results, forward query credibility, reverse query ranking results and reverse query credibility of the different query modality combinations, thereby exploiting the complementarity of face images across modalities by linear fusion;
S5: calculating a second multi-modal fusion similarity distance from the forward query ranking results and reverse query ranking results of the different query modality combinations, thereby exploiting the complementarity of face images across modalities by nonlinear fusion;
S6: performing weighted fusion of the first and second multi-modal fusion similarity distances to obtain a third multi-modal fusion similarity distance;
S7: performing image re-recognition according to the third multi-modal fusion similarity distance to obtain the final face query similarity ranking result, and obtaining the image re-recognition result from that ranking result.
In one embodiment, S1 specifically includes:
S1.1: dividing the two heterogeneous face data sets under the two cameras into training data sets and the test data sets X_A and Y_B, where X and Y denote the two cameras and A and B denote the two different image modalities captured under them; m is the ID number of a face image x under the X camera and M is the number of face images in the data set under the X camera; n is the ID number of a face image y under the Y camera and N is the number of face images in the data set under the Y camera;
S1.2: training an image translation network with the training data sets;
S1.3: inputting the test data sets X_A and Y_B into the trained image translation network to obtain the corresponding modality-converted data sets X_B and Y_A.
In a specific implementation, the two heterogeneous face images are a visible-light low-resolution face image and a near-infrared low-resolution face image captured by the two cameras respectively. In this embodiment, the two heterogeneous face data sets are divided into training data sets and the test data sets X_A and Y_B, where X and Y denote the two cameras and A and B denote the two image modalities, visible light and near-infrared light. m is the ID number of a visible-light face image x under the X camera and M is the number of visible-light face images in the data set under the X camera, with M = 60 in this embodiment; n is the ID number of a near-infrared face image y under the Y camera and N is the number of near-infrared face images in the data set under the Y camera, with N = 60 in this embodiment.
In this embodiment, the training data sets are first used to train the image translation network so that it converts between the visible-light and near-infrared modalities. The test data sets X_A and Y_B are then input into the trained image translation network to obtain the corresponding modality-converted data sets X_B and Y_A; that is, the visible-light face images X_A under the X camera are converted into near-infrared face images X_B, and the near-infrared face images Y_B under the Y camera are converted into visible-light face images Y_A. The image translation network may be any existing image translation network; this embodiment uses a CycleGAN network.
In one embodiment, S2 specifically includes:
S2.1: defining the forward query as using a face under the X camera as the query image and the faces under the Y camera as the queried images, with Q ∈ {1, 2, 3} denoting the type of query modality combination. For the forward query, Q = 1 means the query image comes from X_A and the queried images come from Y_B; Q = 2 means the query image comes from X_A and the queried images come from Y_A; Q = 3 means the query image comes from X_B and the queried images come from Y_B. When the forward query modality combination type is Q, the face image with ID number m under the X camera is taken as the query image and all N face images under the Y camera are taken as the queried images;
S2.2: inputting the N + 1 face images into a face recognition feature extraction network to obtain their face feature vectors, calculating in turn the cosine similarity distance between the feature vector of the query image and the feature vector of each of the N queried images to obtain N cosine similarity distances, and sorting the N queried images in descending order of similarity to obtain the forward query ranking result Rank(Q, m), whose j-th entry, j ∈ {1, 2, 3, ..., N}, is the ID number of the queried image under the Y camera ranked j-th in the similarity ranking;
S2.3: defining the top k images in the forward query ranking result Rank(Q, m) as the forward query top-k neighbors, where k is an adjustable parameter ranging from 1 to MIN(M, N), and obtaining the forward query bidirectional top-k neighbors R(Q, m, k) from the forward query top-k neighbors: each top-k neighbor is used in turn as a new query image against all images of the query data set, and it is kept in R(Q, m, k) only if m is among the top k ID numbers of the resulting similarity ranking;
S2.4: obtaining the forward query credibility W(Q, m, k) from the forward query bidirectional top-k neighbors R(Q, m, k), where |R(Q, m, k)| denotes the number of images in R(Q, m, k).
Specifically, the forward query top-k neighbors are, for a query image m of the query data set in the forward query, the ID_Y numbers (ID numbers of queried data set images under the Y camera) of the top k queried images in the similarity ranking over the whole queried data set; these are the forward query one-way top-k neighbors. The forward query bidirectional top-k neighbors are obtained by taking the top k queried images found in the forward query, using each as a new query image, and searching in reverse for the top k ID_X numbers (ID numbers of query data set images under the X camera) in the similarity ranking over the whole query data set. If m is among those top k ID_X numbers, the corresponding ID_Y number is added to the forward query bidirectional top-k neighbors of query image m.
Each entry of Rank(Q, m) gives, for query combination Q and query image m, the ID_Y number ranked j-th in the similarity ranking of the queried data set; for example, the first entry is the ID_Y number ranked highest in the similarity ranking for query image m. Accordingly, using the image with that ID_Y number as a new query image yields the top k ID_X numbers in the similarity ranking of all query data set images.
In a specific implementation, the value of k may be chosen according to the actual situation; k = 10 in this embodiment. The face recognition feature extraction network may be any existing face recognition feature extraction network; this embodiment uses an ArcFace network.
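The bidirectional top-k check described above can be sketched as a mutual-neighbor filter. This is an illustration under stated assumptions: rankings are given as plain lists of 0-based IDs, the names `bidirectional_top_k`, `forward_ranks` and `reverse_ranks` are hypothetical, and the toy example uses k = 2 rather than the embodiment's k = 10. The credibility formula itself is not reproduced here; only the neighbor count that it depends on is computed.

```python
def bidirectional_top_k(m, forward_ranks, reverse_ranks, k):
    """Keep each forward top-k neighbor n of m whose own reverse top-k contains m.

    forward_ranks[m]: Y-camera IDs ordered by similarity to query m (X -> Y).
    reverse_ranks[n]: X-camera IDs ordered by similarity to query n (Y -> X).
    """
    return [n for n in forward_ranks[m][:k] if m in reverse_ranks[n][:k]]

# Toy rankings (assumed for illustration only).
forward_ranks = {0: [2, 1, 0]}
reverse_ranks = {0: [1, 0, 2], 1: [0, 2, 1], 2: [1, 2, 0]}

neighbors = bidirectional_top_k(0, forward_ranks, reverse_ranks, k=2)
count = len(neighbors)  # |R(Q, m, k)|, the quantity the credibility is based on
```

In the toy data, image 2 is the closest forward match for query 0 but does not rank query 0 among its own top 2, so only image 1 survives the mutual check.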
In one embodiment, step S3 includes:
S3.1: defining the reverse query as using a face under the Y camera as the query image and the faces under the X camera as the queried images, with Q ∈ {1, 2, 3} denoting the type of query modality combination. For the reverse query, Q = 1 means the query image comes from Y_B and the queried images come from X_A; Q = 2 means the query image comes from Y_A and the queried images come from X_A; Q = 3 means the query image comes from Y_B and the queried images come from X_B. When the reverse query modality combination type is Q, the face image with ID number n under the Y camera is taken as the query image and all M face images under the X camera are taken as the queried images;
S3.2: inputting the M + 1 face images into a face recognition feature extraction network to obtain their face feature vectors, calculating in turn the cosine similarity distance between the feature vector of the query image and the feature vector of each of the M queried images to obtain M cosine similarity distances, and sorting the M queried images in descending order of similarity to obtain the reverse query ranking result Rank(Q, n), whose i-th entry, i ∈ {1, 2, 3, ..., M}, is the ID number of the queried image under the X camera ranked i-th in the similarity ranking;
S3.3: defining the top k images in the reverse query ranking result Rank(Q, n) obtained in step S3.2 as the reverse query top-k neighbors, and obtaining the reverse query bidirectional top-k neighbors R(Q, n, k) from the reverse query top-k neighbors: each top-k neighbor is used in turn as a new query image against all images of the query data set, and it is kept in R(Q, n, k) only if n is among the top k ID numbers of the resulting similarity ranking;
S3.4: obtaining the reverse query credibility W(Q, n, k) from the reverse query bidirectional top-k neighbors R(Q, n, k) obtained in step S3.3, where |R(Q, n, k)| denotes the number of images in R(Q, n, k).
Specifically, the top-k neighbor of the reverse query represents the query image n in the query dataset in the reverse query, and the IDs of the top k queried images in the similarity ranking of all queried datasetsxNumber (queried dataset image ID number under X camera). Which is a reverse query one-way top-k neighbor. Forward query bidirectional top-k neighbors are the first k queried image IDs found when top-k is forward queriedxNumber, with them as new query image, against finding the top k IDs of similarity ordering of all query datasetsYNumber (query dataset image under Y camera ID number), if at this time m is just the first k IDsYThe values are as follows. Then the ID will beXAnd adding the query image m into forward query bidirectional top-k neighbors.
Each element of Rank(Q, n) denotes, for query mode combination Q and query image n, the ID_X number that ranks i-th in the similarity ordering of the queried data set. For example, the element with i = 1 is the ID_X number of the queried image with the highest similarity to query image n. Accordingly, the neighbor set used in step S3.3 denotes the top k ID_Y numbers in the similarity ranking over all query data set images when the image with that ID_X number is taken as the new query image.
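The reverse query ranking and the bidirectional top-k neighbor test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `cosine`, `rank` and `bidirectional_top_k` are hypothetical, the feature vectors are toy 2-D values rather than ArcFace outputs, and the credibility formula of step S3.4 is not reproduced because it is not available in this text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_vec, gallery):
    """Return the gallery ID numbers sorted by decreasing cosine similarity.

    `gallery` maps an ID number to its feature vector.
    """
    return sorted(gallery,
                  key=lambda gid: cosine(query_vec, gallery[gid]),
                  reverse=True)

def bidirectional_top_k(n, y_feats, x_feats, k):
    """Reverse query: Y-camera image n is the query, X-camera images are queried.

    An X-camera ID from the top k of n is kept only if n is, in turn,
    among the top k Y-camera IDs when that X image is the new query.
    """
    top_k = rank(y_feats[n], x_feats)[:k]
    return [x_id for x_id in top_k
            if n in rank(x_feats[x_id], y_feats)[:k]]

# Toy features (hypothetical values chosen for illustration only).
x_feats = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
y_feats = {1: [0.9, 0.1], 2: [0.1, 0.9]}

reverse_rank = rank(y_feats[1], x_feats)                    # Rank(Q, n) for n = 1
neighbors = bidirectional_top_k(1, y_feats, x_feats, k=1)   # R(Q, n, k)
```

The forward query is symmetric: swapping the roles of `x_feats` and `y_feats` gives Rank(Q, m) and R(Q, m, k).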
In a specific implementation, an existing face recognition feature extraction network is used; this embodiment adopts the ArcFace network.
In one embodiment, the first multi-modal fusion similarity distance in S4 is calculated as:
where the forward rank position is the value of j at which the j-th element of Rank(Q, m) equals n, that is, the position of the face image with ID number n in Rank(Q, m); the reverse rank position is the value of i at which the i-th element of Rank(Q, n) equals m, that is, the position of the face image with ID number m in Rank(Q, n); and D1(m, n) denotes the first multi-modal fusion similarity distance between the face image with ID number m under the X camera and the face image with ID number n under the Y camera.
In one embodiment, the second multi-modal fusion similarity distance in S5 is calculated as:
where, for Q = 1, 2, 3, the function F takes the set formed by the three positions of n in Rank(Q, m) and the set formed by the three positions of m in Rank(Q, n), and returns from each set the element ranked r-th in ascending order; that is, the three forward positions and the three reverse positions are each sorted from smallest to largest and the r-th value is taken. r is an adjustable parameter with the same value range as Q, and D2(m, n) denotes the second multi-modal fusion similarity distance between the face image with ID number m under the X camera and the face image with ID number n under the Y camera.
In a specific implementation, the adjustable parameter r has the same value range as Q; in this embodiment, r = 2.
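The selection performed by F can be illustrated with a short sketch. It covers only the r-th order statistic over the three rank positions; how the forward and reverse selections are combined into D2(m, n) follows the patent's formula, which is not reproduced here. The helper names `rank_position` and `f_rth_smallest` and the example rankings are hypothetical.

```python
def rank_position(ranking, target_id):
    """1-based position of target_id in a ranked list of ID numbers."""
    return ranking.index(target_id) + 1

def f_rth_smallest(positions, r):
    """Sort the positions in ascending order and return the r-th one (1-based)."""
    return sorted(positions)[r - 1]

# Hypothetical forward rankings Rank(Q, m) for Q = 1, 2, 3,
# each a list of Y-camera ID numbers in decreasing similarity.
forward_rankings = {
    1: [4, 2, 7, 5],
    2: [2, 4, 5, 7],
    3: [7, 5, 2, 4],
}

n = 2  # Y-camera ID whose position is looked up in each ranking
positions = [rank_position(forward_rankings[q], n) for q in (1, 2, 3)]
selected = f_rth_smallest(positions, r=2)  # r = 2 picks the median of the three
```

With r = 2 the function returns the median position, so a single modality combination that ranks the correct face unusually well or unusually badly does not dominate the result.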
In one embodiment, the calculation formula of the third multi-modal fusion similarity distance in step S6 is as follows:
D3=λ*D1+(1–λ)*D2
where λ is a weight coefficient with a value range of 0 to 1, D1 is the first multi-modal fusion similarity distance, D2 is the second multi-modal fusion similarity distance, and D3 is the third multi-modal fusion similarity distance.
The weight coefficient can be chosen according to the actual situation; in this embodiment, λ = 0.2.
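As a small worked example of the weighted fusion, the sketch below evaluates D3 = λ*D1 + (1 - λ)*D2 with the embodiment's value λ = 0.2. The function name `fuse` and the input distances are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fuse(d1, d2, lam=0.2):
    """Weighted fusion D3 = lam * D1 + (1 - lam) * D2, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return lam * d1 + (1.0 - lam) * d2

# Hypothetical distances for one (m, n) pair.
d3 = fuse(10.0, 5.0)  # 0.2 * 10.0 + 0.8 * 5.0
```

With λ = 0.2 the nonlinear distance D2 carries four times the weight of the linear distance D1; λ = 1 reduces D3 to D1 and λ = 0 reduces it to D2.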
Of the three multi-modal fusion similarity distances in the invention, the first is a linear fusion of similarity distances that fully takes into account the complementarity of all the original multi-modal rankings. The second is a nonlinear fusion of similarity distances that strengthens the importance of the original ranking of a particular modality while still taking the complementarity among modalities into account. The two distances have complementary strengths, so the final multi-modal fusion similarity distance obtained by weighting them can improve re-recognition accuracy.
For details of the CycleGAN image translation network and the ArcFace face recognition feature extraction network, see the documents listed in the substantive examination references.
In one embodiment, step S7 includes:
calculating, in turn, the third multi-modal fusion similarity distance between the face image with ID number m under the X camera and each of the N face images with ID numbers n = 1, 2, ..., N under the Y camera;
and sorting the N similarity distances from smallest to largest to obtain, for the query face image with ID number m under the X camera, the final face query similarity ranking result over all N face images under the Y camera, which is taken as the image re-identification result.
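Step S7 amounts to an ascending sort of the N fused distances. The sketch below assumes the D3 values for one query image m are already available as a mapping from Y-camera ID number to distance; the function name `final_ranking`, the example values, and the tie-breaking by smaller ID number are assumptions made for illustration, since the text does not say how ties are resolved.

```python
def final_ranking(d3_by_id):
    """Return Y-camera ID numbers sorted by ascending D3.

    Ties are broken by the smaller ID number (an assumption of this sketch).
    """
    return sorted(d3_by_id, key=lambda n: (d3_by_id[n], n))

# Hypothetical D3(m, n) values for a single query image m and N = 4.
d3_for_query = {1: 6.0, 2: 1.4, 3: 3.2, 4: 1.4}
result = final_ranking(d3_for_query)
```

The first entry of `result` is the Y-camera face judged most likely to match the query face under the X camera.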
The invention provides a multi-modal ranking optimization algorithm for heterogeneous face image re-recognition. It uses an image translation network to perform modality conversion on the original heterogeneous face images under two cameras, and forms different types of query mode combinations from the face images before and after modality conversion. This makes effective use of the complementarity of face images across modalities: by fusing similarity distances in a multi-modal manner, it obtains a query ranking result in which the correct face ranks closer to the top, thereby improving the accuracy of heterogeneous face re-recognition.
It should be understood that parts of the specification not described in detail belong to the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A multi-modal sequencing optimization method for heterogeneous face image re-recognition is characterized by comprising the following steps:
S1: performing image translation on the two kinds of heterogeneous face images under two cameras to obtain, for each, the corresponding face images after modality conversion;
S2: forming different types of query mode combinations from the face images before and after modality conversion, and performing a forward query on each query mode combination to obtain the corresponding forward query ranking result and forward query credibility, wherein the forward query takes the face under the first camera as the query image and the face under the second camera as the queried image;
S3: performing a reverse query on each query mode combination to obtain the corresponding reverse query ranking result and reverse query credibility, wherein the reverse query takes the face under the second camera as the query image and the face under the first camera as the queried image;
S4: calculating a first multi-modal fusion similarity distance from the forward query ranking results, forward query credibilities, reverse query ranking results and reverse query credibilities of the different query mode combinations, thereby exploiting the complementarity of face images across modalities by linear fusion;
S5: calculating a second multi-modal fusion similarity distance from the forward query ranking results and reverse query ranking results of the different query mode combinations, thereby exploiting the complementarity of face images across modalities by nonlinear fusion;
S6: performing weighted fusion of the first multi-modal fusion similarity distance and the second multi-modal fusion similarity distance to obtain a third multi-modal fusion similarity distance;
S7: performing image re-identification according to the third multi-modal fusion similarity distance to obtain a final face query similarity ranking result, and obtaining the image re-identification result from the final face query similarity ranking result.
2. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S1 includes:
S1.1: dividing the two heterogeneous face data sets under the two cameras into training data sets and test data sets X_A and Y_B, wherein X and Y denote the two cameras, and A and B denote the two different image modalities captured under the two cameras; m is the ID number of a face image x under the X camera, and M is the number of face images in the data set under the X camera; n is the ID number of a face image y under the Y camera, and N is the number of face images in the data set under the Y camera;
3. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S2 includes:
S2.1: defining the forward query as taking the face under the X camera as the query image and the face under the Y camera as the queried image, wherein Q ∈ {1, 2, 3} denotes the type of query mode combination; for the forward query, Q = 1 denotes that the query image comes from X_A and the queried images come from Y_B; Q = 2 denotes that the query image comes from X_A and the queried images come from Y_A; Q = 3 denotes that the query image comes from X_B and the queried images come from Y_B; when the forward query mode combination type is Q, the face image with ID number m under the X camera is taken as the query image and all N face images under the Y camera are taken as the queried images;
S2.2: inputting the N + 1 face images into the face recognition feature extraction network to obtain their corresponding face feature vectors, calculating in turn the cosine similarity distance between the feature vector of the 1 query image and the feature vector of each of the N queried images to obtain N cosine similarity distances, and sorting the N queried images by similarity from largest to smallest to obtain the forward query ranking result Rank(Q, m):
wherein the j-th element of Rank(Q, m) is the ID number of the queried image under the Y camera that ranks j-th in the similarity ordering, and j ∈ {1, 2, 3, ..., N};
S2.3: defining the top k images in the forward query ranking result Rank(Q, m) as the forward query top-k neighbors, wherein k is an adjustable parameter with a value range of 1 to MIN(M, N), and obtaining the forward query bidirectional top-k neighbors from the forward query top-k neighbors:
wherein R(Q, m, k) denotes the forward query bidirectional top-k neighbors, determined by taking each image among the forward query top-k neighbors as a new query image and finding the ID numbers of the top k images in its similarity ranking over all query data set images;
S2.4: obtaining the forward query credibility from the forward query bidirectional top-k neighbors R(Q, m, k):
wherein W(Q, m, k) denotes the forward query credibility, and |R(Q, m, k)| denotes the number of images in the forward query bidirectional top-k neighbors R(Q, m, k).
4. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S3 includes:
S3.1: defining the reverse query as taking the face under the Y camera as the query image and the face under the X camera as the queried image, wherein Q ∈ {1, 2, 3} denotes the type of query mode combination; for the reverse query, Q = 1 denotes that the query image comes from Y_B and the queried images come from X_A; Q = 2 denotes that the query image comes from Y_A and the queried images come from X_A; Q = 3 denotes that the query image comes from Y_B and the queried images come from X_B; when the reverse query mode combination type is Q, the face image with ID number n under the Y camera is taken as the query image and all M face images under the X camera are taken as the queried images;
S3.2: inputting the M + 1 face images into the face recognition feature extraction network to obtain their corresponding face feature vectors; then calculating, in turn, the cosine similarity distance between the feature vector of the 1 query image and the feature vector of each of the M queried images, giving M cosine similarity distances; and finally sorting the M queried images by similarity from largest to smallest to obtain the reverse query ranking result Rank(Q, n):
wherein the i-th element of Rank(Q, n) is the ID number of the queried image under the X camera that ranks i-th in the similarity ordering, and i ∈ {1, 2, 3, ..., M};
S3.3: defining the top k images in the reverse query ranking result Rank(Q, n) obtained in step S3.2 as the reverse query top-k neighbors, and obtaining the reverse query bidirectional top-k neighbors from the reverse query top-k neighbors:
wherein R(Q, n, k) denotes the reverse query bidirectional top-k neighbors, determined by taking each image among the reverse query top-k neighbors as a new query image and finding the ID numbers of the top k images in its similarity ranking over all query data set images;
S3.4: obtaining the reverse query credibility from the reverse query bidirectional top-k neighbors R(Q, n, k) obtained in step S3.3:
wherein W(Q, n, k) denotes the reverse query credibility, and |R(Q, n, k)| denotes the number of images in the reverse query bidirectional top-k neighbors R(Q, n, k).
5. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the first multi-modal fusion similarity distance in S4 is calculated as:
wherein the forward rank position is the value of j at which the j-th element of Rank(Q, m) equals n, that is, the position of the face image with ID number n in Rank(Q, m); the reverse rank position is the value of i at which the i-th element of Rank(Q, n) equals m, that is, the position of the face image with ID number m in Rank(Q, n); and D1(m, n) denotes the first multi-modal fusion similarity distance between the face image with ID number m under the X camera and the face image with ID number n under the Y camera.
6. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the second multi-modal fusion similarity distance in S5 is calculated by the following formula:
wherein, for Q = 1, 2, 3, the function F takes the set formed by the three positions of n in Rank(Q, m) and the set formed by the three positions of m in Rank(Q, n), and returns from each set the element ranked r-th in ascending order; that is, the three forward positions and the three reverse positions are each sorted from smallest to largest and the r-th value is taken; r is an adjustable parameter with the same value range as Q; and D2(m, n) denotes the second multi-modal fusion similarity distance between the face image with ID number m under the X camera and the face image with ID number n under the Y camera.
7. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the third multi-modal fusion similarity distance in step S6 is calculated by the following formula:
D3=λ*D1+(1–λ)*D2
wherein λ is a weight coefficient with a value range of 0 to 1, D1 is the first multi-modal fusion similarity distance, D2 is the second multi-modal fusion similarity distance, and D3 is the third multi-modal fusion similarity distance.
8. The multi-modal ranking optimization method for heterogeneous face image re-recognition according to claim 1, wherein the step S7 includes:
calculating, in turn, the third multi-modal fusion similarity distance between the face image with ID number m under the X camera and each of the N face images with ID numbers n = 1, 2, ..., N under the Y camera;
and sorting the N similarity distances from smallest to largest to obtain, for the query face image with ID number m under the X camera, the final face query similarity ranking result over all N face images under the Y camera, which is taken as the image re-identification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111542116.4A CN114283471B (en) | 2021-12-16 | 2021-12-16 | Multi-mode ordering optimization method for heterogeneous face image re-recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114283471A (en) | 2022-04-05 |
CN114283471B (en) | 2024-04-02 |
Family
ID=80872469
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111542116.4A Active CN114283471B (en) | 2021-12-16 | 2021-12-16 | Multi-mode ordering optimization method for heterogeneous face image re-recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114283471B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016110005A1 (en) * | 2015-01-07 | 2016-07-14 | 深圳市唯特视科技有限公司 | Gray level and depth information based multi-layer fusion multi-modal face recognition device and method |
CN111126249A (en) * | 2019-12-20 | 2020-05-08 | 深圳久凌软件技术有限公司 | Pedestrian re-identification method and device combining big data and Bayes |
CN112926557A (en) * | 2021-05-11 | 2021-06-08 | 北京的卢深视科技有限公司 | Method for training multi-mode face recognition model and multi-mode face recognition method |
US20210224313A1 (en) * | 2019-03-11 | 2021-07-22 | Boe Technology Group Co., Ltd. | Reverse image search method, apparatus and application system |
Non-Patent Citations (2)
Title |
---|
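Wang Kunpeng (王锟朋); Gao Xingyu (高兴宇): "Face clustering algorithm based on additive margin Softmax features" (基于附加间隔Softmax特征的人脸聚类算法), Computer Applications and Software (计算机应用与软件), no. 02, 12 February 2020 (2020-02-12), pages 117 - 123 * |
Hu Hui (胡辉) et al.: "Multimodal Ranking Optimization for Heterogeneous Face Re-identification", ELSEVIER, 31 December 2022 (2022-12-31), pages 1 - 7 * |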
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690556A (en) * | 2022-11-08 | 2023-02-03 | 河北北方学院附属第一医院 | Image recognition method and system based on multi-modal iconography characteristics |
CN115690556B (en) * | 2022-11-08 | 2023-06-27 | 河北北方学院附属第一医院 | Image recognition method and system based on multi-mode imaging features |
Also Published As
Publication number | Publication date |
---|---|
CN114283471B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188227B (en) | Hash image retrieval method based on deep learning and low-rank matrix optimization | |
KR102385463B1 (en) | Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium | |
CN112651262B (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN110717099B (en) | Method and terminal for recommending film | |
WO2011089872A1 (en) | Image management device, image management method, program, recording medium, and integrated circuit | |
CN109871821B (en) | Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network | |
CN111597298A (en) | Cross-modal retrieval method and device based on deep confrontation discrete hash learning | |
CN107590505B (en) | Learning method combining low-rank representation and sparse regression | |
CN113239801B (en) | Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment | |
CN112507853B (en) | Cross-modal pedestrian re-recognition method based on mutual attention mechanism | |
CN114461839B (en) | Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment | |
CN109766455B (en) | Identified full-similarity preserved Hash cross-modal retrieval method | |
CN113918764B (en) | Movie recommendation system based on cross-modal fusion | |
CN111985520A (en) | Multi-mode classification method based on graph convolution neural network | |
CN112926675B (en) | Depth incomplete multi-view multi-label classification method under double visual angle and label missing | |
Wang et al. | Aspect-ratio-preserving multi-patch image aesthetics score prediction | |
CN109472282B (en) | Depth image hashing method based on few training samples | |
CN116610778A (en) | Bidirectional image-text matching method based on cross-modal global and local attention mechanism | |
CN114283471A (en) | Multi-modal sequencing optimization method for heterogeneous face image re-recognition | |
CN114998777A (en) | Training method and device for cross-modal video retrieval model | |
CN116612324A (en) | Small sample image classification method and device based on semantic self-adaptive fusion mechanism | |
CN111026910A (en) | Video recommendation method and device, electronic equipment and computer-readable storage medium | |
CN117351518A (en) | Method and system for identifying unsupervised cross-modal pedestrian based on level difference | |
CN116956128A (en) | Hypergraph-based multi-mode multi-label classification method and system | |
US20220165055A1 (en) | Information processing apparatus, information processing method, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||