CN112269889A

CN112269889A - Interactive method, client and system for searching difficult portrait

Info

Publication number: CN112269889A
Application number: CN202011010994.7A
Authority: CN
Inventors: 王茜; 刘民
Original assignee: SHANGHAI CRIMINAL SCIENCE TECHNOLOGY RESEARCH INSTITUTE
Current assignee: SHANGHAI CRIMINAL SCIENCE TECHNOLOGY RESEARCH INSTITUTE
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2021-01-26
Anticipated expiration: 2040-09-23
Also published as: CN112269889B

Abstract

The invention discloses an interactive problematic portrait retrieval method, a client and a system, and relates to the technical field of portrait identification retrieval. The method comprises the following steps: acquiring an image, and converting the image into an image representation set based on semantic attribute labels; receiving a query input by a user, and acquiring semantic attribute information selected by the user aiming at the query; according to the significance semantic attributes of the selection marks of the user, classifying the image representation set into a significance attribute image set and a non-significance attribute image set according to whether the images have significance semantic attributes, respectively classifying and ordering the significance attribute image set and the non-significance attribute image set, and generating an initial candidate ordering queue according to an ordering result; and acquiring a final target determined by the user through mixed similarity re-ordering interactive retrieval. The invention has reasonable manual workload, high retrieval success rate, high convergence rate and wide application range, and is particularly suitable for difficult portrait retrieval in the public security industry.

Description

Interactive method, client and system for searching difficult portrait

Technical Field

The invention relates to the technical field of portrait identification and retrieval, in particular to an interactive method, a client and a system for retrieving a difficult portrait.

Background

The portrait retrieval (also called portrait picture retrieval) method mainly adopts a manual identification method and/or a portrait identification technology to retrieve the portrait in the picture data to obtain a retrieval result. The key of the portrait retrieval technology lies in content analysis and understanding of the portrait pictures, namely semantic information is extracted from the portrait pictures by a machine vision method for retrieval, the semantic information reflects the content of the portrait and also forms characteristics according to the portrait retrieval, and a retrieval algorithm carries out ranking of retrieval results based on the similarity of the semantic characteristics. For portrait picture retrieval, the semantic features are required to describe the contents of the portrait pictures as accurately and in detail as possible, so that the retrieval result is ensured to be more consistent with the query requirements of users; meanwhile, the large-scale data to be processed is searched, so that the semantic features are required to be easy to extract, and too many computing resources are not consumed.

In recent years, portrait identification technology has been widely applied in the public security field, but many portrait elements that determine case detection in the professional criminal investigation field, such as suspect simulation portraits, witness head impressions (mental images), reviewer skull images, descriptor semantic feature descriptions, and the like, fall into the dilemma of relying only on manual screening because conventional portrait comparison and semantic attribute retrieval cannot be performed. Meanwhile, in a large number of video investigation applications, many large-side, partial image defects, super-resolution and other materials which do not meet the requirements of portrait identification and retrieval quality are often rejected by the existing portrait retrieval system. The above-mentioned non-standard, non-conventional portrait retrieval may be collectively classified as "difficult portrait retrieval". Because research aiming at the problem of difficult portrait retrieval is relatively few, and research in the aspect of special corresponding algorithm is almost blank, the traditional difficult portrait retrieval can only achieve the purpose of retrieval by correcting 'materials' through image processing methods such as face angle correction, super-resolution image clarification processing, portrait complement and the like, and labor is consumed, and the effect is poor.

At present, some technical solutions suitable for searching for difficult and complicated figures are provided in the prior art, for example, a Heterogeneous figure (Heterogeneous Image) comparison method is implemented by drawing all non-standard images and head impressions as a simulation portrait. However, the retrieval effect of the above method is highly dependent on the description ability of the witness and the professional level of the renderer, and the accuracy and stability of the retrieval are difficult to guarantee. As another example, the prior art also provides methods for interactive search techniques that use human intelligence to "inform" the machine operator of trends to highlight the most relevant results according to user needs. On one hand, however, due to the difference of the "sensitive properties" of human vision and machine vision, a large "semantic gap" exists between human vision and machine vision, which causes great trouble to the image recognition work; on the other hand, the current interactive search technology is usually an interactive search method based on random candidate (candidate) selection, which requires a large number of manual operations, and the cycle times are greatly increased in medium-large scale or similar sample sets, even falling into a state of no solution.

In summary, how to provide an interactive method for searching a problematic portrait based on the prior art, which has reasonable workload, high success rate of searching and fast convergence speed, is a technical problem that needs to be solved urgently at present.

Disclosure of Invention

The invention aims to: the defects of the prior art are overcome, and an interactive method, a client and a system for searching the problematic portrait are provided. The invention provides a portrait retrieval method, which comprises the steps of converting an image into an image representation set of semantic attribute labels based on the semantic attribute labels, marking significant semantic attributes through first man-machine interaction, classifying and sequencing the image representation set into a significant attribute image set and a non-significant image set, and acquiring a next candidate sequencing queue; and then, circularly carrying out the layer-by-layer man-machine interaction, and re-sequencing according to the manual selection result of the user to generate a candidate sequencing queue of the next cycle until the operator confirms to finish. The invention has reasonable manual workload, high retrieval success rate, high convergence rate and wide application range, and is particularly suitable for difficult portrait retrieval in the public security industry.

In order to achieve the above object, the present invention provides the following technical solutions:

a method for interactive problematic portrait retrieval, comprising the steps of:

step 100, acquiring an image, and converting the image into an image representation set based on semantic attribute labels;

step 200, receiving a query input by a user, and acquiring semantic attribute information selected by the user aiming at the query; according to the significance semantic attributes of the selection marks of the user, classifying the image representation set into a significance attribute image set and a non-significance attribute image set according to whether the images have significance semantic attributes, respectively classifying and ordering the significance attribute image set and the non-significance attribute image set, and generating an initial candidate ordering queue according to an ordering result;

300, obtaining a final target determined by a user through mixed similarity re-ordering interactive retrieval; the method comprises the steps of obtaining selection information of a user through the layer-by-layer man-machine interaction, conducting mixed similarity reordering on a candidate ordering queue aiming at each selection, generating a candidate ordering queue of the next cycle for the user to reselect, and ending the cycle after obtaining a final target determined by the user.

Furthermore, the semantic attribute labels are portrait semantic attribute labels standardized in the public security industry, and after a multi-label learning neural network is built according to the portrait semantic attribute labels standardized in the public security industry, image data are converted into an image representation set Y based on the semantic attribute labels in the public security industry through the multi-label learning neural network.

Further, in the process of the layer-by-layer man-machine interaction, the judgment of a user is assisted by setting machine vision recognition.

Further, the specific steps of converting to form the image representation set Y include:

let I training image set X be { X_i|x_i∈X^vI is more than or equal to 1 and less than or equal to I, and v is the dimensionality of the image vector; let L industry standardized semantic attribute sets Z be { Z_l|z_lBelongs to Z }, wherein L is more than or equal to 1 and less than or equal to L; converting the training image set into a v × L dimensional image representation set Y of { Y } based on semantic attribute labels through an MLCNN neural network_i，l|y_i，l∈Y^v×LAnd (c) the step of (c) in which,

y_i，l＝Rep(x_i)(1≤i≤I，1≤l≤L)，

wherein the function Rep (-) is from XV to Y^v×LThe transformation function of (a);

in the case where the contribution parameter δ (i, l) of each semantic attribute is consistent, the image x_iScore function Score (x) based on semantic attributes_i) Is y_i，lSum of individual attribute loss functions C (i, l), Score (x)_i) The calculation formula of (a) is as follows,

wherein C (i, l) represents the softmax multi-label loss function value of each valid semantic attribute, the calculation formula of C (i, l) is as follows,

according to Score (x)_i) Generating an attribute score matrix S for X_iObtaining each image x_i1And x_i2Distance function Dis (x) based on attribute_i1-x_i2) The calculation formula is as follows,

Dis(x_i1-x_i2)＝(S_i1-S_i2)^T(Si₁-S_i2)，

in the formula, S_i1Is x_i1Matrix of attribute scores of S_i2Is x_i2The attribute score matrix of (2).

Further, in step 200, the semantic attribute information includes names and attribute values of semantic attributes, and the step of obtaining the semantic attribute information selected by the user for the query includes:

extracting the characteristics of the query according to the query input by the user;

outputting a series of attribute values corresponding to the semantic attributes on a terminal display structure according to the query characteristics for selection by a user;

and acquiring the attribute value of the semantic attribute selected by the user.

Further, in step 200, the step of generating an initial candidate ranking queue comprises:

obtaining the semantic attribute set G selected by the user_Q＝{g₁，g₂，…g_sClassifying all semantic attributes in the L into significant attributes and non-significant attributes, and classifying Z of a semantic attribute set Z_lIs adjusted to

And setting a significance attribute judgment function

Comprises the following steps:

according to whether the image has the saliency attribute

Classifying a set of image representations Y as including a saliency attribute set of image representations

And a set of non-salient attribute image representations

Can be known as Y⁺∩Y^-；

To highlight the role of the saliency attributes, Score (x) was adjusted_i) The attribute contribution value δ (l) in the calculation formula of (1) is adjusted as follows:

in the formula (I), the compound is shown in the specification,

indicating that the semantic attribute is judged to be significant,

representing that the semantic attribute is judged to be non-significance; r represents the number of cycles of interaction; alpha (r) represents that delta (l) selects different attribute contribution values when the semantic attribute is significant or the semantic attribute is non-significant; wherein the initial value of α (r) is set to α (0) to 0.9,

and α (r +1) ═ min (0.5, α (0) -0.05r) (r.gtoreq.0);

the adjusted delta (l) is brought into the Score (x)_i) The calculation formula of (a) is recalculated to obtain two target-based Q_rThe queues sorted according to the reverse order of the distance are respectively a significance semantic attribute queue RankA (Q)_r，A_r)∈Y⁺And an unnoticeable semantic attribute queue RankB (Q)_r，B_r)∈Y^-In the formula, Q_rA selection target representing the r-th manual interaction;

then, respectively acquiring the Top t of RankA through Top (-) bit taking function_aBit sum RankB front t_bThe bits form a candidate sort queue candidate (r), which is:

Candidate(r)＝Top(RankA(Q_r，A_r)，t_a)∩Top(RankB(Q_r，B_r)，t_b)

where Candidate (0) represents the initial Candidate sort queue.

Further, in step 300, the step of performing mixed similarity re-ordering interactive search includes:

obtaining the (r +1) th manual Choice of Choice (r +1), and extracting the integral composite feature F based on the LBP-HSV feature fusion operator for the Choice (r +1) belonging to Candidate (r)_uAnd generating a fusion feature similarity distance matrix A based on a KISSME method as follows:

DisK(x_i1-x_i2)＝(A_i1-A_i2)^T(V_in ^-1-V_out ^-1)(A_i1-A_i2)，

in the formula, V_in ^-1And V_out ^-1Respectively representing intra-class and inter-class in the KISSME algorithmA probability likelihood matrix; a. the_i1Is x_i1A fused feature similarity distance matrix of_i2Is x_i2The fused feature similarity distance matrix of (1);

according to Dis (x)_i1-x_i2) And DisK (x)_i1-x_i2) The fusion distance function is obtained by the following calculation formula: d (x)_i1-x_i2)＝(1-μ)Dis(x_i1-x_i2)+μDisK(x_i1-x_i2)，

In the formula, mu is an assistant feature weight value; the value of μ is based on the manual selection of Q_r+1Whether Top (RankA (Q)) falls in the previous round of candidate queue Candidate (r)_r，A_r)，t_a) Partially set; when Q is manually selected_r+1Fall on Top (RankA (Q)_r，A_r)，t_a) In the middle time, the retrieval effect based on the significance attribute is good, the reordering strategy 1 is executed, and RankB (Q) is used_r，B_r) Discard and discard RankA (Q)_r，A_r) Dividing each of the front and rear halves into RankA (Q)_r+1，A_r+1) And RankB (Q)_r+1，B_r+1) When μ is set to 0.1; otherwise, perform reordering strategy 2, will RankB (Q)_r，B_r) The latter half is discarded and the first half is added with RankA (Q)_r，A_r) And assigning values to RankA (Q) according to the generated reverse order before and after each half score_r+1，A_r+1) And RankB (Q)_r+1，B_r+1) In this case, μ is set to 0.5.

Further, before the reordering step of each loop, a candidate cluster transformation operation for pair Top (RankA (Q) by the following formula is performed to avoid invalid loops and accelerate convergence_r，A_r)，t_a) And (4) clustering:

Top′(RankA(Q_r，A_r)，t_a)＝K_mean(Top(RankA(Q_r，A_r)，t_a))，

in the formula, K_mean(. cndot.) is a clustering function, which means that Top (RankA (Q)_r，A_r)，t_a) As initial centroid, for RankA(Q_r，A_r) Performing k-means clustering operation on the set to obtain a new centroid nearest attribute set Top' (RankA (Q)_r，A_r)，t_a)。

The invention also provides an interactive problematic portrait retrieval client, which comprises the following structure:

the initialization module is used for acquiring an image and converting the image into an image representation set based on semantic attribute labels;

the information acquisition module is used for receiving a query input by a user and acquiring semantic attribute information selected by the user aiming at the query;

the information processing module is used for classifying the image representation set into a significant attribute image set and a non-significant attribute image set according to whether the image has significant semantic attributes or not according to the significance semantic attributes of the selective marker of the user, respectively classifying and ordering the significant attribute image set and the non-significant attribute image set, and generating an initial candidate ordering queue according to an ordering result; and obtaining a final target determined by a user through mixed similarity re-ordering interactive retrieval; the method comprises the steps of obtaining selection information of a user through the hierarchical human-computer interaction, conducting mixed similarity reordering on a candidate ordering queue aiming at each selection, generating a candidate ordering queue of the next cycle for the user to reselect, and ending the cycle when a final target determined by the user is obtained.

The invention also provides an interactive problematic portrait retrieval system, which comprises a user terminal and a server end;

the user terminal is provided with a human-computer interaction interface, and query information and selection information input by a user are collected through the human-computer interaction interface;

the server side includes a processor and a memory for storing processor-executable instructions and parameters, the processor configured to:

acquiring an image, and converting the image into an image representation set based on semantic attribute labels; and the number of the first and second groups,

according to a query input by a receiving user, semantic attribute information selected by the user aiming at the query is obtained; according to the significance semantic attributes of the selection marks of the user, classifying the image representation set into a significance attribute image set and a non-significance attribute image set according to whether the images have significance semantic attributes, respectively classifying and ordering the significance attribute image set and the non-significance attribute image set, and generating an initial candidate ordering queue according to an ordering result; and the number of the first and second groups,

obtaining a final target determined by a user through mixed similarity re-ordering interactive retrieval; the method comprises the steps of obtaining selection information of a user through the layer-by-layer man-machine interaction, conducting mixed similarity reordering on a candidate ordering queue aiming at each selection, generating a candidate ordering queue of the next cycle for the user to reselect, and ending the cycle when a final target determined by the user is obtained.

Due to the adoption of the technical scheme, compared with the prior art, the invention has the following advantages and positive effects as examples: after an image is converted into an image representation set of each semantic attribute label based on the semantic attribute labels, the significant semantic attributes are marked through first human-computer interaction, the image representation set is divided into a significant attribute image set and a non-significant image set, then classification and sequencing are carried out on the image representation sets, and a next candidate sequencing queue is obtained; and then, circularly carrying out the layer-by-layer man-machine interaction, and re-sequencing according to the manual selection result of the user to generate a candidate sequencing queue of the next cycle until the operator confirms to finish. The invention has reasonable manual workload, high retrieval success rate, high convergence rate and wide application range, is particularly suitable for difficult portrait retrieval application in the public security industry, and can be combined with simulated portrait retrieval, suspect identification, dictation portrait drawing and the like according to requirements.

On one hand, a user can set portrait semantic attributes based on classification of standard national standards of the public security industry (the 24 th part of criminal information management codes and the 24 th part of codes GA 240.24-2003, and the 3 rd part of GA 240.3-2000 criminal information management codes: body surface special mark codes) according to needs of the public security industry, and a portrait image expression set based on the semantic attributes of the public security industry is generated through a multi-label classification neural network.

On the other hand, the invention can give consideration to sensitive semantic attributes of human vision (such as length, height, position, inclination and the like of portrait parts) and sensitive attributes of machine vision (such as size, mutual distance and the like of parts) through human-computer interaction, and reduces semantic gap which often puzzles between human eyes and machine vision in image recognition work from the aspect of semantic attribute design.

Meanwhile, in the human-computer interaction, the manual selection of the user is converted into the significant visual interest attributes, and the weight classification of each semantic attribute is realized, so that different database weight sorting strategies can be implemented, and the calculation amount of each interactive retrieval can be remarkably reduced. Practice calculation shows that the technical scheme provided by the invention can reduce the calculation amount of each interactive retrieval by one fourth and reduce the operation matrix after the previous 5 times of retrieval to 23% of the original matrix.

On the other hand, each cycle judges the effect of the previous manual selection, and reduces the error of the manual selection by combining the auxiliary judgment function of machine vision. Compared with the traditional interactive retrieval method based on random candidate selection, the manual operation frequency is reduced by 200%, and the problem that the cycle frequency of the traditional interactive system in medium and large scale or similar sample set is greatly increased or no solution is caused is solved.

Drawings

Fig. 1 is a schematic information processing diagram of an interactive problematic person retrieval method according to an embodiment of the present invention.

Fig. 2 is a block diagram of a client according to an embodiment of the present invention.

Fig. 3 is a block diagram of a system according to an embodiment of the present invention.

Description of reference numerals:

the system comprises a client 200, an initialization module 210, an information acquisition module 220 and an information processing module 230;

system 300, user terminal 310, server 320.

Detailed Description

The following describes the interactive problematic portrait retrieval method, client and system disclosed in the present invention in further detail with reference to the accompanying drawings and specific embodiments. It should be noted that technical features or combinations of technical features described in the following embodiments should not be considered as being isolated, and they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components, and may be applied to different embodiments. Thus, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

It should be noted that the structures, proportions, sizes, and other dimensions shown in the drawings and described in the specification are only for the purpose of understanding and reading the present disclosure, and are not intended to limit the scope of the invention, which is defined by the claims, and any modifications of the structures, changes in the proportions and adjustments of the sizes and other dimensions, should be construed as falling within the scope of the invention unless the function and objectives of the invention are affected. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that described or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

Examples

The invention provides an interactive method for searching a problematic portrait, which comprises the following steps:

and step 100, acquiring an image, and converting the image into an image representation set based on semantic attribute labels.

In this embodiment, preferably, the semantic attribute tag is a portrait semantic attribute tag standardized in the public security industry. Specifically, after a multi-tag learning neural network is constructed according to the standardized portrait semantic attribute tags in the public security industry, image data is converted into an image representation set Y based on the semantic attribute tags in the public security industry through the multi-tag learning neural network.

Preferably, the semantic attributes are portrait semantic attributes and at least include eyebrow information and feature label information, the eyebrow information includes eyebrow shape, density, length, width, relative eyebrow direction and eyebrow spacing information, and the feature label includes category, size, position, orientation and quantity information. Specific portrait semantic attributes are shown in the table below.

Step 200, receiving a query input by a user, and acquiring semantic attribute information selected by the user aiming at the query; and according to the significance semantic attribute of the selection mark of the user, classifying the image representation set into a significance attribute image set and a non-significance attribute image set according to whether the image has the significance semantic attribute, respectively classifying and ordering the significance attribute image set and the non-significance attribute image set, and generating an initial candidate ordering queue according to an ordering result.

The semantic attributes can be classified into salient attributes and non-salient attributes according to the selection mark salient semantic attributes of the user. Further, the image representation set Y can be divided into a significant attribute image set and a non-significant attribute image set according to whether the images have significant attributes.

In this embodiment, preferably, the semantic attribute information includes names and attribute values of semantic attributes. In this case, the step of obtaining semantic attribute information selected by the user for the query includes:

In the embodiment, in the process of the layer-by-layer human-computer interaction, the judgment of a user can be assisted by setting machine vision recognition, so that the manual error is reduced.

The present embodiment is described in detail below with reference to fig. 1.

In step 100, the specific steps of converting the image representation set Y may be as follows.

Let I training image set X be { X_i|x_i∈X^vAnd h, wherein I is more than or equal to 1 and less than or equal to I, and v is the dimension of the image vector.

Let L industry standardized semantic attribute sets Z be { Z_l|z_lBelongs to Z, wherein L is more than or equal to 1 and less than or equal to L.

Then, the training image set is converted into an image representation set Y of v × L dimension based on semantic attribute labels through an MLCNN (Multi-1 abel Classification relational Neural Network) Neural Network, wherein the image representation set Y is { Y }_i，l|y_i，l∈Y^v×LAnd (c) the step of (c) in which,

y_i，l＝Rep(x_i)(1≤i≤I，1≤l≤L) (1)

in the formula, the function Rep (-) is from X^vTo Y^v×LThe transformation function of (2).

Image x under the assumption that the contribution parameter δ (i, l) of each semantic attribute is consistent_iScore function Score (x) based on semantic attributes_i) Is y_i，lSum of individual attribute loss functions C (i, l), Score (x)_i) The calculation formula of (a) is as follows:

in the formula, C (i, l) represents the softmax multi-label loss function value of each valid semantic attribute, and the calculation formula of C (i, l) is as follows:

according to Score (x)_i) An attribute score matrix S for X may be generated_iThereby generating each image x_i1And x_i2Distance function Dis (x) based on attribute_i1-x_i2) The calculation formula is as follows:

Dis(x_i1-x_i2)＝(S_i1-S_i2)^T(S_i1-S_i2) (5)

S_i1is x_i1Matrix of attribute scores of S_i2Is x_i2The attribute score matrix of (2).

Step 200 is used to implement interactive saliency attribute acquisition and preliminary ordering.

Firstly, according to different categories of input Q input by a user (namely an operator), corresponding to the input Q, a series of attribute values corresponding to semantic attributes can be listed on a machine interface of a terminal display structure for the user to select.

The interactor may only select one or more of the attribute values of the impressively defined semantic attributes and may leave aside, i.e., make no selection, other relatively ambiguous, indeterminate semantic attribute options.

And setting a significance attribute judgment function

Comprises the following steps:

then, according to whether the image has the saliency attribute

The set of image representations Y may be classified as containing a saliency attribute set of image representations

And a set of non-salient attribute image representations

Can be known as Y⁺∩Y^-。

In this embodiment, in order to highlight the effect of the saliency attribute, Score (x) may be adjusted_i) To add the saliency attribute, the attribute contribution value δ (l) in equation (2) is adjusted as follows:

in the formula (6), the first and second groups,

indicating that the semantic attribute is judged to be significant,

indicating that the semantic attribute is judged to be non-significant. r represents the number of cycles of interaction. Alpha (r) indicates that delta (l) chooses a different attribute contribution value when the semantic attribute is salient or the semantic attribute is non-salient.

In a preferred embodiment, the initial value α (0) is set to 0.9,

and α (r +1) ═ min (0.5, α (0) -0.05r) (r.gtoreq.0).

Substituting the formula (6) into the formula (5) for recalculation to obtain two queues sorted in reverse order of distance based on the target Qr,respectively, a significance semantic attribute queue RankA (Q)_r，A_r)∈Y⁺And an unnoticeable semantic attribute queue RankB (Q)_r，B_r)∈Y^-In the formula, Qr represents the selection target of the r-th manual interaction.

Candidate(r)＝Top(RankA(Q_r，A_r)，t_a)∩Top(RankB(Q_r，B_r)，t_b) (7)

where Candidate (0) represents the initial Candidate sort queue.

In the present embodiment, preferably, t_a15 and t_b＝5。

And 300, obtaining a final target determined by the user through mixed similarity re-ordering interactive retrieval. In the step, in order to reduce manual errors as much as possible, auxiliary judgment of machine vision is added.

First, the (r +1) th manual selection Choice (r +1) is obtained.

For each manual Choice of Choice (r +1) belonging to Candidate (r), extracting the integral composite feature F based on LBP-HSV (namely Loca1 Binary Pattern-HueScattering Value) feature fusion operator_uGenerating a fusion feature similarity distance matrix A based on a KISSME (keep it Simple and straight forward metric) method as follows:

DisK(x_i1-x_i2)＝(A_i1-A_i2)^T(V_in ^-1-V_out ^-1)(A_i1-A_i2) (8)

in the formula, V_in ^-1And V_out ^-1Respectively representing intra-class and inter-class probability likelihood matrices in the KISSME algorithm. A. the_i1Is x_i1A fused feature similarity distance matrix of_i2Is x_i2The fused feature similarity distance matrix of (1).

Then, a fusion distance function is generated according to equation (5) and equation (8), as follows:

D(x_i1-x_i2)＝(1-μ)Dis(x_i1-x_i2)+μDisK(x_i1-x_i2) (9)

where μ is the assist feature weight value.

In this embodiment, we simplify the setting of this weight μ, and set μ according to different reordering strategies.

Specifically, Q may be selected manually_r+1Whether it falls in the previous round of candidate queue Candidate (r) in equation (7) or not Top (RankA (Q)_r，A_r)，t_a) Are partially set.

When Q is manually selected_r+1Fall on Top (RankA (Q)_r，A_r)，t_a) In the middle, it is shown that the search based on the saliency attribute is effective, and the reordering policy 1 is executed, and μ is set to 0.1.

The reordering strategy 1 is: mixing RankB (Q)_r，B_r) Discarding, RankA (Q, A)_r) Dividing each of the front and rear halves into RankA (Q)_r+1，A_r+1) And RankB (Q)_r+1，B_r+1). Then, the search is performed.

On the contrary, when Q is manually selected_r+1Does not fall on Top (RankA (Q)_r，A_r)，t_a) When the search result is middle, it indicates that the search result based on the saliency attribute is not good, the re-ranking policy 2 is executed, and μ is set to 0.5.

The reordering strategy 2 is: mixing RankB (Q)_r，B_r) The latter half of (A) is discarded, RankB (Q)_r，B_r) Adding RankA (Q) to the first half of_r，A_r) And assigning values to RankA (Q) according to the semi-scores before and after the reverse order arrangement generated by the formula (8)_r+1，A_r+1) And RankB (Q)_r+1，B_r+1). Then, the search is performed.

By adopting the reordering strategy, the calculation amount of each cycle can be effectively reduced, and the convergence is further accelerated.

Preferably, to avoid invalid cycles and accelerate convergence, a candidate clustering transformation operation can be added before the reordering step of each cycle. In particularThe formula is shown below for Top (RankA (Q)_r，A_r)，t_a) And (4) clustering:

Top′(RankA(Q_r，A_r)，t_a)＝K_mean(Top(RankA(Q_r，A_r)，t_a)) (10)

in the formula, K_mean(. cndot.) is a clustering function, which means that Top (RankA (Q)_r，A_r)，t_a) As initial centroid, for RankA (Q)_r，A_r) Performing k-means clustering operation on the set to obtain a new centroid nearest attribute set Top' (RankA (Q)_r，A_r)，t_a)。

Thus, the generation of Candidate (r), candidate cluster transformation, and Q acquisition are continuously performed_rGenerating RankA (Q, A)_r) And generating cycles of Candidate (r +1) operations until the user determines to find the final target, terminating the cycles, and outputting final target information through a terminal display structure.

According to the technical scheme, the defects caused by only depending on semantic attribute identification are avoided, the size of the matrix can be continuously reduced, the consumption of an interactive circulation algorithm at each time is avoided, the identification effect is improved, the effect of quickly retrieving the target is achieved, and the method is particularly suitable for retrieving image sets of medium and large scales.

Based on the application requirements of difficult portrait retrieval, the invention introduces the standardized portrait semantic attributes of the public security industry (or called standardized portrait semantic attributes of the public security industry) and realizes the line-standardized (industry-standardized) portrait semantic attribute classification through a multi-label classification neural network, then distinguishes the significant semantic attributes and the non-significant semantic attributes according to the manual selection of users in a man-machine interaction mode, provides an interactive difficult portrait retrieval scheme based on the significant semantic attributes, and constructs a set of portrait retrieval method with less circulation and rapid convergence by combining with a re-ordering strategy. Practice results show that the precision, recall rate and F1 value of the invention all reach better values, and the invention can well solve the retrieval problem of various difficult portrait.

Referring to fig. 2, an interactive problematic person retrieval client is further provided as another embodiment of the present invention. The client 200 includes an initialization module 210, an information collection module 220, and an information processing module 230.

The initialization module 210 is configured to acquire an image and convert the image into an image representation set based on semantic attribute tags.

Preferably, the semantic attribute tags are portrait semantic attribute tags standardized by the public security industry. Specifically, after a multi-tag learning neural network is constructed according to the standardized portrait semantic attribute tags in the public security industry, image data is converted into an image representation set Y based on the semantic attribute tags in the public security industry through the multi-tag learning neural network.

The information collecting module 220 is configured to receive a query input by a user, and obtain semantic attribute information selected by the user for the query.

Preferably, the semantic attribute information includes names and attribute values of semantic attributes.

The information collection module 220 is configured to: extracting the characteristics of the query according to the query input by the user; outputting a series of attribute values corresponding to the semantic attributes on a terminal display structure according to the query characteristics for selection by a user; and acquiring the attribute value of the semantic attribute selected by the user as the selected semantic attribute information.

The information processing module 230 is configured to classify the image representation set into a significant attribute image set and a non-significant attribute image set according to whether the image has significant semantic attributes or not according to the user's selection tag significant semantic attributes, classify and sort the significant attribute image set and the non-significant attribute image set, and generate an initial candidate sorting queue according to a sorting result; and obtaining a final target determined by a user through mixed similarity re-ordering interactive retrieval; the method comprises the steps of obtaining selection information of a user through the hierarchical human-computer interaction, conducting mixed similarity reordering on a candidate ordering queue aiming at each selection, generating a candidate ordering queue of the next cycle for the user to reselect, and ending the cycle when a final target determined by the user is obtained.

Other technical features referring to the previous embodiment, the information processing module 230 can be configured to execute a corresponding information processing method, which is not described herein again.

Referring to fig. 3, another embodiment of the present invention provides an interactive image retrieval system. The system 300 includes a user terminal 310 and a server side 320.

The user terminal 310 is provided with a human-computer interaction interface, and acquires query information and selection information input by a user through the human-computer interaction interface.

The server side 320 includes a processor and a memory for storing processor-executable instructions and parameters.

The processor is configured to: acquiring an image, and converting the image into an image representation set based on semantic attribute labels; acquiring semantic attribute information selected by a user aiming at a query according to the query input by the user; according to the significance semantic attributes of the selection marks of the user, classifying the image representation set into a significance attribute image set and a non-significance attribute image set according to whether the images have significance semantic attributes, respectively classifying and ordering the significance attribute image set and the non-significance attribute image set, and generating an initial candidate ordering queue according to an ordering result; and, obtaining a final target determined by the user through mixed similarity re-ordering interactive retrieval; the method comprises the steps of obtaining selection information of a user through the layer-by-layer man-machine interaction, conducting mixed similarity reordering on a candidate ordering queue aiming at each selection, generating a candidate ordering queue of the next cycle for the user to reselect, and ending the cycle when a final target determined by the user is obtained.

Other technical features refer to the previous embodiment, and the processor can be configured to execute the corresponding information processing method, which is not described herein again.

In the foregoing description, the disclosure of the present invention is not intended to limit itself to these aspects. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising," "including," and "having" should be interpreted as inclusive or open-ended, rather than exclusive or closed-ended, by default, unless explicitly defined to the contrary. All technical, scientific, or other terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. Common terms found in dictionaries should not be interpreted too ideally or too realistically in the context of related art documents unless the present disclosure expressly limits them to that. Any changes and modifications of the present invention based on the above disclosure will be within the scope of the appended claims.

Claims

1. A method for interactive problematic portrait retrieval, characterized by the steps of:

2. The method of claim 1, wherein: the semantic attribute labels are portrait semantic attribute labels standardized in the public security industry, and after a multi-label learning neural network is constructed according to the portrait semantic attribute labels standardized in the public security industry, image data are converted into an image expression set Y based on the semantic attribute labels in the public security industry through the multi-label learning neural network.

3. The method of claim 1, wherein: in the process of the layer-by-layer human-computer interaction, the judgment of a user is assisted by setting machine vision recognition.

4. The method according to claim 1 or 2, characterized in that: the specific steps of converting to form the image representation set Y include:

let I training image set X be { X_i|x_i∈X^vI is more than or equal to 1 and less than or equal to I, and v is the dimensionality of the image vector;

let L industry standardized semantic attribute sets Z be { Z_l|z_lBelongs to Z }, wherein L is more than or equal to 1 and less than or equal to L; converting the training image set into a v × L dimensional image representation set Y of { Y } based on semantic attribute labels through an MLCNN neural network_i，l|y_i，l∈Y^v×LAnd (c) the step of (c) in which,

y_i，l＝Rep(xi)(1≤i≤I，1≤l≤L)，

in the formula, the function Rep (-) is from X^vTo Y^v×LThe transformation function of (a);

Dis(x_i1-x_i2)＝(S_i1-S_i2)^T(S_i1-S_i2)，

5. The method of claim 4, wherein: in step 200, the semantic attribute information includes names and attribute values of semantic attributes, and the step of obtaining the semantic attribute information selected by the user for the query includes:

6. The method of claim 5, wherein: in step 200, the step of generating an initial candidate ranking queue comprises:

And setting a significance attribute judgment function

Comprises the following steps:

according to whether the image has the saliency attribute

And a set of non-salient attribute image representations

Can be known as Y⁺∩Y^-(ii) a To highlight the role of the saliency attributes, Score (x) was adjusted_i) The attribute contribution value δ (l) in the calculation formula of (1) is adjusted as follows:

in the formula (I), the compound is shown in the specification,

indicating that the semantic attribute is judged to be significant,

and α (r +1) ═ min (0.5, α (0) -0.05r) (r.gtoreq.0);

the adjusted delta (l) is brought into the Score (x)_i) Is recalculated by the calculation formulaObtaining two target-based Q_rThe queues sorted according to the reverse order of the distance are respectively a significance semantic attribute queue RankA (Q)_r，A_r)∈Y⁺And an unnoticeable semantic attribute queue RankB (Q)_r，B_r)∈Y^-In the formula, Q_rA selection target representing the r-th manual interaction;

Candidate(r)＝Top(RankA(Q_r，A_r)，t_a)∩Top(RankB(Q_r，B_r)，t_b)

where Candidate (0) represents the initial Candidate sort queue.

7. The method of claim 6, wherein: in step 300, the step of performing mixed similarity re-ordering interactive search includes:

DisK(x_i1-x_i2)＝(A_i1-A_i2)^T(V_in ^-1-V_out ^-1)(A_i1-A_i2)，

in the formula, V_in ^-1And V_out ^-1Respectively representing intra-class and inter-class probability likelihood matrixes in a KISSME algorithm; a. the_i1Is x_i1A fused feature similarity distance matrix of_i2Is x_i2The fused feature similarity distance matrix of (1);

In the formula, mu is an assistant feature weight value; the value of μ is based on the manual selection of Q_r+1Whether Top (RankA (Q)) falls in the previous round of candidate queue Candidate (r)_r，A_r)，t_a) Partially set; when Q is manually selected_r+1Fall on Top (RankA (Q)_r，A_r)，t_a) In the middle time, the retrieval effect based on the significance attribute is good, the reordering strategy 1 is executed, and RankB (Q) is used_r，B_r) Discard and discard RankA (Q)_r，A_r) Dividing each of the front and rear halves into RankA (Q)_r+1，A_r+1) And RankB (Q)_r+1，B_r+1) When μ is set to 0.1; otherwise, perform reordering strategy 2, will RankB (Q)_r，B_r) The latter half is discarded and the first half is added with RankA (Q)_r，A_r) And according to DisK (x)_i1-x_i2) The values of the semi-scores before and after the reverse order arrangement generated by the calculation formula are assigned to RankA (Q)_r+1，A_r+1) And RankB (Q)_r+1，B_r+1) In this case, μ is set to 0.5.

8. The method of claim 7, wherein: before the reordering step of each loop, a candidate cluster transformation operation is performed to avoid invalid loops and accelerate convergence, which is to pair Top (RankA (Q) by the following formula_r，A_r)，t_a) And (4) clustering:

Top′(RankA(Q_r，A_r)，t_a)＝K_mean(Top(RankA(Q_r，A_r)，t_a))，

9. An interactive problematic portrait retrieval client is characterized by comprising the following structures:

the initialization module is used for acquiring an image and converting the image into an image representation set based on semantic attribute labels; the information acquisition module is used for receiving a query input by a user and acquiring semantic attribute information selected by the user aiming at the query;

10. An interactive problematic portrait retrieval system comprises a user terminal and a server terminal, and is characterized in that: