CN113642392A - Target searching method and device - Google Patents

Target searching method and device

Info

Publication number
CN113642392A
CN113642392A
Authority
CN
China
Prior art keywords
target
graph
candidate
searched
targets
Prior art date
Legal status
Granted
Application number
CN202110767455.6A
Other languages
Chinese (zh)
Other versions
CN113642392B (en)
Inventor
Yang Hua (杨华)
Liu Chuang (刘创)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110767455.6A
Publication of CN113642392A
Application granted
Publication of CN113642392B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses a target searching method and device. The method comprises: obtaining a feature expression for each target in a target video frame, and constructing, based on these feature expressions, a source graph in which the target to be searched is the central node, the other targets are context nodes, and edges point from the context nodes to the central node; obtaining a feature expression for each candidate target in a candidate video frame, determining the candidate target corresponding to each target based on the feature expressions of the candidate targets and of the targets, and constructing a target graph in which the candidate target corresponding to the target to be searched is the central node, the candidate targets corresponding to the other targets are context nodes, and edges point from the context nodes to the central node; obtaining graph embedding vectors of the source graph and the target graph with a twin residual graph convolutional neural network; and determining the target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph. The scheme of the invention improves the accuracy of target search and has strong robustness.

Description

Target searching method and device
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a target search method and apparatus, a computer device, and a computer-readable storage medium.
Background
Generally, target searching may be defined as the process of locating a specific target among a vast number of surveillance video frames. Target search is an automatic detection and identification technology that can quickly locate a target of interest in a surveillance network. It combines target detection with target re-identification, is a key technology in intelligent video surveillance, and has attracted wide attention in the computer vision field in recent years. Taking pedestrians as an example, pedestrian search has important application value in security scenarios such as locating persons of interest, as well as in human behavior analysis.
Currently, the individual similarity between the target to be searched and each candidate target is usually computed from their features, and the candidate targets are then ranked by this similarity to carry out the search. However, in surveillance video a target's appearance usually changes greatly with viewing angle, illumination, and occlusion. For example, the appearance of a pedestrian may differ greatly across video frames depending on posture, viewing angle, illumination, and whether the pedestrian is occluded. As a result, current target search methods achieve low accuracy.
Therefore, how to search for a target in video while improving search accuracy has become one of the problems to be solved.
Disclosure of Invention
The invention provides a target searching method, a target searching device, computer equipment and a computer readable storage medium, which are used for accurately searching targets in different video frames.
The invention provides a target searching method, which comprises the following steps:
acquiring a feature expression for each target in a target video frame, and constructing, based on the feature expressions, a source graph in which the target to be searched is a central node, the other targets are context nodes, and edges point from the context nodes to the central node;
acquiring a feature expression for each candidate target in a candidate video frame, determining the candidate target corresponding to each target based on the feature expressions of the candidate targets and the feature expressions of the targets, and constructing a target graph in which the candidate target corresponding to the target to be searched is a central node, the candidate targets corresponding to the other targets are context nodes, and edges point from the context nodes to the central node, wherein the candidate target corresponding to each target is the candidate target most similar to that target;
obtaining a graph embedding vector of the source graph and a graph embedding vector of the target graph based on a twin residual graph convolutional neural network;
determining a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
Optionally, the determining a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph includes:
calculating graph similarity of a target to be searched and a candidate target corresponding to the target to be searched based on the graph embedding vector of the source graph and the graph embedding vector of the target graph;
and determining the target to be searched in the candidate video frame based on the image similarity of the target to be searched and the candidate target corresponding to the target to be searched.
Optionally, the determining a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph includes:
calculating graph similarity of a target to be searched and a candidate target corresponding to the target to be searched based on the graph embedding vector of the source graph and the graph embedding vector of the target graph;
correcting the similarity between the target to be searched and the candidate target corresponding to the target to be searched based on the graph similarity between the target to be searched and the candidate target corresponding to the target to be searched so as to obtain the similarity between the target to be searched and the candidate target corresponding to the target to be searched after correction;
and determining the target to be searched in the candidate video frame based on the similarity between the corrected target to be searched and the candidate target corresponding to the target to be searched.
Optionally, the adjacency matrix of the source graph and the target graph is defined by the following formula:
A_ij = (formula presented as an image in the original document)
where A_ij is the element in the i-th row and j-th column of the adjacency matrix, q_1 is the feature vector of the central node in the source graph, g_1 is the feature vector of the central node in the target graph, q_j is the feature vector of node j in the source graph, and g_j is the feature vector of node j in the target graph.
Optionally, the twin residual graph convolutional neural network includes at least two graph convolutional layers, each graph convolutional layer defined by the following formula:
Z_{l+1} = σ(A Z_l W_l)
where σ(·) is a nonlinear activation operation, A is the adjacency matrix, Z_l is the input feature matrix of the l-th layer, and W_l contains the learnable parameters of the l-th layer.
Optionally, the obtaining of the feature expression of each target in the target video frame includes:
detecting each target in the target video frame based on the detection network;
acquiring feature expression of each target based on a neural network;
the acquiring the feature expression of each candidate target in the candidate video frame comprises the following steps:
detecting each candidate target in the candidate video frame based on the detection network;
and acquiring the feature expression of each candidate target based on the neural network.
The present invention also provides a target search apparatus, comprising:
the first processing unit is used for acquiring a feature expression for each target in the target video frame, and constructing, based on the feature expressions, a source graph in which the target to be searched is a central node, the other targets are context nodes, and edges point from the context nodes to the central node;
the second processing unit is used for acquiring a feature expression for each candidate target in a candidate video frame, determining the candidate target corresponding to each target based on the feature expressions of the candidate targets and the feature expressions of the targets, and constructing a target graph in which the candidate target corresponding to the target to be searched is a central node, the candidate targets corresponding to the other targets are context nodes, and edges point from the context nodes to the central node, wherein the candidate target corresponding to each target is the candidate target most similar to that target;
the acquisition unit is used for acquiring a graph embedding vector of the source graph and a graph embedding vector of the target graph based on the twin residual graph convolution neural network;
a determining unit, configured to determine a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
The invention also provides a computer device comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, enables the processor to perform the above object search method.
The present invention also provides a computer-readable storage medium in which instructions, when executed by a processor in a device, enable the device to perform the above-described object search method.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the method comprises the steps of firstly obtaining feature expressions of targets in a target video frame, and constructing a source graph which takes the target to be searched in the target video frame as a central node and other targets as context nodes and points to the central node by the context nodes based on the feature expressions of the targets. And then acquiring feature expressions of all candidate targets in the candidate video frame, determining the candidate targets corresponding to all targets based on the feature expressions of all the candidate targets and the feature expressions of all the targets, and constructing a target graph which takes the candidate target corresponding to the target to be searched as a central node, takes the candidate target corresponding to other targets as a context node and points to the central node from the context node, wherein the candidate target corresponding to all the targets is the candidate target most similar to each target. Then, a graph embedding vector of the source graph and a graph embedding vector of the target graph are respectively obtained based on the twin residual graph convolution neural network. Finally, a target to be searched in the candidate video frame is determined based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph. In the process of searching the target, a source graph reflecting the relation between the target to be searched and the contextual target thereof is constructed, and a target graph reflecting the relation between the candidate target corresponding to the target to be searched and the contextual target thereof is constructed. By constructing the source graph and the target graph, the context target in the video frame is taken into consideration, so that the target searching result has higher robustness to target change, and the accuracy of target searching is improved. In addition, the graph embedding vectors of the source graph and the target graph are obtained by adopting the twin residual graph convolution neural network, so that the information of the upper and lower targets in the source graph and the target graph can be effectively integrated, the attenuation of the characteristics can be effectively reduced by adopting the twin residual graph convolution neural network, and the accuracy of target searching is further improved. In addition, the target searching method of the technical scheme of the invention is easy to realize and has strong universality.
Further, after the graph similarity between the target to be searched and its corresponding candidate target is computed from the graph embedding vectors of the source graph and the target graph, this graph similarity can be used to correct the individual similarity between the two, and the target to be searched is then determined in the candidate video based on the corrected similarity. The search thus considers both the individual feature similarity between the target to be searched and its corresponding candidate target and their context-based graph similarity, which improves the accuracy of target search to a great extent.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a target searching method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating how graph similarity is obtained with the twin residual graph convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a process of searching for a target by the target searching method according to the embodiment of the present invention;
FIG. 4 is a diagram illustrating a comparison of search results of a target search method and other target search methods according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a comparison between the search results of the target search method according to the embodiment of the present invention and those of a baseline method;
FIG. 6 is a schematic diagram illustrating the ranking of the search results of the target search method and other target search methods according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As described in the background, it is currently common to search for the target to be searched in video frames based only on the individual feature similarities between the target to be searched and the candidate targets, and this yields low search accuracy. Therefore, an embodiment of the invention provides a target searching method. Referring to FIG. 1, a schematic flow chart of the target search method according to an embodiment of the present invention, the method includes:
s101: the method comprises the steps of obtaining feature expressions of targets in a target video frame, and constructing a source graph which takes the target to be searched as a central node, other targets as context nodes and points to the central node by the context nodes based on the feature expressions of the targets.
S102: the method comprises the steps of obtaining feature expressions of candidate targets in a candidate video frame, determining the candidate targets corresponding to the targets based on the feature expressions of the candidate targets and the feature expressions of the targets, and constructing a target graph which takes the candidate targets corresponding to the targets to be searched as central nodes, takes the candidate targets corresponding to other targets as context nodes and points to the central nodes through the context nodes, wherein the candidate targets corresponding to the targets are the candidate targets most similar to each target.
S103: and respectively obtaining a graph embedding vector of the source graph and a graph embedding vector of the target graph based on the twin residual graph convolutional neural network.
S104: determining a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
S101 is executed. In this embodiment, the target video frame may be a video frame containing the target to be searched. The target to be searched is given and may be, for example, a pedestrian or a vehicle; taking pedestrians as an example, the pedestrian to be searched may be a particular person in the target video frame. To obtain the feature expression of each target in the target video frame, each target may first be detected with a detection network, which may be a neural network such as R-CNN (Region-CNN) or Faster R-CNN. Feature expressions of the targets are then obtained with a neural network; in this embodiment, the feature expression of a target may be its feature vector. Specifically, a Convolutional Neural Network (CNN) such as Se-ResNet-50 may perform feature extraction on each target detected by Faster R-CNN to obtain its feature vector, which in this embodiment may be a 512-dimensional vector. After the feature vectors are obtained, each target in the target video frame is reduced to a node; the target to be searched (e.g., a particular pedestrian) is taken as the central node, the other targets as context nodes, the central node and the context nodes are connected, and the edges are directed from the context nodes to the central node to establish the source graph.
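As an illustration of this step, the following minimal Python/PyTorch sketch builds such a star-shaped graph from detected feature vectors; the function name and the assumption that features arrive as an (m, 512) tensor are illustrative, not taken from the patent:

import torch

def build_star_graph(features, center_idx):
    # features: (m, 512) tensor, one 512-dimensional feature vector per detected target.
    # Reorder rows so the target to be searched becomes node 0 (the central node);
    # all remaining targets become context nodes.
    order = [center_idx] + [i for i in range(features.size(0)) if i != center_idx]
    nodes = features[order]
    # Directed edges: every context node j points to the central node 0.
    edges = [(j, 0) for j in range(1, nodes.size(0))]
    return nodes, edges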
S102 is executed to acquire the feature expression of each candidate target in the candidate video frame. Similarly, each candidate target may be detected with a detection network, which may be a neural network such as R-CNN (Region-CNN) or Faster R-CNN, and feature expressions of the candidate targets are then obtained with a neural network; in this embodiment, the feature expression of a candidate target may be its feature vector. Specifically, a Convolutional Neural Network (CNN) such as Se-ResNet-50 may perform feature extraction on each candidate target detected by Faster R-CNN to obtain its feature vector. After the feature vectors of the candidate targets are determined, the candidate target corresponding to each target is determined; in this embodiment, the candidate target corresponding to a target is the candidate target most similar to it. With targets and candidate targets reduced to nodes, the correspondence can be determined from the distance, or the similarity, between nodes. In this embodiment the most similar candidate may be found by computing the cosine similarity between each target and each candidate target; in other embodiments the Manhattan distance or the Euclidean distance may be used instead. After the correspondences are determined, the candidate target corresponding to the target to be searched is taken as the central node, the candidate targets corresponding to the other targets as context nodes, the central node and the context nodes are connected, and the edges are directed from the context nodes to the central node to construct the target graph.
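The nearest-candidate matching described here can be sketched as follows, again under the same assumed shapes; cosine similarity is used, matching the embodiment above:

import torch
import torch.nn.functional as F

def match_candidates(target_feats, cand_feats):
    # target_feats: (m, 512); cand_feats: (n, 512).
    # Cosine similarity between every target and every candidate target.
    sim = F.normalize(target_feats, dim=1) @ F.normalize(cand_feats, dim=1).t()
    # For each target, the index of its most similar candidate target.
    best = sim.argmax(dim=1)
    return best, sim

The target graph is then built exactly like the source graph above, with cand_feats[best] supplying the node features.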
Executing S103: graph embedding vectors of the source graph and the target graph are obtained based on the twin residual graph convolutional neural network. After the source graph and the target graph are constructed through S101 and S102, they are input into the twin residual graph convolutional neural network, which includes at least two graph convolutional layers, each defined by the following formula:
Z_{l+1} = σ(A Z_l W_l)
where σ(·) is a nonlinear activation operation, A ∈ R^{m×m} is the adjacency matrix, Z_l ∈ R^{m×d_in} is the input feature matrix of the l-th layer, W_l ∈ R^{d_in×d_out} contains the learnable parameters of the l-th layer, m is the number of nodes in the source graph, d_in is the dimension of the input feature vectors, and d_out is the dimension of the output feature vectors.
In this embodiment, d_in = d_out = 512, and m varies with the source graph.
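Read literally, one such layer might be implemented as below (a sketch only; ReLU is assumed as the nonlinear activation σ, which the patent does not specify):

import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    # One layer computing Z_{l+1} = sigma(A Z_l W_l).
    def __init__(self, d_in=512, d_out=512):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)  # the learnable W_l

    def forward(self, A, Z):
        # A: (m, m) adjacency matrix; Z: (m, d_in) node feature matrix.
        return torch.relu(self.W(A @ Z))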
In this embodiment, the twin residual graph convolutional neural network may include three layers; see FIG. 2, a schematic diagram of obtaining graph similarity with the twin residual graph convolutional neural network in the embodiment of the present invention. The network in FIG. 2 comprises GCN-Layer-1, GCN-Layer-2, and GCN-Layer-3. The adjacency matrix A of the source graph and the target graph is defined by the following formula:
A_ij = (formula presented as an image in the original document)
where A_ij is the element in the i-th row and j-th column of the adjacency matrix A, q_1 is the feature vector of the central node in the source graph, g_1 is the feature vector of the central node in the target graph, q_j is the feature vector of node j in the source graph, and g_j is the feature vector of node j in the target graph.
When the graph embedding vector of the source graph is computed with the twin residual graph convolutional neural network, the input feature matrix Z_1 of layer 1 is the feature matrix of the source graph, i.e., the matrix formed by the feature vectors of the nodes in the source graph. Likewise, when the graph embedding vector of the target graph is computed, the input feature matrix Z_1 of layer 1 is the feature matrix of the target graph, i.e., the matrix formed by the feature vectors of the nodes in the target graph.
It should be noted that, in this embodiment, for convenience of description, a target in the target video frame, its feature vector, its graph embedding vector, and the central node formed from it are all denoted by the same symbol: for example, target q_1, the feature vector q_1 of target q_1, the graph embedding vector q_1 of target q_1, and the central node q_1 when target q_1 serves as the central node. Similarly, a candidate target in the candidate video frame, its feature vector, its graph embedding vector, and the central node formed from it are all denoted by the same symbol, such as candidate target g_1, the feature vector g_1 of candidate target g_1, the graph embedding vector g_1 of candidate target g_1, and the central node g_1 when candidate target g_1 serves as the central node.
With continued reference to FIG. 2, the central node of the source graph in FIG. 2 is q_1, i.e., q_1 is the target to be searched; the other targets q_2 to q_5 are context nodes, and each context node q_2 to q_5 is connected to the central node q_1 by an edge pointing from the context node to q_1. In the target graph, g_1 is the candidate target corresponding to the target to be searched q_1, i.e., the candidate target most similar to q_1; g_2 to g_5 are the candidate targets corresponding to q_2 to q_5, respectively, each being the candidate target most similar to its target. In the target graph, the central node is g_1, the other candidate targets g_2 to g_5 are context nodes, and each context node g_2 to g_5 is connected to the central node g_1 by an edge pointing from the context node to g_1.
In FIG. 2, the source graph is input to layer 1 of the GCN (GCN-Layer-1) and is also part of the input to GCN-Layer-3. After the source graph passes through GCN-Layer-1 and GCN-Layer-2, the output of GCN-Layer-2 together with the source graph serves as the input to GCN-Layer-3, which finally outputs the graph embedding vector of the source graph; the graph embedding vector q_1 of the target to be searched q_1 is determined from it.
Similarly, the target graph is input to GCN-Layer-1 and is also part of the input to GCN-Layer-3. After the target graph passes through GCN-Layer-1 and GCN-Layer-2, the output of GCN-Layer-2 together with the target graph serves as the input to GCN-Layer-3, which finally outputs the graph embedding vector of the target graph; the graph embedding vector g_1 of the candidate target g_1 corresponding to the target to be searched q_1 is determined from it.
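The three-layer flow just described might be sketched as follows, reusing GraphConvLayer from the earlier sketch. How the raw graph re-enters at GCN-Layer-3 is not spelled out in the text, so feature concatenation is assumed here (element-wise addition would be an equally plausible reading); the "twin" aspect means the same weights process both graphs:

import torch
import torch.nn as nn

class SRGCN(nn.Module):
    # Three-layer residual GCN sketch; layer names follow FIG. 2.
    def __init__(self, d=512):
        super().__init__()
        self.layer1 = GraphConvLayer(d, d)
        self.layer2 = GraphConvLayer(d, d)
        # GCN-Layer-3 consumes the GCN-Layer-2 output concatenated with the raw graph.
        self.layer3 = GraphConvLayer(2 * d, d)

    def forward(self, A, Z):
        h = self.layer2(A, self.layer1(A, Z))
        h = self.layer3(A, torch.cat([h, Z], dim=1))  # residual: raw Z re-enters here
        return h  # (m, d); row 0 is the central node's graph embedding vector

# Twin usage: the same network (same weights) embeds both graphs.
# net = SRGCN(); q_emb = net(A, Z_source); g_emb = net(A, Z_target)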
In this embodiment, the twin residual graph convolutional neural network is trained with the following cosine embedding loss function, where the embeddings are normalized so that the cosine similarity cos(q_1, g_1) equals the dot product q_1 · g_1:
L(q_1, g_1, y) = 1 - q_1 · g_1, if y = 1; L(q_1, g_1, y) = max(0, q_1 · g_1 - α), if y = -1
where q_1 is the normalized graph embedding vector of the target to be searched, g_1 is the normalized graph embedding vector of the candidate target, α is a threshold, and y = 1 indicates that the target q_1 and the candidate target g_1 are the same object. In this embodiment α may be 0.5.
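This matches the cosine embedding loss available in PyTorch, so training on a graph pair might be sketched as follows (the random tensors are stand-ins for the two central-node embeddings):

import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss(margin=0.5)  # margin plays the role of the threshold alpha

# q_emb, g_emb: graph embedding vectors of the two central nodes;
# y = +1 if they are the same object, -1 otherwise.
q_emb = torch.randn(1, 512, requires_grad=True)
g_emb = torch.randn(1, 512, requires_grad=True)
y = torch.tensor([1.0])
loss = loss_fn(q_emb, g_emb, y)
loss.backward()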
S104 is executed. After the graph embedding vectors of the source graph and the target graph are obtained through the twin residual graph convolutional neural network, the graph similarity between the target to be searched and its corresponding candidate target is first calculated from the graph embedding vector of the source graph and the graph embedding vector of the target graph. Specifically:
The graph embedding vector of the target to be searched is determined from the position of the central node of the source graph within the graph embedding vector of the source graph; likewise, the graph embedding vector of the corresponding candidate target is determined from the position of the central node of the target graph within the graph embedding vector of the target graph. The graph similarity between these two embedding vectors is then calculated: both vectors are normalized, and the cosine similarity between the normalized vectors gives the graph similarity. With continued reference to FIG. 2, after the graph embedding vector q_1 of the target to be searched q_1 and the graph embedding vector g_1 of its corresponding candidate target g_1 are obtained through the twin residual graph convolutional neural network, the two are normalized and their cosine similarity is calculated, giving the graph similarity S_g(q_1, g_1).
In this embodiment, in order to better improve the accuracy of target search, after the graph similarity between the target to be searched and the candidate target corresponding to the target to be searched is determined, the graph similarity is used to correct the similarity between the target to be searched and the candidate target corresponding to the target to be searched so as to obtain the similarity between the target to be searched and the candidate target corresponding to the target to be searched after correction. The similarity between the target to be searched and the candidate target corresponding to the target to be searched can be obtained by normalizing the feature vector of the target to be searched and the feature vector of the candidate target corresponding to the target to be searched, and then calculating the cosine similarity between the normalized feature vector of the target to be searched and the normalized feature vector of the candidate target corresponding to the target to be searched.
In this embodiment, the similarity between the target to be searched and the candidate target corresponding to the target to be searched may be corrected by the following formula:
S(q_1, g_1) = (1 - λ) S_0(q_1, g_1) + λ S_g(q_1, g_1), λ ∈ (0, 1)
where q_1 is the target to be searched, g_1 is its corresponding candidate target, S(q_1, g_1) is the corrected similarity between q_1 and g_1, S_0(q_1, g_1) is the individual similarity between q_1 and g_1, S_g(q_1, g_1) is the graph similarity between q_1 and g_1, and λ is a correction coefficient; in this embodiment λ may be 0.5.
After the corrected similarity between the target to be searched and its corresponding candidate target is obtained, whether the candidate target is the target to be searched can be judged by whether the similarity exceeds a preset threshold. In practical applications, the candidate targets corresponding to the target to be searched in multiple candidate video frames need to be judged: the corrected similarity is calculated for each candidate video frame according to the above formula, and the candidate video frames are then ranked from high to low by this similarity so as to search for the target to be searched across the frames.
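The correction and ranking across candidate frames might be sketched as follows (plain Python; the list names are illustrative):

def rank_candidate_frames(s0_list, sg_list, lam=0.5):
    # s0_list: individual-feature similarities S_0, one per candidate video frame;
    # sg_list: graph similarities S_g, aligned with s0_list.
    corrected = [(1 - lam) * s0 + lam * sg for s0, sg in zip(s0_list, sg_list)]
    # Frame indices ranked from most to least similar.
    ranking = sorted(range(len(corrected)), key=lambda i: corrected[i], reverse=True)
    return corrected, ranking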
In the embodiment, the similarity between the target to be searched and the candidate target corresponding to the target to be searched is corrected according to the graph similarity between the target to be searched and the candidate target corresponding to the target to be searched, so that on one hand, the individual feature similarity between the target to be searched and the candidate target corresponding to the target to be searched is considered, and on the other hand, the graph similarity between the target to be searched and the candidate target corresponding to the target to be searched based on the context target is also considered, and therefore the accuracy of target search is improved to a great extent.
In other embodiments, the target search may also be performed directly according to the graph similarity between the target to be searched and its corresponding candidate target; that is, whether the candidate target is the target to be searched is determined directly from the graph similarity, specifically by judging whether the graph similarity exceeds a preset threshold.
Thus, the search for the target in the video frame is realized through the above-mentioned S101 to S104.
FIG. 3 is a schematic diagram of the process of searching for a target with the target searching method according to the embodiment of the present invention. The technical solution is briefly described below with reference to FIG. 3, taking pedestrian search as an example. Target pedestrians in the target pedestrian video frame and candidate pedestrians in the candidate pedestrian video frame are detected by Faster R-CNN, yielding target pedestrians q_1, q_2, q_3 and candidate pedestrians g_1, g_2, g_3. A CNN such as Se-ResNet-50 then performs feature extraction (CNN Feature Extraction) on the target pedestrians q_1, q_2, q_3 and candidate pedestrians g_1, g_2, g_3, yielding feature vectors q_1, q_2, q_3 and g_1, g_2, g_3, respectively. The similarity between each target pedestrian and each candidate pedestrian is calculated to find the candidate pedestrian most similar to each target pedestrian; in FIG. 3, the candidate pedestrians most similar to q_1, q_2, q_3 are g_1, g_2, g_3, respectively. If the target pedestrian to be searched is q_1, a Source Graph is constructed with q_1 as the central node and q_2, q_3 as context nodes, and a Target Graph is constructed with g_1 as the central node and g_2, g_3 as context nodes. The graph pair composed of the source graph and the target graph (Construction of Graph Pair) is input into the twin residual graph convolutional network (SR-GCN) to obtain the graph similarity S_g(q_1, g_1) between the target pedestrian q_1 and its corresponding candidate pedestrian g_1. The graph similarity S_g(q_1, g_1) is used to correct the similarity S_0(q_1, g_1) between q_1 and g_1 (Graph Similarity Refinement), yielding the corrected similarity S(q_1, g_1), based on which the search is performed, i.e., it is determined whether the candidate pedestrian g_1 is the target pedestrian q_1.
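Tying the earlier sketches together, the end-to-end flow of FIG. 3 might read roughly as below; build_adjacency is a hypothetical helper standing in for the patent's adjacency formula (given only as an image above), and detection plus feature extraction are assumed to have already produced target_feats and cand_feats:

# Hypothetical glue code combining the sketches above.
best, sim = match_candidates(target_feats, cand_feats)       # most similar candidate per target
src_nodes, _ = build_star_graph(target_feats, center_idx=0)  # q_1 as the central node
tgt_nodes, _ = build_star_graph(cand_feats[best], center_idx=0)
A = build_adjacency(src_nodes, tgt_nodes)                    # per the patent's (image) formula
net = SRGCN()
q_emb, g_emb = net(A, src_nodes), net(A, tgt_nodes)          # twin: shared weights
s_g = graph_similarity(q_emb, g_emb)                         # S_g(q_1, g_1)
s_0 = sim[0, best[0]].item()                                 # S_0(q_1, g_1)
s = 0.5 * s_0 + 0.5 * s_g                                    # corrected similarity, lambda = 0.5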
In practical applications, the performance of the target search method of this embodiment can be evaluated by top-1 accuracy and mean Average Precision (mAP). The performance is evaluated below with pedestrians as the targets.
FIG. 4 is a schematic diagram comparing the search results of the target search method according to the embodiment of the present invention with those of other target search methods. In FIG. 4, the two databases PRW and CUHK-SYSU, used for the performance evaluation of the target search method, are provided by (L. Zheng, H. Zhang, S. Sun, M. Chandraker, Y. Yang, Q. Tian, Person re-identification in the wild, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1367-1376) and (T. Xiao, S. Li, B. Wang, L. Lin, X. Wang, Joint detection and identification feature learning for person search, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3415-3424), respectively.
In FIG. 4, Ours denotes the top-1 and mAP values obtained when searching for the target pedestrian with the target searching method of the present embodiment. As can be seen from FIG. 4, across the different datasets, the accuracy of the method of the embodiment of the invention in searching for the target pedestrian is significantly higher than that of the other target searching methods.
FIG. 5 is a schematic diagram comparing the search results of the target search method according to the embodiment of the present invention with those of a baseline method. In FIG. 5, Baseline denotes the top-1 and mAP values obtained when the target pedestrian is searched in candidate video frames directly according to the individual-feature similarity computed from the CNN features, and Baseline + SR-GCN denotes the top-1 and mAP values obtained with the target searching method of the embodiment of the invention. As can be seen from FIG. 5, by using the context pedestrians in the video frames as auxiliary information, the method of the embodiment improves the accuracy of target pedestrian search to a large extent.
FIG. 6 is a schematic diagram illustrating the ranking of the search results of the target search method according to the embodiment of the present invention and of other target search methods. In FIG. 6, for each target pedestrian (query), the first row shows the candidate pedestrians ranked by another target search method, and the second row shows the candidate pedestrians ranked by the target search method according to the embodiment of the present invention. Candidate pedestrians in green boxes are the target pedestrian to be searched; candidate pedestrians in blue boxes are incorrect search results. As can be seen from FIG. 6, by using context pedestrians as auxiliary information, the method of the embodiment ranks as many correct results as possible at the top positions even when the appearance of the target pedestrian changes greatly, improving search accuracy. The target search method of the embodiment of the invention thus has high robustness to target changes.
The present invention also provides an object search apparatus, the apparatus comprising:
the first processing unit is used for acquiring the feature expression of each target in the target video frame, and constructing a source graph which takes the target to be searched as a central node and other targets as context nodes and points to the central node by the context nodes based on the feature expression of each target.
And the second processing unit is used for acquiring the feature expression of each candidate target in the candidate video frame, determining the candidate target corresponding to each target based on the feature expression of each candidate target and the feature expression of each target, and constructing a target graph which takes the candidate target corresponding to the target to be searched as a central node, takes the candidate target corresponding to the other target as a context node, and points to the central node from the context node, wherein the candidate target corresponding to each target is a candidate target most similar to each target.
And the acquisition unit is used for acquiring the graph embedding vector of the source graph and the graph embedding vector of the target graph based on the twin residual graph convolution neural network.
A determining unit, configured to determine a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
The implementation of the target search apparatus of this embodiment may refer to the implementation of the target search method described above, and details are not described here.
Based on the same technical concept, embodiments of the present invention provide a computer device, which includes at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor is enabled to execute the above object search method.
Based on the same technical concept, embodiments of the present invention provide a computer-readable storage medium in which instructions, when executed by a processor within a device, enable the device to perform the above-described object search method.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method of searching for an object, comprising:
acquiring a feature expression for each target in a target video frame, and constructing, based on the feature expressions, a source graph in which the target to be searched is a central node, the other targets are context nodes, and edges point from the context nodes to the central node;
acquiring a feature expression for each candidate target in a candidate video frame, determining the candidate target corresponding to each target based on the feature expressions of the candidate targets and the feature expressions of the targets, and constructing a target graph in which the candidate target corresponding to the target to be searched is a central node, the candidate targets corresponding to the other targets are context nodes, and edges point from the context nodes to the central node, wherein the candidate target corresponding to each target is the candidate target most similar to that target;
obtaining a graph embedding vector of a source graph and a graph embedding vector of a target graph based on the twin residual graph convolution neural network;
determining a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
2. The method of claim 1, wherein the determining a target to search for in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph comprises:
calculating graph similarity of a target to be searched and a candidate target corresponding to the target to be searched based on the graph embedding vector of the source graph and the graph embedding vector of the target graph;
and determining the target to be searched in the candidate video frame based on the image similarity of the target to be searched and the candidate target corresponding to the target to be searched.
3. The method of claim 1, wherein the determining a target to search for in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph comprises:
calculating graph similarity of a target to be searched and a candidate target corresponding to the target to be searched based on the graph embedding vector of the source graph and the graph embedding vector of the target graph;
correcting the similarity between the target to be searched and the candidate target corresponding to the target to be searched based on the graph similarity between the target to be searched and the candidate target corresponding to the target to be searched so as to obtain the similarity between the target to be searched and the candidate target corresponding to the target to be searched after correction;
and determining the target to be searched in the candidate video frame based on the similarity between the corrected target to be searched and the candidate target corresponding to the target to be searched.
4. The method of claim 1, wherein the adjacency matrix of the source graph and the target graph is defined by the following equation:
A_ij = (formula presented as an image in the original document)
wherein A_ij is the element in the i-th row and j-th column of the adjacency matrix, q_1 is the feature vector of the central node in the source graph, g_1 is the feature vector of the central node in the target graph, q_j is the feature vector of node j in the source graph, and g_j is the feature vector of node j in the target graph.
5. The method of claim 1, wherein the twin residual graph convolutional neural network comprises at least two graph convolutional layers, each graph convolutional layer defined by the formula:
Z_{l+1} = σ(A Z_l W_l)
where σ(·) is a nonlinear activation operation, A is the adjacency matrix, Z_l is the input feature matrix of the l-th layer, and W_l contains the learnable parameters of the l-th layer.
6. The method of claim 1, wherein obtaining the feature representation for each object in the target video frame comprises:
detecting each target in the target video frame based on the detection network;
acquiring feature expression of each target based on a neural network;
the acquiring the feature expression of each candidate target in the candidate video frame comprises the following steps:
detecting each candidate target in the candidate video frame based on the detection network;
and acquiring the feature expression of each candidate target based on the neural network.
7. An object search apparatus, comprising:
the first processing unit is used for acquiring a feature expression for each target in the target video frame, and constructing, based on the feature expressions, a source graph in which the target to be searched is a central node, the other targets are context nodes, and edges point from the context nodes to the central node;
the second processing unit is used for acquiring a feature expression for each candidate target in a candidate video frame, determining the candidate target corresponding to each target based on the feature expressions of the candidate targets and the feature expressions of the targets, and constructing a target graph in which the candidate target corresponding to the target to be searched is a central node, the candidate targets corresponding to the other targets are context nodes, and edges point from the context nodes to the central node, wherein the candidate target corresponding to each target is the candidate target most similar to that target;
the acquisition unit is used for acquiring a graph embedding vector of the source graph and a graph embedding vector of the target graph based on the twin residual graph convolution neural network;
a determining unit, configured to determine a target to be searched in the candidate video frame based on at least the graph embedding vector of the source graph and the graph embedding vector of the target graph.
8. A computer device comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, enables the processor to perform the object search method of any of claims 1-6.
9. A computer readable storage medium having instructions which, when executed by a processor within a device, enable the device to perform the object search method of any of claims 1 to 6.
CN202110767455.6A 2021-07-07 2021-07-07 Target searching method and device Active CN113642392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110767455.6A CN113642392B (en) 2021-07-07 2021-07-07 Target searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110767455.6A CN113642392B (en) 2021-07-07 2021-07-07 Target searching method and device

Publications (2)

Publication Number Publication Date
CN113642392A 2021-11-12
CN113642392B (en) 2023-11-28

Family

ID=78416809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110767455.6A Active CN113642392B (en) 2021-07-07 2021-07-07 Target searching method and device

Country Status (1)

Country Link
CN (1) CN113642392B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024461A1 (en) * 2015-07-23 2017-01-26 International Business Machines Corporation Context sensitive query expansion
CN109408652A (en) * 2018-09-30 2019-03-01 北京搜狗科技发展有限公司 A kind of image searching method, device and equipment
CN110472065A (en) * 2019-07-25 2019-11-19 电子科技大学 Across linguistry map entity alignment schemes based on the twin network of GCN
CN112528624A (en) * 2019-09-03 2021-03-19 阿里巴巴集团控股有限公司 Text processing method and device, search method and processor
CN111260688A (en) * 2020-01-13 2020-06-09 深圳大学 Twin double-path target tracking method
CN112685573A (en) * 2021-01-06 2021-04-20 中山大学 Knowledge graph embedding training method and related device
CN112990295A (en) * 2021-03-10 2021-06-18 中国互联网络信息中心 Semi-supervised graph representation learning method and device based on migration learning and deep learning fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LING CHUNYANG et al.: "Source code retrieval method for software projects based on graph embedding", Journal of Software (《软件学报》), pages 1481-1497 *

Also Published As

Publication number Publication date
CN113642392B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
Li et al. Dual-resolution correspondence networks
CN108960211B (en) Multi-target human body posture detection method and system
CN107577990B (en) Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval
Kejriwal et al. High performance loop closure detection using bag of word pairs
CN110728263A (en) Pedestrian re-identification method based on strong discrimination feature learning of distance selection
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
CN110991321B (en) Video pedestrian re-identification method based on tag correction and weighting feature fusion
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN109472191A (en) A kind of pedestrian based on space-time context identifies again and method for tracing
CN111125397A (en) Cloth image retrieval method based on convolutional neural network
CN112101156A (en) Target identification method and device and electronic equipment
CN113128518B (en) Sift mismatch detection method based on twin convolution network and feature mixing
CN110992404A (en) Target tracking method, device and system and storage medium
CN116266387A (en) YOLOV4 image recognition algorithm and system based on re-parameterized residual error structure and coordinate attention mechanism
Darmon et al. Learning to guide local feature matches
CN111241326B (en) Image visual relationship indication positioning method based on attention pyramid graph network
CN111582057B (en) Face verification method based on local receptive field
CN113642392B (en) Target searching method and device
Huang et al. Improving keypoint matching using a landmark-based image representation
CN111126436A (en) Visual matching method and device
WO2022252519A1 (en) Image processing method and apparatus, terminal, medium, and program
CN115527050A (en) Image feature matching method, computer device and readable storage medium
CN113706580B (en) Target tracking method, system, equipment and medium based on relevant filtering tracker
CN107341151B (en) Image retrieval database generation method, and method and device for enhancing reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant