CN116188804A - Twin network target search system based on transformer - Google Patents

Twin network target search system based on transformer Download PDF

Info

Publication number
CN116188804A
CN116188804A CN202310449364.7A CN202310449364A CN116188804A CN 116188804 A CN116188804 A CN 116188804A CN 202310449364 A CN202310449364 A CN 202310449364A CN 116188804 A CN116188804 A CN 116188804A
Authority
CN
China
Prior art keywords
target
picture
graph
query
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310449364.7A
Other languages
Chinese (zh)
Other versions
CN116188804B (en
Inventor
郑艳伟
何国海
于东晓
李峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202310449364.7A priority Critical patent/CN116188804B/en
Publication of CN116188804A publication Critical patent/CN116188804A/en
Application granted granted Critical
Publication of CN116188804B publication Critical patent/CN116188804B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the field of image retrieval and target detection in computer vision, and discloses a twin network target search system based on a transformer.

Description

Twin network target search system based on transformer
Technical Field
The invention belongs to the field of image retrieval and target detection in computer vision, and discloses a twin network target searching system based on a transformer.
Background
The computer vision means that a camera and a computer are used for replacing human eyes to perform machine vision such as recognition, tracking and measurement on targets, and further graphic processing is performed, so that the computer is processed into images which are more suitable for human eyes to observe or transmit to an instrument to detect. As a scientific discipline, computer vision research-related theory and technology has attempted to build artificial intelligence systems that can obtain 'information' from images or multidimensional data. Perception can be seen as the extraction of information from sensory signals, so computer vision can also be seen as science of how to "perceive" an artificial system from images or multidimensional data. Image processing techniques convert an input image into another image having desired characteristics. Image processing technology is often utilized in computer vision research to perform preprocessing and feature extraction, so that the computer has the capabilities of vision, hearing, speaking and the like.
Object detection and recognition are widely used in many fields of life, and is to distinguish objects in images or videos from uninteresting parts, determine whether the objects exist, determine the positions of the objects if the objects exist, and recognize the objects as a computer vision task. Target detection and recognition are very important research directions in the field of computer vision, along with the rapid development of the Internet, artificial intelligence technology and intelligent hardware, a large amount of image and video data exist in human life, so that the computer vision technology plays a larger and larger role in human life, and research on computer vision is also getting more and more hot. Object detection and recognition are also becoming increasingly important as a cornerstone in the field of computer vision. Because the current demand for the target retrieval system is larger and larger, and the technical development of the target retrieval system is slower, a mature target retrieval system is urgently needed, and the actual target retrieval problem is accurately solved, so the system is developed.
Disclosure of Invention
In order to solve the technical problems, the invention provides a twin network target retrieval system based on a transducer, which utilizes a camera to monitor targets and combines a method of image retrieval and target detection in computer vision to realize the retrieval and display of targets in a monitoring area of the camera.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a transformer-based twin network target search system, comprising the steps of:
(1) Collecting image data as a graph to be searched; extracting an interested target from part of the graphs to be searched, taking the interested target as a query graph, and designing and twinning a network target searching training model;
(2) Selecting a camera area, selecting a camera group to determine a search area, and inputting a target picture to be searched;
(3) Starting a search task, acquiring scene pictures from the inside of a camera at equal time intervals in a video frame taking mode, detecting the pictures through a model, detecting each target, comparing the target with the picture to be searched as a characteristic, calculating the matching degree of the targets, taking the maximum value, and if the matching degree exceeds a set threshold value, searching the picture sequence number
Figure SMS_1
Adding into a result queue;
(4) If the result queue has new records, the current detection picture is stored in a static resource directory set by a background server, information is stored in a database, and the front-end interface screens and displays search result information of a corresponding target from the database according to requirements.
Further, the specific method of the step (1) is as follows:
(1.1) acquisition of n Zhang Sousuo plot
Figure SMS_3
The default size of each search graph is 224 x 224, and the number of targets in the n search graphs is +.>
Figure SMS_7
Cut out the inquiry graph to be->
Figure SMS_11
Scaling each query graph from its original size to 56 x 56, denoted +.>
Figure SMS_2
Then for each sheetManually classifying the query graphs, classifying the query graphs of the same target into one type, and marking the query graphs as +.>
Figure SMS_6
Placing each query graph into the corresponding class of folders corresponding to the count folders, and then establishing a dictionary +.>
Figure SMS_9
,/>
Figure SMS_12
The value corresponds to each search graph and is marked as +.>
Figure SMS_4
,/>
Figure SMS_5
Each of->
Figure SMS_8
Value corresponding +.>
Figure SMS_10
Class names of all targets exist in the current search graph;
(1.2) designing a twin network target search model, wherein a model feature extraction backbone is divided into a vit1 and a vit2, and the vit1 is used for extracting features of a search graph; then 16 query graphs are selected, and the selection rules are as follows: querying index of current search graph
Figure SMS_13
Randomly selecting 4 query graphs from the category folders which are not currently indexed, and selecting 12 query graphs from all the category files which are indexed, wherein 3 query graphs are randomly selected from each category folder, and if 12 query graphs can be selected, splicing the 16 query graphs of 56 x 56 into one 224 x 224 graph in random sequence; the vit2 is used for extracting the characteristics of a series of corresponding query graphs in the query splice graph, and the vit1 and the vit2 share the weight;
(1.3) the features extracted by vit1 are obtained by the DETR target detection head
Figure SMS_14
The DETR target detection head is used for predicting the position of each target in the search graph, and the extracted characteristics of the vit1 and the vit2 jointly obtain a +.>
Figure SMS_15
Figure SMS_16
And->
Figure SMS_17
The combination is performed by a proportional relationship.
Further, in step (1.2): if 12 query graphs cannot be selected, randomly selecting one query graph from the selected query graphs each time by using a data enhancement mode, generating a new query graph with the size of 56 x 56 by a turnover or rotation mode, repeating the data enhancement operation until the total number of the query graphs reaches 16, ending the data enhancement operation, and then splicing the 16 query graphs with the size of 56 x 56 into a 224 x 224 graph, wherein the newly spliced graph is named as a query splice graph and is recorded as
Figure SMS_18
Further, in the step (1.2),
(1.2.1) adding a DETR target detection head, which can detect and frame each target from each diagram to be searched, and obtain the coordinates of each target;
(1.2.2) data are divided into n groups, each group being
Figure SMS_20
,/>
Figure SMS_26
,/>
Figure SMS_29
Wherein->
Figure SMS_19
Is the firstuZhang Sousuo figure->
Figure SMS_23
Is the firstvZhang Chaxun splice; />
Figure SMS_27
Extraction of features by vit1>
Figure SMS_31
Then, by the DETR target detection head, the +.>
Figure SMS_22
Feature vector of individual object->
Figure SMS_24
Scaling the feature vectors of the m targets to feature dimensions of 56 x 384, the corresponding feature vector being +.>
Figure SMS_28
,/>
Figure SMS_32
Extracting features by vit2>
Figure SMS_21
Due to->
Figure SMS_25
Is formed by splicing 16 query graphs with the size of 56 x 56, namely, the corresponding 16 feature vectors can be obtained by extracting features according to fixed coordinate positions>
Figure SMS_30
(1.2.3) feature vectors generated for search graphs
Figure SMS_33
Characteristic vector generated by inquiring splice diagram>
Figure SMS_34
When two are compared, and the two belong to the same category,defining as positive samples, defining as negative samples when the two samples do not belong to the same category, and defining a loss function by adopting a cosine distance formula:
Figure SMS_35
(equation 1);
Figure SMS_36
(equation 2); />
Figure SMS_37
(equation 3);
Figure SMS_38
,/>
Figure SMS_39
,/>
Figure SMS_40
,/>
Figure SMS_41
,/>
Figure SMS_42
to search for a graphuIs the first of (2)δFeature vectors of the target; />
Figure SMS_43
Splice graph for inquiryvIs the first of (2)ηA feature vector;
when the input network is positive, the loss is calculated by using formula 1, and two eigenvectors are required to be made
Figure SMS_44
The smaller the distance between them, resulting in +.>
Figure SMS_45
The smaller, when the input network is a negative sample, the loss is calculated using equation 2, requiring two eigenvectors +.>
Figure SMS_46
The greater the distance between them, resulting in +.>
Figure SMS_47
The smaller the final +.>
Figure SMS_48
The smaller.
Further, the specific method of step (1.3) is as follows:
(1.3.1) let the single target in vit1 pass through the DETR target detection head output loss be
Figure SMS_49
The detection head is arranged to obtainkThe probability of each detection frame is +.>
Figure SMS_50
The result box number is:
Figure SMS_51
(equation 4);
(1.3.2) selecting num number detection frame region to record as A, setting preset anchor region as B, wherein A and B ensure intersection region, B does not completely contain A, and supposing
Figure SMS_52
Or->
Figure SMS_53
Region (2)>
Figure SMS_54
Representative area->
Figure SMS_55
Is a part of the area of (2);
Figure SMS_56
(equation 5);
(1.3.3) features of vit1 and vit2 together obtain one
Figure SMS_58
Define benchmark->
Figure SMS_63
The eigenvector of vit1 is defined as +.>
Figure SMS_66
The eigenvector of vit2 is defined as +.>
Figure SMS_59
,/>
Figure SMS_60
,/>
Figure SMS_62
And->
Figure SMS_65
Is a parameter which can be learned, and when all search graphs and corresponding query graphs are input in groups, the method is characterized in that +.>
Figure SMS_57
To get as close to 0 as possible, otherwise, let +.>
Figure SMS_61
As close to 1 as possible, +.>
Figure SMS_64
(1.3.4) determining the final loss:
Figure SMS_67
wherein->
Figure SMS_68
Further, the specific method of the step (2) is as follows:
(2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task;
(2.2) when the program is startedCamera groups requiring selection of corresponding areas at the front end, assuming selectionqCorresponding to each camera
Figure SMS_69
And detects whether the target picture has been added +.>
Figure SMS_70
And (5) successfully starting the system when the condition is met.
Further, the specific method of the step (3) is as follows:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) running a video frame taking module forqEach camera takes out a picture to be detected, and the picture name is that of the cameras
Figure SMS_72
Each picture generates a feature vector of +.>
Figure SMS_79
Inside each picture +.>
Figure SMS_83
Target (/ ->
Figure SMS_74
) The corresponding feature vectors are respectively
Figure SMS_77
Scaling to feature vector +.>
Figure SMS_80
The same dimension is +.>
Figure SMS_82
Then, feature vector of target picture +.>
Figure SMS_73
Performing feature comparison to calculate matching degree, and generating a feature matching degree hash table ++>
Figure SMS_75
,/>
Figure SMS_78
Recording picture number->
Figure SMS_81
Maximum matching degree of ∈10->
Figure SMS_71
Recording picture number->
Figure SMS_76
Position coordinates of the highest matching degree target area, wherein each value is:
Figure SMS_84
(formula 6); ->
Figure SMS_85
(equation 7);
wherein the method comprises the steps of
Figure SMS_86
Represents the abscissa of the center of the region, +.>
Figure SMS_87
Represents the ordinate of the center of the region,/>
Figure SMS_88
Representing the height of the area>
Figure SMS_89
Representing the area width;
(3.3) setting the threshold value as y, and selecting the picture sequence number with the value exceeding y from maps
Figure SMS_90
Picture order->
Figure SMS_91
Add to the Result pair column Result.
Further, the specific method of step (3.3) is as follows:
(3.3.1) traversing the current Map, for each
Figure SMS_92
If->
Figure SMS_93
Indicating that the current picture is an effective scene picture containing the target picture characteristics, and recording the sequence number of the picture +.>
Figure SMS_94
(3.3.2) performing an addition operation to the selected picture order, and adding a newly generated picture order at the end of the Result pair column Result
Figure SMS_95
,/>
Figure SMS_96
And finally, returning a Result pair column Result.
Further, the specific method of the step (4) is as follows:
(4.1) detecting the result queue, if a new record is generated, obtaining the current picture sequence number
Figure SMS_98
The current picture number +.>
Figure SMS_101
Storing in preset server static folder, and generating picture sequence number +.>
Figure SMS_103
Is +.>
Figure SMS_97
The camera is +.>
Figure SMS_100
Target matching degree->
Figure SMS_102
Coordinates of the object->
Figure SMS_105
Target picture name->
Figure SMS_99
And access address->
Figure SMS_104
Writing into a database;
(4.2) the front end screens and displays the current target picture in real time by setting the searching condition
Figure SMS_106
Corresponding search result information.
Compared with the prior art, the invention has the following beneficial effects:
(1) According to the invention, the twin network is adopted, and the query is introduced for training, so that the trained model is more accurate and has pertinence.
(2) According to the invention, the vision transformer model and the DETR target detection head are introduced to train the target retrieval model, train the current target retrieval model end to end, complete the tasks of detection and retrieval, and better promote the accuracy of the model.
(3) The invention screens and displays the condition of the current target picture in the monitoring area in real time through the front end and updates the current target picture in real time.
Drawings
Fig. 1 is an overall schematic diagram of a twin network target retrieval system based on a transducer according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a twin network target retrieval system based on a transformer, which can utilize a camera to monitor targets and combine the methods of image retrieval and target detection in computer vision to realize the retrieval and display of the targets in a monitoring area of the camera as shown in figure 1.
The DETR, collectively DEtection TRansforme, is a Transformer-based end-to-end target detection network proposed by Facebook. the transducer is an attention model and is used in the field of natural language processing; vision transformer is an attention model in the computer vision field, which is a migration application of a transducer model, and is improved to be applied to the computer vision field, so that the image processing is completed.
Specific examples are as follows:
a transformer-based twin network target retrieval system, comprising the steps of:
(1) Data acquisition and model design stage:
(1.1) acquisition of n Zhang Sousuo plot
Figure SMS_108
The default size of each search graph is 224 x 224, and the number of targets in the n search graphs is +.>
Figure SMS_112
Figure SMS_116
Cutting out a query graph to be->
Figure SMS_109
Scaling each query graph from its original size to 56 x 56, denoted +.>
Figure SMS_111
Then manually classifying each query graph, classifying the query graph of the same target into one type, and marking the query graph as +.>
Figure SMS_114
Placing each query graph into the corresponding class of folders corresponding to the count folders, and then establishing a dictionary +.>
Figure SMS_117
,/>
Figure SMS_110
The value corresponds to each search graph and is marked as +.>
Figure SMS_113
,/>
Figure SMS_115
Each of->
Figure SMS_118
Value corresponding +.>
Figure SMS_107
Class names (same file folder names) for all targets exist in the current search graph.
(1.2) designing a twin network target search model, wherein a model feature extraction backbone is divided into a vit1 and a vit2 (both vit1 and vit2 are based on vision transformer), and the vit1 is used for extracting features of a search graph; then 16 query graphs are selected, and the selection rules are as follows: querying index of current search graph
Figure SMS_119
Randomly selecting 4 query graphs from the category folders which are not currently indexed, and selecting 12 query graphs from all the category files which are indexed, wherein 3 query graphs are randomly selected from each category folder, and if 12 query graphs can be selected, splicing the 16 query graphs of 56 x 56 into one 224 x 224 graph in random sequence; if 12 query graphs cannot be selected, randomly selecting one query graph from the selected query graphs each time by using a data enhancement mode, generating a new query graph with the size of 56 x 56 by a turnover or rotation mode, repeating the data enhancement operation until the total number of the query graphs reaches 16, ending the data enhancement operation, and then splicing the 16 query graphs with the size of 56 x 56 into 224 x 224 pictures, wherein the newly spliced picture is named as a query splicing picture and is named as a query splicing picture
Figure SMS_120
The vit2 is used for extracting the characteristics of a series of corresponding query graphs in the query splice graph, and the vit1 and the vit2 perform weight sharing to further improve the accuracy of the network, and the specific operation is as follows:
(1.2.1) adding a DETR target detection head, which is a functional module for detecting the position of a target in a picture, which can predict the position of each target in a current picture, by which each target can be detected and framed from each picture to be searched, and the coordinates of each target can be obtained.
(1.2.2) data are divided into n groups, each group being
Figure SMS_121
,/>
Figure SMS_126
,/>
Figure SMS_130
Wherein->
Figure SMS_122
Is the firstuZhang Sousuo figure->
Figure SMS_125
Is the firstvZhang Chaxun splice; />
Figure SMS_129
Extraction of features by vit1>
Figure SMS_134
Then, by the DETR target detection head, the +.>
Figure SMS_124
Feature vector of individual object->
Figure SMS_127
Scaling the feature vectors of m targets to the feature dimension of 56 x 384 through an ROI Pooling operation, wherein the corresponding feature vector is +.>
Figure SMS_131
,/>
Figure SMS_133
Extracting features by vit2>
Figure SMS_123
Due to->
Figure SMS_128
Is formed by splicing 16 query graphs with the size of 56 x 56, namely, the corresponding 16 feature vectors can be obtained by extracting features according to fixed coordinate positions>
Figure SMS_132
(1.2.3) feature vectors generated for search graphs
Figure SMS_135
Characteristic vector generated by inquiring splice diagram>
Figure SMS_136
And comparing every two samples, defining positive samples when the two samples belong to the same category, defining negative samples when the two samples do not belong to the same category, and defining a loss function by adopting a cosine distance formula:
Figure SMS_137
(equation 1); />
Figure SMS_138
(equation 2);
Figure SMS_139
(equation 3);
Figure SMS_142
,/>
Figure SMS_144
,/>
Figure SMS_147
to search for a graphuIs the first of (2)δFeature vectors of the target; />
Figure SMS_140
Splice graph for inquiryvIs the first of (2)ηA feature vector; when the input network is positive, the loss is calculated using equation 1, requiring two eigenvectors +.>
Figure SMS_143
The smaller the distance between them, resulting in +.>
Figure SMS_145
The smaller, when the input network is a negative sample, the loss is calculated using equation 2, requiring two eigenvectors +.>
Figure SMS_148
The greater the distance between them, resulting in +.>
Figure SMS_141
The smaller the final +.>
Figure SMS_146
The smaller.
(1.3) the features extracted by vit1 are passed through a DETR target detection head (the function of the DETR target detection head is to predict the position of each target in the search map), obtaining one
Figure SMS_149
The extracted features of vit1 and vit2 jointly obtain one
Figure SMS_150
,/>
Figure SMS_151
And->
Figure SMS_152
The combination is carried out through the proportion relation, and the specific operation is as follows:
(1.3.1) Let the output loss of a single target in vit1 through the DETR target detection head be
Figure SMS_153
The detection head is arranged to obtainkThe probability of each detection frame is +.>
Figure SMS_154
The result box number is:
Figure SMS_155
(equation 4);
(1.3.2) selecting num number detection frame region to record as A, setting preset anchor region as B, wherein A and B ensure intersection region, B does not completely contain A, and supposing
Figure SMS_156
Or->
Figure SMS_157
Region (2)>
Figure SMS_158
Representative area->
Figure SMS_159
Is a part of the area of (2);
Figure SMS_160
(equation 5);
(1.3.3) features of vit1 and vit2 together obtain one
Figure SMS_163
Define benchmark->
Figure SMS_167
The eigenvector of vit1 is defined as +.>
Figure SMS_170
The eigenvector of vit2 is defined as +.>
Figure SMS_162
,/>
Figure SMS_164
,/>
Figure SMS_166
And->
Figure SMS_169
Is a parameter which can be learned, and when all search graphs and corresponding query graphs are input in groups, the method is characterized in that +.>
Figure SMS_161
To get as close to 0 as possible, otherwise, let +.>
Figure SMS_165
As close to 1 as possible, +.>
Figure SMS_168
(1.3.4) determining the final loss:
Figure SMS_171
wherein->
Figure SMS_172
Can be adjusted according to the requirement; at present we use +.>
Figure SMS_173
Is a value of (a).
(2) Switch setting and zone setting phase:
and (2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task.
(2.2) when the program is started, the camera group of the corresponding region needs to be selected at the front end, assuming that the selection is performedqCorresponding to each camera
Figure SMS_174
And detects whether the target picture has been added +.>
Figure SMS_175
And (5) successfully starting the system when the condition is met.
(3) Model detection processing stage:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) running a video frame taking module forqEach camera takes out a picture to be detected, and the picture name is that of the cameras
Figure SMS_179
Each picture generates a feature vector of +.>
Figure SMS_181
Inside each picture +.>
Figure SMS_185
Target (/ ->
Figure SMS_177
) The corresponding feature vectors are +.>
Figure SMS_180
Through an ROI Pooling operation (scaling the feature vector), scaling to the feature vector +_with the target picture>
Figure SMS_184
The same dimension is +.>
Figure SMS_187
Then, feature vector of target picture +.>
Figure SMS_176
Performing feature comparison to calculate matching degree, and generating a feature matching degree hash table ++>
Figure SMS_182
,/>
Figure SMS_186
Recording picture number->
Figure SMS_188
Maximum matching degree of ∈10->
Figure SMS_178
Recording picture number->
Figure SMS_183
Position coordinates of the highest matching degree target area, wherein each value is:
Figure SMS_189
(equation 6);
Figure SMS_190
(equation 7);
wherein the method comprises the steps of
Figure SMS_191
Represents the abscissa of the center of the region, +.>
Figure SMS_192
Represents the ordinate of the center of the region,/>
Figure SMS_193
Representing the height of the area>
Figure SMS_194
Representing the area width.
(3.3) setting the threshold toySelecting a value from Map to exceedyPicture sequence number of (2)
Figure SMS_195
Picture order->
Figure SMS_196
Adding to a Result pair column Result, wherein the specific operation is as follows:
(3.3.1) traversing the current Map, for each
Figure SMS_197
If->
Figure SMS_198
Indicating that the current picture is an effective scene picture containing target picture characteristics, recording picture sequence number +.>
Figure SMS_199
(3.3.2) performing an addition operation to the selected picture order, and adding a newly generated picture order at the end of the Result pair column Result
Figure SMS_200
,/>
Figure SMS_201
And finally, returning a Result pair column Result.
(4) Storage and display stage:
(4.1) detecting the result queue, if a new record is generated, obtaining the current picture sequence number
Figure SMS_204
Picture +.>
Figure SMS_206
Storing in preset server static folder, and generating picture sequence number +.>
Figure SMS_208
Is +.>
Figure SMS_203
The camera is +.>
Figure SMS_207
Degree of matching of targets
Figure SMS_209
Coordinates of the object->
Figure SMS_210
Target picture name->
Figure SMS_202
And access address->
Figure SMS_205
Writing into a database.
(4.2) the front end screens and displays the current target picture in real time by setting the searching condition
Figure SMS_211
Corresponding search result information.
According to the method, under the scene of deploying the camera, the searching and displaying of the targets in the monitoring area of the camera can be realized, a twin network is introduced, a mode of vision transformer model and DETR target detecting head is introduced, and the methods of result queue, area loss calculation, matching degree calculation, camera area selection and the like are compatible with the use of various types of visible light cameras, and the method has high robustness, screens and displays the conditions of the current target picture in the monitoring area at the front end, and updates the current target picture in real time.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A transformer-based twin network target search system, comprising the steps of:
(1) Collecting image data as a graph to be searched; extracting an interested target from part of the graphs to be searched, taking the interested target as a query graph, and designing and twinning a network target searching training model;
(2) Selecting a camera area, selecting a camera group to determine a search area, and inputting a target picture to be searched;
(3) Search task initiation, framing by videoScene pictures are obtained from the camera at equal time intervals in a mode, the pictures are detected through a model, each target is detected, the targets are compared with the to-be-searched picture in characteristics, the matching degree of the targets is calculated, the maximum value is taken, and if the matching degree exceeds a set threshold value, the picture number is searched
Figure QLYQS_1
Adding into a result queue;
(4) If the result queue has new records, the current detection picture is stored in a static resource directory set by a background server, information is stored in a database, and the front-end interface screens and displays search result information of a corresponding target from the database according to requirements.
2. The transformer-based twin network target search system of claim 1, wherein the specific method of step (1) is as follows:
(1.1) acquisition of n Zhang Sousuo plot
Figure QLYQS_3
The default size of each search graph is 224 x 224, and the number of targets in the n search graphs is +.>
Figure QLYQS_6
Cut out the inquiry graph to be->
Figure QLYQS_9
Scaling each query graph from its original size to 56 x 56, denoted +.>
Figure QLYQS_4
Then manually classifying each query graph, classifying the query graph of the same target into one type, and marking the query graph as +.>
Figure QLYQS_7
Placing each query graph into the corresponding class of folders corresponding to the count folders, and then establishing a dictionary +.>
Figure QLYQS_8
,/>
Figure QLYQS_11
The value corresponds to each search graph and is marked as +.>
Figure QLYQS_2
,/>
Figure QLYQS_5
Each of->
Figure QLYQS_10
Value corresponding +.>
Figure QLYQS_12
Class names of all targets exist in the current search graph;
(1.2) designing a twin network target search model, wherein a model feature extraction backbone is divided into a vit1 and a vit2, and the vit1 is used for extracting features of a search graph; then 16 query graphs are selected, and the selection rules are as follows: querying index of current search graph
Figure QLYQS_13
Randomly selecting 4 query graphs from the category folders which are not currently indexed, and selecting 12 query graphs from all the category files which are indexed, wherein 3 query graphs are randomly selected from each category folder, and if 12 query graphs can be selected, splicing the 16 query graphs of 56 x 56 into one 224 x 224 graph in random sequence; the vit2 is used for extracting the characteristics of a series of corresponding query graphs in the query splice graph, and the vit1 and the vit2 share the weight;
(1.3) the features extracted by vit1 are obtained by the DETR target detection head
Figure QLYQS_14
The DETR target detection head is used for predicting the position of each target in the search graph, and the characteristics extracted by the vit1 and the vit2 are obtained togetherGet a +.>
Figure QLYQS_15
Figure QLYQS_16
And->
Figure QLYQS_17
The combination is performed by a proportional relationship.
3. The transformer-based twin network target search system of claim 2, wherein in step (1.2): if 12 query graphs cannot be selected, randomly selecting one query graph from the selected query graphs each time by using a data enhancement mode, generating a new query graph with the size of 56 x 56 by a turnover or rotation mode, repeating the data enhancement operation until the total number of the query graphs reaches 16, ending the data enhancement operation, and then splicing the 16 query graphs with the size of 56 x 56 into a 224 x 224 graph, wherein the newly spliced graph is named as a query splice graph and is recorded as
Figure QLYQS_18
4. The method of claim 2, wherein in step (1.2),
(1.2.1) adding a DETR target detection head, which can detect and frame each target from each diagram to be searched, and obtain the coordinates of each target;
(1.2.2) data are divided into n groups, each group being
Figure QLYQS_20
Wherein->
Figure QLYQS_25
Is the firstuZhang Sousuo figure->
Figure QLYQS_28
Is the firstvZhang Chaxun splice; />
Figure QLYQS_22
Extraction of features by vit1>
Figure QLYQS_24
Then, by the DETR target detection head, the +.>
Figure QLYQS_27
Feature vector of individual object->
Figure QLYQS_30
Scaling the feature vectors of the m targets to feature dimensions of 56 x 384, the corresponding feature vector being +.>
Figure QLYQS_19
,/>
Figure QLYQS_23
Extracting features by vit2>
Figure QLYQS_26
Due to->
Figure QLYQS_29
Is formed by splicing 16 query graphs with the size of 56 x 56, namely, the corresponding 16 feature vectors can be obtained by extracting features according to fixed coordinate positions
Figure QLYQS_21
(1.2.3) feature vectors generated for search graphs
Figure QLYQS_31
Feature vectors generated by querying a mosaic
Figure QLYQS_32
And comparing every two samples, defining positive samples when the two samples belong to the same category, defining negative samples when the two samples do not belong to the same category, and defining a loss function by adopting a cosine distance formula:
Figure QLYQS_33
(equation 1);
Figure QLYQS_34
(equation 2);
Figure QLYQS_35
(equation 3);
Figure QLYQS_36
;/>
Figure QLYQS_39
to search for a graphuIs the first of (2)δFeature vectors of the target; />
Figure QLYQS_41
Splice graph for inquiryvIs the first of (2)ηA feature vector; when the input network is positive, the loss is calculated using equation 1, requiring two eigenvectors +.>
Figure QLYQS_37
The smaller the distance between them, resulting in +.>
Figure QLYQS_40
The smaller, when the input network is a negative sample, the loss is calculated using equation 2, requiring two eigenvectors +.>
Figure QLYQS_42
The greater the distance between them, resulting in
Figure QLYQS_43
The smaller the final +.>
Figure QLYQS_38
The smaller.
5. The transformer-based twin network target search system of claim 2, wherein the specific method of step (1.3) is as follows:
(1.3.1) let the single target in vit1 pass through the DETR target detection head output loss be
Figure QLYQS_44
The detection head is arranged to obtainkThe probability of each detection frame is +.>
Figure QLYQS_45
The result box number is:
Figure QLYQS_46
(equation 4);
(1.3.2) selecting num number detection frame region to record as A, setting preset anchor region as B, wherein A and B ensure intersection region, B does not completely contain A, and supposing
Figure QLYQS_47
Or->
Figure QLYQS_48
Region (2)>
Figure QLYQS_49
Representative area->
Figure QLYQS_50
Is a part of the area of (2);
Figure QLYQS_51
(equation 5);
(1.3.3) features of vit1 and vit2 together obtain one
Figure QLYQS_53
Define benchmark->
Figure QLYQS_56
The eigenvector of vit1 is defined as +.>
Figure QLYQS_59
The eigenvector of vit2 is defined as +.>
Figure QLYQS_52
,/>
Figure QLYQS_55
,/>
Figure QLYQS_58
And->
Figure QLYQS_61
Is a parameter which can be learned, and when all search graphs and corresponding query graphs are input in groups, the method is characterized in that +.>
Figure QLYQS_54
To get as close to 0 as possible, otherwise, let +.>
Figure QLYQS_57
As close to 1 as possible, +.>
Figure QLYQS_60
(1.3.4) determining the final loss:
Figure QLYQS_62
wherein->
Figure QLYQS_63
6. The method of claim 2, wherein the specific method of step (2) is as follows:
(2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task;
(2.2) when the program is started, the camera group of the corresponding region needs to be selected at the front end, assuming that the selection is performedqThe cameras are corresponding to each other,
Figure QLYQS_64
and detects whether the target picture has been added +.>
Figure QLYQS_65
And (5) successfully starting the system when the condition is met.
7. The transformer-based twin network target search system of claim 1, wherein the specific method of step (3) is as follows:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) running a video frame taking module forqEach camera takes out a picture to be detected, and the picture name is that of the cameras
Figure QLYQS_67
Each picture generates a feature vector of +.>
Figure QLYQS_69
Inside each picture +.>
Figure QLYQS_72
Target (/ ->
Figure QLYQS_66
) Corresponding toIs +.>
Figure QLYQS_70
Scaling to feature vector +.>
Figure QLYQS_73
The same dimension is respectively
Figure QLYQS_75
Then, feature vector of target picture +.>
Figure QLYQS_68
Performing feature comparison to calculate matching degree, and generating a feature matching degree hash table ++>
Figure QLYQS_71
,/>
Figure QLYQS_76
Recording picture number->
Figure QLYQS_78
Maximum matching degree of ∈10->
Figure QLYQS_74
Recording picture number->
Figure QLYQS_77
Position coordinates of the highest matching degree target area, wherein each value is:
Figure QLYQS_79
(equation 6);
Figure QLYQS_80
(equation 7);
wherein the method comprises the steps of
Figure QLYQS_81
Represents the abscissa of the center of the region, +.>
Figure QLYQS_82
Represents the ordinate of the center of the region,/>
Figure QLYQS_83
Representing the height of the area and,
Figure QLYQS_84
representing the area width;
(3.3) setting the threshold toySelecting a value from Map to exceedyPicture sequence number of (2)
Figure QLYQS_85
Picture order->
Figure QLYQS_86
Add to the Result pair column Result.
8. The transformer-based twin network target search system of claim 7, wherein the specific method of step (3.3) is as follows:
(3.3.1) traversing the current Map, for each
Figure QLYQS_87
If->
Figure QLYQS_88
Indicating that the current picture is an effective scene picture containing target picture characteristics, recording picture sequence number +.>
Figure QLYQS_89
(3.3.2) performing an addition operation to the selected picture order, and adding a newly generated picture order at the end of the Result pair column Result
Figure QLYQS_90
,/>
Figure QLYQS_91
And finally, returning a Result pair column Result.
9. The transformer-based twin network target search system of claim 7, wherein the specific method of step (4) is as follows:
(4.1) detecting the result queue, if a new record is generated, obtaining the current picture sequence number
Figure QLYQS_92
The current picture number +.>
Figure QLYQS_96
Storing in preset server static folder, and generating picture sequence number +.>
Figure QLYQS_98
Is +.>
Figure QLYQS_93
The camera is +.>
Figure QLYQS_97
Target matching degree->
Figure QLYQS_99
Coordinates of the object->
Figure QLYQS_100
Target picture name->
Figure QLYQS_94
And access address->
Figure QLYQS_95
Writing into a database;
(4.2) the front end is real by setting the search conditionTime screening and displaying current target picture
Figure QLYQS_101
Corresponding search result information. />
CN202310449364.7A 2023-04-25 2023-04-25 Twin network target search system based on transformer Active CN116188804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310449364.7A CN116188804B (en) 2023-04-25 2023-04-25 Twin network target search system based on transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310449364.7A CN116188804B (en) 2023-04-25 2023-04-25 Twin network target search system based on transformer

Publications (2)

Publication Number Publication Date
CN116188804A true CN116188804A (en) 2023-05-30
CN116188804B CN116188804B (en) 2023-07-04

Family

ID=86449298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310449364.7A Active CN116188804B (en) 2023-04-25 2023-04-25 Twin network target search system based on transformer

Country Status (1)

Country Link
CN (1) CN116188804B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060120627A1 (en) * 2004-12-07 2006-06-08 Canon Kabushiki Kaisha Image search apparatus, image search method, program, and storage medium
US20180260415A1 (en) * 2017-03-10 2018-09-13 Xerox Corporation Instance-level image retrieval with a region proposal network
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN112883928A (en) * 2021-03-26 2021-06-01 南通大学 Multi-target tracking algorithm based on deep neural network
CN113240716A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Twin network target tracking method and system with multi-feature fusion
CN113744311A (en) * 2021-09-02 2021-12-03 北京理工大学 Twin neural network moving target tracking method based on full-connection attention module
US20220172455A1 (en) * 2020-12-01 2022-06-02 Accenture Global Solutions Limited Systems and methods for fractal-based visual searching
CN114821390A (en) * 2022-03-17 2022-07-29 齐鲁工业大学 Twin network target tracking method and system based on attention and relationship detection
CN115588030A (en) * 2022-09-27 2023-01-10 湖北工业大学 Visual target tracking method and device based on twin network
US20230050679A1 (en) * 2021-07-29 2023-02-16 Novateur Research Solutions System and method for rare object localization and search in overhead imagery

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060120627A1 (en) * 2004-12-07 2006-06-08 Canon Kabushiki Kaisha Image search apparatus, image search method, program, and storage medium
US20180260415A1 (en) * 2017-03-10 2018-09-13 Xerox Corporation Instance-level image retrieval with a region proposal network
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
US20220172455A1 (en) * 2020-12-01 2022-06-02 Accenture Global Solutions Limited Systems and methods for fractal-based visual searching
CN112883928A (en) * 2021-03-26 2021-06-01 南通大学 Multi-target tracking algorithm based on deep neural network
CN113240716A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Twin network target tracking method and system with multi-feature fusion
US20230050679A1 (en) * 2021-07-29 2023-02-16 Novateur Research Solutions System and method for rare object localization and search in overhead imagery
CN113744311A (en) * 2021-09-02 2021-12-03 北京理工大学 Twin neural network moving target tracking method based on full-connection attention module
CN114821390A (en) * 2022-03-17 2022-07-29 齐鲁工业大学 Twin network target tracking method and system based on attention and relationship detection
CN115588030A (en) * 2022-09-27 2023-01-10 湖北工业大学 Visual target tracking method and device based on twin network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ENG-JON ONG ETC.: "Siamese Network of Deep Fisher-Vector Descriptors for Image Retrieval", 《 COMPUTER VISION AND PATTERN RECOGNITION》, pages 1 - 12 *
YUANYUN WANG ETC.: "Depthwise Over-parameterized Siamese Network for Visual Tracking", 《2021 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND BIOMEDICAL ENGINEERING (ICITBE)》, pages 58 - 62 *
张俊: "基于孪生全卷积网络的单目标跟踪方法研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, pages 138 - 1415 *
王梦亭 等: "基于孪生网络的单目标跟踪算法综述", 《计算机应用》, pages 661 - 673 *

Also Published As

Publication number Publication date
CN116188804B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
KR101346730B1 (en) System, apparatus, method, program and recording medium for processing image
US6246790B1 (en) Image indexing using color correlograms
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
KR101417548B1 (en) Method and system for generating and labeling events in photo collections
US8027541B2 (en) Image organization based on image content
US7043474B2 (en) System and method for measuring image similarity based on semantic meaning
JP5934653B2 (en) Image classification device, image classification method, program, recording medium, integrated circuit, model creation device
US20080247610A1 (en) Apparatus, Method and Computer Program for Processing Information
US20130121535A1 (en) Detection device and method for transition area in space
KR20070079330A (en) Display control apparatus, display control method, computer program, and recording medium
Meng et al. Object instance search in videos via spatio-temporal trajectory discovery
US9665773B2 (en) Searching for events by attendants
US20140193048A1 (en) Retrieving Visual Media
WO2022127814A1 (en) Method and apparatus for detecting salient object in image, and device and storage medium
CN112464775A (en) Video target re-identification method based on multi-branch network
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
US10991085B2 (en) Classifying panoramic images
CN116188804B (en) Twin network target search system based on transformer
Piramanayagam et al. Shot boundary detection and label propagation for spatio-temporal video segmentation
JP6778625B2 (en) Image search system, image search method and image search program
JP2018194956A (en) Image recognition dive, method and program
Arnold et al. Automatic Identification and Classification of Portraits in a Corpus of Historical Photographs
CN102436487B (en) Optical flow method based on video retrieval system
WO2015185479A1 (en) Method of and system for determining and selecting media representing event diversity
Khan et al. A Fused LBP Texture Descriptor-Based Image Retrieval System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant