CN116188804A - Twin network target search system based on transformer - Google Patents
- Publication number
- CN116188804A (application CN202310449364.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- picture
- graph
- query
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the field of image retrieval and target detection in computer vision, and discloses a twin network target search system based on a transformer.
Description
Technical Field
The invention belongs to the field of image retrieval and target detection in computer vision, and discloses a twin network target searching system based on a transformer.
Background
Computer vision uses cameras and computers in place of human eyes to recognize, track and measure targets, and performs further graphic processing so that the result is better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the theory and technology needed to build artificial intelligence systems that can obtain 'information' from images or multidimensional data. Since perception can be regarded as the extraction of information from sensory signals, computer vision can also be regarded as the science of making an artificial system 'perceive' from images or multidimensional data. Image processing techniques convert an input image into another image having desired characteristics; in computer vision research they are commonly used for preprocessing and feature extraction, giving the computer capabilities such as seeing, hearing and speaking.
Object detection and recognition, as a computer vision task, distinguishes objects of interest in images or videos from the uninteresting parts, determines whether objects exist, and if so determines their positions and recognizes them; it is widely used in many fields of life. It is also a very important research direction in the field of computer vision. With the rapid development of the Internet, artificial intelligence technology and intelligent hardware, a large amount of image and video data now exists in daily life, so computer vision technology plays an ever larger role and research on it is increasingly active; object detection and recognition are correspondingly important as a cornerstone of the field. Because demand for target retrieval systems keeps growing while their technical development remains slow, a mature target retrieval system that accurately solves practical target retrieval problems is urgently needed, which is why the present system was developed.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a twin network target retrieval system based on a transformer, which uses cameras to monitor targets and combines image retrieval and target detection methods from computer vision to realize the retrieval and display of targets within the monitored area of the cameras.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a transformer-based twin network target search system, comprising the steps of:
(1) Collecting image data as graphs to be searched; extracting targets of interest from part of the graphs to be searched as query graphs, and designing and training a twin network target search model;
(2) Selecting a camera area, selecting a camera group to determine a search area, and inputting a target picture to be searched;
(3) Starting the search task: scene pictures are acquired from the cameras at equal time intervals by video frame grabbing; each picture is run through the model, each detected target is compared in feature space with the target picture to be searched, the matching degree is calculated and the maximum value taken; if the matching degree exceeds a set threshold, the picture number i is added to a result queue;
(4) If the result queue has new records, the current detection picture is stored in a static resource directory set by a background server, information is stored in a database, and the front-end interface screens and displays search result information of a corresponding target from the database according to requirements.
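The four steps above can be sketched as a minimal polling loop; the `Detection` type, the stub detector and all names below are illustrative assumptions, not components disclosed by the patent.

```python
# Hypothetical end-to-end sketch of steps (1)-(4): detector, matcher and camera
# names are illustrative stand-ins, not the patent's actual modules.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Detection:
    match: float          # matching degree against the target picture
    box: tuple            # (x, y, h, w) of the detected region

def search_step(frames: dict, detect: Callable[[str], list], threshold: float) -> list:
    """Run one polling round: detect targets in every frame, keep the best
    match per picture, and queue picture numbers whose match exceeds threshold."""
    result_queue = []
    for pic_id, frame in frames.items():
        detections = detect(frame)
        if not detections:
            continue
        best = max(detections, key=lambda d: d.match)
        if best.match > threshold:
            result_queue.append(pic_id)
    return result_queue

# Stub detector: pretends camera 2's frame contains the target.
def fake_detect(frame):
    return {"f1": [Detection(0.30, (10, 10, 5, 5))],
            "f2": [Detection(0.91, (40, 32, 8, 6)), Detection(0.20, (0, 0, 3, 3))]}[frame]

queue = search_step({1: "f1", 2: "f2"}, fake_detect, threshold=0.8)
print(queue)  # only picture 2 exceeds the threshold
```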
Further, the specific method of the step (1) is as follows:
(1.1) Collect n search graphs s1, …, sn, each with a default size of 224 x 224. Let count be the total number of targets in the n search graphs; crop out the corresponding query graphs q1, …, qcount and scale each query graph from its original size to 56 x 56. Each query graph is then classified manually, query graphs of the same target being placed into one class; with c classes cls1, …, clsc in total, each query graph is placed into the folder of its class (one folder per class). A dictionary D is then established whose keys correspond to the search graphs s1, …, sn and whose value for each search graph si is the list of class names of all targets present in si;
(1.2) Design the twin network target search model: the feature extraction backbone of the model is divided into vit1 and vit2, and vit1 extracts the features of the search graph; 16 query graphs are then selected according to the following rule: the index D[si] of the current search graph is queried, 4 query graphs are selected randomly from the class folders of classes not present in D[si], and 12 query graphs are selected from the class folders of classes present in D[si], 3 being chosen randomly from each such folder; if 12 such query graphs can be selected, the 16 query graphs of size 56 x 56 are spliced in random order into one 224 x 224 graph; vit2 extracts the features of the corresponding series of query graphs within the query splice graph, and vit1 and vit2 share weights;
(1.3) The features extracted by vit1 are passed through a DETR target detection head, which predicts the position of each target in the search graph and yields a detection loss Ldet; the features extracted by vit1 and vit2 jointly yield a matching score σ, and the two are combined through a proportional relationship.
Further, in step (1.2): if 12 such query graphs cannot be selected, data enhancement is used: each time one query graph is chosen randomly from those already selected and a new 56 x 56 query graph is generated from it by flipping or rotation; this is repeated until the total number of query graphs reaches 16, after which the 16 query graphs of size 56 x 56 are spliced into one 224 x 224 graph; the newly spliced graph is named the query splice graph and denoted pv.
Further, in the step (1.2),
(1.2.1) A DETR target detection head is added, which can detect and frame each target in each graph to be searched and obtain the coordinates of each target;
(1.2.2) The data are divided into n groups, each group being (su, pv), where su is the u-th search graph and pv is the v-th query splice graph; su passes through vit1 to extract features Fu, after which, via the DETR target detection head, the feature vectors of the m targets in su are obtained and scaled to a feature dimension of 56 x 384, the corresponding feature vectors being fu,1, …, fu,m; pv passes through vit2 to extract features Gv, and since pv is spliced from 16 query graphs of size 56 x 56, the corresponding 16 feature vectors gv,1, …, gv,16 can be obtained by extracting features at the fixed coordinate positions of the tiles;
(1.2.3) The feature vectors fu,1, …, fu,m generated from the search graph and the feature vectors gv,1, …, gv,16 generated from the query splice graph are compared pairwise; a pair is defined as a positive sample when the two belong to the same class and as a negative sample otherwise, and the loss function is defined from the cosine distance d(f, g) = 1 − cos(f, g), with f = fu,δ the feature vector of the δ-th target of search graph su and g = gv,η the η-th feature vector of query splice graph pv:
Lpos = d(f, g) (equation 1); Lneg = max(0, m0 − d(f, g)) (equation 2), where m0 is a margin.
When the input pair is a positive sample, the loss is calculated with equation 1, so that the smaller the distance between the two feature vectors, the smaller the loss; when the input pair is a negative sample, the loss is calculated with equation 2, so that the greater the distance between the two feature vectors, the smaller the loss.
Further, the specific method of step (1.3) is as follows:
(1.3.1) Let the output loss of a single target in vit1 through the DETR target detection head be Ldet; the detection head produces k candidate detection frames with probabilities p1, …, pk, and the number of the resulting frame is num = argmax i pi;
(1.3.2) The detection frame region with number num is recorded as A, and the preset anchor region is set as B, where A and B are guaranteed to intersect and B does not completely contain A; letting S(X) denote the area of region X, X being A∩B or A∪B, the region loss is Larea = 1 − S(A∩B) / S(A∪B) (equation 5);
(1.3.3) The features of vit1 and vit2 jointly yield a matching score σ, for which a benchmark is defined: with the feature vector from vit1 denoted f and the feature vector from vit2 denoted g, σ = sigmoid(α · cos(f, g) + β), where α and β are learnable parameters; when all search graphs and their corresponding query graphs are input in groups, σ is driven as close to 0 as possible for matched pairs and as close to 1 as possible otherwise;
Further, the specific method of the step (2) is as follows:
(2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task;
(2.2) When the program is started, the camera group of the corresponding area must be selected at the front end; assuming q cameras c1, …, cq are selected, q ≥ 1, the system detects whether the target picture t to be searched has been added, and when both conditions are met the system starts successfully.
Further, the specific method of the step (3) is as follows:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) The video frame grabbing module is run: from each of the q cameras one picture to be detected is taken, the picture being named from the camera identifier and the timestamp s. Each picture pici generates a feature vector Fi, and the mi targets inside each picture pici have corresponding feature vectors fi,1, …, fi,mi, which are scaled to the same dimension as the feature vector ft of the target picture. These are then compared with ft to calculate matching degrees, and a feature matching-degree hash table Map is generated: for each picture number i, Map records the maximum matching degree matchi together with the position coordinates (xi, yi, hi, wi) of the target region with the highest matching degree, where xi represents the abscissa of the region center, yi the ordinate of the region center, hi the region height, and wi the region width;
(3.3) The threshold is set to y, and the picture numbers whose value in Map exceeds y are selected and added to the result queue Result.
Further, the specific method of step (3.3) is as follows:
(3.3.1) The current Map is traversed; for each picture number i, if matchi > y, the current picture is an effective scene picture containing the target picture features, and the picture number i is recorded;
(3.3.2) An append operation is performed for each selected picture number, the newly selected number being added at the end of the result queue Result; finally the result queue Result is returned.
Further, the specific method of the step (4) is as follows:
(4.1) The result queue is monitored; if a new record is generated, the current picture number i is obtained, the picture pici is stored in the preset static folder of the server, and the record consisting of the picture number i, its timestamp, the camera identifier, the target matching degree matchi, the target coordinates (xi, yi, hi, wi), the target picture name and the access address is written into the database;
(4.2) By setting search conditions, the front end filters and displays in real time the search result information corresponding to the current target picture t.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention adopts a twin network and introduces query graphs for training, so that the trained model is more accurate and more targeted.
(2) The invention introduces a vision transformer model and a DETR target detection head to train the target retrieval model end to end, completing both detection and retrieval and improving the accuracy of the model.
(3) The invention filters and displays at the front end, in real time, the situation of the current target picture within the monitored area, and updates it in real time.
Drawings
Fig. 1 is an overall schematic diagram of a twin network target retrieval system based on a transformer according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a twin network target retrieval system based on a transformer, which uses cameras to monitor targets and combines image retrieval and target detection methods from computer vision to realize the retrieval and display of targets within the monitored area of the cameras, as shown in Fig. 1.
DETR, in full DEtection TRansformer, is a transformer-based end-to-end target detection network proposed by Facebook. The transformer is an attention model originally used in the field of natural language processing; the vision transformer is an attention model for the computer vision field, a migration of the transformer model that has been adapted so that it can be applied to image processing.
Specific examples are as follows:
a transformer-based twin network target retrieval system, comprising the steps of:
(1) Data acquisition and model design stage:
(1.1) Collect n search graphs s1, …, sn, each with a default size of 224 x 224. Let count be the total number of targets in the n search graphs; crop out the corresponding query graphs q1, …, qcount and scale each query graph from its original size to 56 x 56. Each query graph is then classified manually, query graphs of the same target being placed into one class; with c classes cls1, …, clsc in total, each query graph is placed into the folder of its class (one folder per class, the class names being the folder names). A dictionary D is then established whose keys correspond to the search graphs s1, …, sn and whose value for each search graph si is the list of class names of all targets present in si.
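Step (1.1)'s folder-and-dictionary bookkeeping can be sketched as follows; the annotation data and class names are invented for illustration, not taken from the patent.

```python
# Minimal sketch of step (1.1): building the per-search-graph dictionary D.
# Annotation data and class names here are invented for illustration.
annotations = {
    "s1": [("q1", "person_A"), ("q2", "car_B")],   # (query graph, class) pairs cropped from s1
    "s2": [("q3", "person_A")],
}

# One folder per class: class name -> query graphs filed under it.
class_folders = {}
# Dictionary D: search graph -> class names of all targets present in it.
D = {}
for search_graph, targets in annotations.items():
    D[search_graph] = sorted({cls for _, cls in targets})
    for query_graph, cls in targets:
        class_folders.setdefault(cls, []).append(query_graph)

print(D)             # {'s1': ['car_B', 'person_A'], 's2': ['person_A']}
print(class_folders) # {'person_A': ['q1', 'q3'], 'car_B': ['q2']}
```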
(1.2) Design the twin network target search model. The feature extraction backbone of the model is divided into vit1 and vit2 (both based on the vision transformer); vit1 extracts the features of the search graph. Then 16 query graphs are selected according to the following rule: the index D[si] of the current search graph is queried; 4 query graphs are selected randomly from the class folders of classes not present in D[si], and 12 query graphs are selected from the class folders of classes present in D[si], 3 being chosen randomly from each such folder. If 12 such query graphs can be selected, the 16 query graphs of size 56 x 56 are spliced in random order into one 224 x 224 graph. If 12 query graphs cannot be selected, data enhancement is used: each time one query graph is chosen randomly from those already selected and a new 56 x 56 query graph is generated from it by flipping or rotation; this is repeated until the total number of query graphs reaches 16, after which the 16 query graphs of size 56 x 56 are spliced into one 224 x 224 graph. The newly spliced graph is named the query splice graph and denoted pv. vit2 extracts the features of the corresponding series of query graphs within the query splice graph, and vit1 and vit2 share weights to further improve the accuracy of the network. The specific operation is as follows:
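A minimal sketch of the selection rule and the 4 x 4 splicing grid described above, assuming query graphs are represented by name only and a "flip" suffix stands in for the enhancement operation.

```python
# Sketch of the step-(1.2) selection rule and 4x4 splicing grid. Query graphs
# are represented by name only; flipping stands in for the enhancement step.
import random

def select_queries(present, absent, rng):
    """Pick up to 3 queries per class folder present in the search graph and
    up to 4 from absent classes, then augment copies until 16 are available."""
    picked = []
    for folder in present.values():
        picked += rng.sample(folder, min(3, len(folder)))
    all_absent = [q for folder in absent.values() for q in folder]
    picked += rng.sample(all_absent, min(4, len(all_absent)))
    while len(picked) < 16:                      # data-enhancement fallback
        picked.append(rng.choice(picked) + "_flipped")
    rng.shuffle(picked)
    return picked

def tile_offsets():
    """Top-left corner of each 56x56 tile inside the 224x224 splice graph."""
    return [(r * 56, c * 56) for r in range(4) for c in range(4)]

rng = random.Random(0)
queries = select_queries({"cls1": ["a1", "a2", "a3"]},
                         {"cls2": ["b1", "b2"]}, rng)
print(len(queries), len(tile_offsets()))  # 16 tiles, 16 positions
```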
(1.2.1) A DETR target detection head is added. This is a functional module for detecting the position of targets in a picture: it predicts the position of each target in the current picture, so that each target can be detected and framed in each graph to be searched and the coordinates of each target obtained.
(1.2.2) The data are divided into n groups, each group being (su, pv), u = 1, …, n, where su is the u-th search graph and pv is the v-th query splice graph. su passes through vit1 to extract features Fu; via the DETR target detection head, the feature vectors of the m targets in su are then obtained and scaled by an ROI Pooling operation to a feature dimension of 56 x 384, the corresponding feature vectors being fu,1, …, fu,m. pv passes through vit2 to extract features Gv; since pv is spliced from 16 query graphs of size 56 x 56, the corresponding 16 feature vectors gv,1, …, gv,16 can be obtained by extracting features at the fixed coordinate positions of the tiles.
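The fixed-coordinate extraction of the 16 per-query feature blocks can be illustrated on a toy feature map; the tile size of 2 below stands in for the real 56 x 56 blocks.

```python
# Sketch of extracting the 16 per-query feature blocks at fixed tile positions
# from the splice-graph feature map; a nested list stands in for the tensor.
def extract_tiles(feature_map, tile=2, grid=4):
    """Slice a (grid*tile) x (grid*tile) map into grid*grid tile blocks,
    row-major, mirroring the fixed 56x56 positions in the 224x224 splice."""
    tiles = []
    for r in range(grid):
        for c in range(grid):
            block = [row[c * tile:(c + 1) * tile]
                     for row in feature_map[r * tile:(r + 1) * tile]]
            tiles.append(block)
    return tiles

# An 8x8 toy "feature map" whose entries encode their own coordinates.
fmap = [[(r, c) for c in range(8)] for r in range(8)]
tiles = extract_tiles(fmap)
print(len(tiles))        # 16 blocks
print(tiles[0])          # top-left 2x2 block: [[(0,0),(0,1)],[(1,0),(1,1)]]
```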
(1.2.3) The feature vectors fu,1, …, fu,m generated from the search graph and the feature vectors gv,1, …, gv,16 generated from the query splice graph are compared pairwise; a pair is defined as a positive sample when the two belong to the same class and as a negative sample otherwise, and the loss function is defined from the cosine distance:
d(f, g) = 1 − cos(f, g), with f = fu,δ the feature vector of the δ-th target of search graph su and g = gv,η the η-th feature vector of query splice graph pv;
Lpos = d(f, g) (equation 1); Lneg = max(0, m0 − d(f, g)) (equation 2), where m0 is a margin.
When the input pair is a positive sample, the loss is calculated with equation 1: the smaller the distance between the two feature vectors, the smaller the loss. When the input pair is a negative sample, the loss is calculated with equation 2: the greater the distance between the two feature vectors, the smaller the loss.
(1.3) The features extracted by vit1 are passed through the DETR target detection head (whose function is to predict the position of each target in the search graph), obtaining a detection loss Ldet; the features extracted by vit1 and vit2 jointly yield a matching score σ, and the two are combined through a proportional relationship. The specific operation is as follows:
(1.3.1) Let the output loss of a single target in vit1 through the DETR target detection head be Ldet. The detection head produces k candidate detection frames with probabilities p1, …, pk, and the number of the resulting frame is num = argmax i pi.
(1.3.2) The detection frame region with number num is recorded as A, and the preset anchor region is set as B, where A and B are guaranteed to intersect and B does not completely contain A. Let S(X) denote the area of region X, X being A∩B or A∪B; the region loss is then Larea = 1 − S(A∩B) / S(A∪B) (equation 5).
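The equation-5 region loss between the selected detection frame A and the preset anchor B can be sketched as follows, with boxes given as (x1, y1, x2, y2) corners.

```python
# Sketch of the equation-5 region loss between the selected detection frame A
# and the preset anchor B; boxes are (x1, y1, x2, y2) corners here.
def area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def region_loss(a, b):
    """1 - S(A∩B)/S(A∪B): zero when the frames coincide, near one when barely overlapping."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = area((ix1, iy1, ix2, iy2))
    union = area(a) + area(b) - inter
    return 1.0 - inter / union

print(region_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical frames -> 0.0
print(region_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 6/7
```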
(1.3.3) The features of vit1 and vit2 jointly yield a matching score σ, for which a benchmark is defined: with the feature vector from vit1 denoted f and the feature vector from vit2 denoted g, σ = sigmoid(α · cos(f, g) + β), where α and β are learnable parameters. When all search graphs and their corresponding query graphs are input in groups, σ is driven as close to 0 as possible for matched pairs and as close to 1 as possible otherwise, with loss L3 = (σ − y)², y being the 0/1 match label (y = 0 for matched pairs).
(1.3.4) The final loss is determined as L = λ1 · Lcon + λ2 · Larea + λ3 · L3, where Lcon is the contrastive loss of equations 1 and 2; the weights λ1, λ2, λ3 can be adjusted as required, and fixed values of the weights are used at present.
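A sketch of the step-(1.3.4) combination; the weight values below are placeholders, since the published text does not state the values actually used.

```python
# Sketch of the step-(1.3.4) combination; the weight values are placeholders,
# as the published text does not state the ones actually used.
def final_loss(l_contrastive, l_region, l_match, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three partial losses, weights adjustable as required."""
    w1, w2, w3 = weights
    return w1 * l_contrastive + w2 * l_region + w3 * l_match

print(final_loss(0.2, 0.5, 0.1))                     # ~0.8 with unit weights
print(final_loss(0.2, 0.5, 0.1, weights=(2, 1, 0)))  # ~0.9 when re-weighted
```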
(2) Switch setting and zone setting phase:
and (2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task.
(2.2) When the program is started, the camera group of the corresponding area must be selected at the front end; assuming q cameras c1, …, cq are selected, q ≥ 1, the system detects whether the target picture t to be searched has been added, and when both conditions are met the system starts successfully.
(3) Model detection processing stage:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) The video frame grabbing module is run: from each of the q cameras one picture to be detected is taken, the picture being named from the camera identifier and the timestamp s. Each picture pici generates a feature vector Fi, and the mi targets inside each picture pici have corresponding feature vectors fi,1, …, fi,mi, which are scaled by an ROI Pooling operation (scaling of the feature vector) to the same dimension as the feature vector ft of the target picture. These are then compared with ft to calculate matching degrees, and a feature matching-degree hash table Map is generated: for each picture number i, Map records the maximum matching degree matchi together with the position coordinates (xi, yi, hi, wi) of the target region with the highest matching degree, where xi represents the abscissa of the region center, yi the ordinate of the region center, hi the region height, and wi the region width.
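Step (3.2)'s per-picture bookkeeping can be sketched with cosine similarity standing in for the matching degree; all data below are invented for illustration.

```python
# Sketch of step (3.2): per-picture maximum matching degree and coordinates,
# using cosine similarity as the matching degree; data are invented.
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_match_map(pictures, target_feature):
    """pictures: {pic_id: [(feature_vector, (x, y, h, w)), ...]}.
    Returns Map: {pic_id: (max matching degree, coords of best region)}."""
    match_map = {}
    for pic_id, targets in pictures.items():
        scored = [(cos_sim(f, target_feature), box) for f, box in targets]
        match_map[pic_id] = max(scored, key=lambda t: t[0])
    return match_map

pictures = {1: [([1.0, 0.0], (10, 20, 5, 5)), ([0.6, 0.8], (40, 40, 8, 8))]}
m = build_match_map(pictures, [0.6, 0.8])
print(m[1])  # the second region matches the target feature best
```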
(3.3) The threshold is set to y, and the picture numbers whose value in Map exceeds y are selected and added to the result queue Result. The specific operation is as follows:
(3.3.1) The current Map is traversed; for each picture number i, if matchi > y, the current picture is an effective scene picture containing the target picture features, and the picture number i is recorded.
(3.3.2) An append operation is performed for each selected picture number, the newly selected number being added at the end of the result queue Result; finally the result queue Result is returned.
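Steps (3.3.1) and (3.3.2) amount to a traversal-and-append over Map; the threshold and data below are illustrative.

```python
# Sketch of steps (3.3.1)-(3.3.2): traversing Map and appending qualifying
# picture numbers to the Result queue; threshold and data are illustrative.
def filter_results(match_map, threshold, result=None):
    """Append every picture number whose maximum matching degree exceeds the
    threshold to the end of the Result queue and return the queue."""
    result = [] if result is None else result
    for pic_id, (match, _coords) in sorted(match_map.items()):
        if match > threshold:       # effective scene picture containing the target
            result.append(pic_id)
    return result

match_map = {1: (0.42, (0, 0, 2, 2)), 2: (0.93, (5, 5, 3, 3)), 3: (0.88, (1, 1, 4, 4))}
print(filter_results(match_map, threshold=0.8))  # [2, 3]
```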
(4) Storage and display stage:
(4.1) The result queue is monitored; if a new record is generated, the current picture number i is obtained, the picture pici is stored in the preset static folder of the server, and the record consisting of the picture number i, its timestamp, the camera identifier, the target matching degree matchi, the target coordinates (xi, yi, hi, wi), the target picture name and the access address is written into the database.
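Step (4.1) can be sketched with an in-memory SQLite table; the column names and record values below are assumptions for illustration, not the patent's schema.

```python
# Sketch of step (4.1) using an in-memory SQLite table; the column names and
# record values are assumptions for illustration, not the patent's schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE search_results (
    pic_id INTEGER, ts TEXT, camera TEXT, match_degree REAL,
    x INTEGER, y INTEGER, h INTEGER, w INTEGER,
    target_name TEXT, access_url TEXT)""")

def store_result(record):
    """Persist one new result-queue record so the front end can filter on it."""
    conn.execute("INSERT INTO search_results VALUES (?,?,?,?,?,?,?,?,?,?)", record)
    conn.commit()

store_result((2, "2023-01-01 12:00:00", "cam2", 0.93, 40, 32, 8, 6,
              "target.jpg", "/static/pic_2.jpg"))
rows = conn.execute(
    "SELECT pic_id, camera, match_degree FROM search_results WHERE match_degree > 0.8"
).fetchall()
print(rows)  # [(2, 'cam2', 0.93)]
```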
(4.2) By setting search conditions, the front end filters and displays in real time the search result information corresponding to the current target picture t.
Under a scene where cameras are deployed, the invention can realize the search and display of targets in the monitored area of the cameras. It introduces a twin network, a vision transformer model and a DETR target detection head, together with the result queue, region loss calculation, matching degree calculation and camera area selection methods described above; it is compatible with various types of visible-light cameras, has high robustness, and filters, displays and updates in real time at the front end the situation of the current target picture in the monitored area.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A transformer-based twin network target search system, comprising the steps of:
(1) Collecting image data as graphs to be searched; extracting targets of interest from part of the graphs to be searched as query graphs, and designing and training a twin network target search model;
(2) Selecting a camera area, selecting a camera group to determine a search area, and inputting a target picture to be searched;
(3) Starting the search task: scene pictures are acquired from the cameras at equal time intervals by video frame grabbing; the pictures are run through the model, each detected target is compared in feature space with the target picture to be searched, the matching degree is calculated and the maximum value taken; if the matching degree exceeds a set threshold, the picture number i is added to a result queue;
(4) If the result queue has new records, the current detection picture is stored in a static resource directory set by a background server, information is stored in a database, and the front-end interface screens and displays search result information of a corresponding target from the database according to requirements.
2. The transformer-based twin network target search system of claim 1, wherein the specific method of step (1) is as follows:
(1.1) Collecting n search graphs s1, …, sn, each with a default size of 224 x 224; letting count be the total number of targets in the n search graphs, cropping out the corresponding query graphs q1, …, qcount and scaling each query graph from its original size to 56 x 56; then classifying each query graph manually, query graphs of the same target being placed into one class, with c classes cls1, …, clsc in total, and placing each query graph into the folder of its class (one folder per class); then establishing a dictionary D whose keys correspond to the search graphs s1, …, sn and whose value for each search graph si is the list of class names of all targets present in si;
(1.2) Designing the twin network target search model: the feature extraction backbone of the model is divided into vit1 and vit2, and vit1 extracts the features of the search graph; 16 query graphs are then selected according to the following rule: the index D[si] of the current search graph is queried, 4 query graphs are selected randomly from the class folders of classes not present in D[si], and 12 query graphs are selected from the class folders of classes present in D[si], 3 being chosen randomly from each such folder; if 12 such query graphs can be selected, the 16 query graphs of size 56 x 56 are spliced in random order into one 224 x 224 graph; vit2 extracts the features of the corresponding series of query graphs within the query splice graph, and vit1 and vit2 share weights;
(1.3) The features extracted by vit1 are passed through a DETR target detection head, which predicts the position of each target in the search graph and yields a detection loss Ldet; the features extracted by vit1 and vit2 jointly yield a matching score σ, and the two are combined through a proportional relationship.
3. The transformer-based twin network target search system of claim 2, wherein in step (1.2): if 12 such query graphs cannot be selected, data enhancement is used: each time one query graph is chosen randomly from those already selected and a new 56 x 56 query graph is generated from it by flipping or rotation; this is repeated until the total number of query graphs reaches 16, after which the 16 query graphs of size 56 x 56 are spliced into one 224 x 224 graph; the newly spliced graph is named the query splice graph and denoted pv.
4. The transformer-based twin network target search system of claim 2, wherein in step (1.2),
(1.2.1) a DETR target detection head is added, which can detect and frame each target in each graph to be searched and obtain the coordinates of each target;
(1.2.2) the data are divided into n groups, each group being (su, pv), where su is the u-th search graph and pv is the v-th query splice graph; su passes through vit1 to extract features Fu, after which, via the DETR target detection head, the feature vectors of the m targets in su are obtained and scaled to a feature dimension of 56 x 384, the corresponding feature vectors being fu,1, …, fu,m; pv passes through vit2 to extract features Gv, and since pv is spliced from 16 query graphs of size 56 x 56, the corresponding 16 feature vectors gv,1, …, gv,16 can be obtained by extracting features at the fixed coordinate positions of the tiles;
(1.2.3) the feature vectors fu,1, …, fu,m generated from the search graph and the feature vectors gv,1, …, gv,16 generated from the query splice graph are compared pairwise; a pair is defined as a positive sample when the two belong to the same class and as a negative sample otherwise, and the loss function is defined from the cosine distance d(f, g) = 1 − cos(f, g), with f = fu,δ the feature vector of the δ-th target of search graph su and g = gv,η the η-th feature vector of query splice graph pv: Lpos = d(f, g) (equation 1); Lneg = max(0, m0 − d(f, g)) (equation 2), where m0 is a margin; when the input pair is a positive sample, the loss is calculated with equation 1, so that the smaller the distance between the two feature vectors, the smaller the loss; when the input pair is a negative sample, the loss is calculated with equation 2, so that the greater the distance between the two feature vectors, the smaller the loss.
5. The transformer-based twin network target search system of claim 2, wherein the specific method of step (1.3) is as follows:
(1.3.1) a single target in vit1 passes through the DETR target detection head, which outputs a loss and k detection boxes together with the probability of each box; the selected box index num is that of the box with the highest probability;
(1.3.2) the region of detection box num is denoted A, and the preset anchor region is denoted B; A and B are guaranteed to intersect, and B does not completely contain A; the overlap of the two regions is required to satisfy a preset area-ratio condition, where the area function gives the area of a region;
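Steps (1.3.1) and (1.3.2) amount to picking the highest-probability box and checking its overlap with the preset anchor. A sketch under the assumption of (x1, y1, x2, y2) box coordinates (the exact area-ratio condition is not recoverable from the text, so only the intersect/contain constraints are shown):

```python
def select_box(probs):
    """(1.3.1): index of the highest-probability detection box."""
    return max(range(len(probs)), key=lambda i: probs[i])

def boxes_overlap_ok(a, b):
    """(1.3.2): region A (selected box) against preset anchor region B.
    They must intersect and B must not completely contain A.
    Boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersects = ix1 < ix2 and iy1 < iy2
    contains = (b[0] <= a[0] and b[1] <= a[1] and
                b[2] >= a[2] and b[3] >= a[3])
    return intersects and not contains
```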
(1.3.3) the features of vit1 and vit2 together yield one combined score; a benchmark is defined, and the feature vector of vit1 and the feature vector of vit2 are each weighted by a learnable parameter and combined into the score; when all search images and corresponding query images are input in groups, the score is made as close to 0 as possible for matched groups, and otherwise as close to 1 as possible;
6. The transformer-based twin network target search system of claim 2, wherein the specific method of step (2) is as follows:
(2.1) establishing a new process for the current task, adding the current process ID into a process queue, starting the current process, and preparing to execute the target search task;
(2.2) when the program starts, the camera group of the corresponding region is selected at the front end; assuming q cameras are selected, the system checks whether the target picture has been added, and starts successfully when the conditions are met.
7. The transformer-based twin network target search system of claim 1, wherein the specific method of step (3) is as follows:
(3.1) transmitting a starting command to the front end to start the target searching module;
(3.2) the video frame capture module is run: one picture to be detected is taken from each of the q cameras; for each picture, a feature vector is generated for every target inside the picture, and each target feature vector is scaled to the same dimension as the feature vector of the target picture; the feature vector of the target picture is then compared with each target feature vector to compute a matching degree, and a feature matching-degree hash table Map is generated, which records, for each picture number, the maximum matching degree and the position coordinates of the target region with the highest matching degree, where the coordinate values are:
the abscissa of the region center, the ordinate of the region center, the region height, and the region width;
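The Map table of step (3.2) can be sketched as a dictionary keyed by picture number; the function and field names are illustrative, and `match_fn` stands in for whatever matching-degree measure the system uses:

```python
def build_match_map(pictures, query_vec, match_fn):
    """For each picture, keep the highest matching degree over its
    detected targets together with that target's region (cx, cy, h, w).
    `pictures`: {pic_id: [(feature_vec, (cx, cy, h, w)), ...]}"""
    match_map = {}
    for pic_id, targets in pictures.items():
        best_score, best_region = 0.0, None
        for vec, region in targets:
            score = match_fn(query_vec, vec)
            if score > best_score:
                best_score, best_region = score, region
        match_map[pic_id] = (best_score, best_region)
    return match_map
```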
8. The transformer-based twin network target search system of claim 7, wherein the specific method of step (3.3) is as follows:
(3.3.1) the current Map is traversed; for each entry, if the matching degree exceeds the preset threshold, the current picture is an effective scene picture containing the features of the target picture, and its picture sequence number is recorded;
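Step (3.3.1) can be sketched as a filter over the per-picture match table, represented here as a dict of picture number → (matching degree, region); the threshold value is an assumption, since the claim's own threshold symbol is not recoverable:

```python
def effective_pictures(match_map, threshold=0.8):
    """Return the sequence numbers of effective scene pictures: those
    whose best matching degree exceeds the threshold."""
    return [pic_id for pic_id, (score, _region) in match_map.items()
            if score > threshold]
```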
9. The transformer-based twin network target search system of claim 7, wherein the specific method of step (4) is as follows:
(4.1) the result queue is monitored; if a new record is generated, the current picture sequence number is obtained, the current picture is stored in the preset static folder of the server, and a record is written into the database containing the picture sequence number, the capture time, the camera number, the target matching degree, the target coordinates, the target picture name, and the access address;
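The database write of step (4.1) can be sketched with Python's `sqlite3`; the schema and field names below are illustrative assumptions, not taken from the patent:

```python
import sqlite3

def write_result(conn, record):
    """Persist one search hit: picture sequence number, capture time,
    camera number, matching degree, coordinates, name, access address."""
    conn.execute("""CREATE TABLE IF NOT EXISTS results (
        pic_seq INTEGER, capture_time TEXT, camera_id INTEGER,
        match_degree REAL, coords TEXT, pic_name TEXT, url TEXT)""")
    conn.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (record["pic_seq"], record["time"], record["camera"],
                  record["match"], record["coords"], record["name"],
                  record["url"]))
    conn.commit()
```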
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310449364.7A CN116188804B (en) | 2023-04-25 | 2023-04-25 | Twin network target search system based on transformer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116188804A true CN116188804A (en) | 2023-05-30 |
CN116188804B CN116188804B (en) | 2023-07-04 |
Family
ID=86449298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310449364.7A Active CN116188804B (en) | 2023-04-25 | 2023-04-25 | Twin network target search system based on transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116188804B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060120627A1 (en) * | 2004-12-07 | 2006-06-08 | Canon Kabushiki Kaisha | Image search apparatus, image search method, program, and storage medium |
US20180260415A1 (en) * | 2017-03-10 | 2018-09-13 | Xerox Corporation | Instance-level image retrieval with a region proposal network |
CN111179307A (en) * | 2019-12-16 | 2020-05-19 | 浙江工业大学 | Visual target tracking method for full-volume integral and regression twin network structure |
CN112883928A (en) * | 2021-03-26 | 2021-06-01 | 南通大学 | Multi-target tracking algorithm based on deep neural network |
CN113240716A (en) * | 2021-05-31 | 2021-08-10 | 西安电子科技大学 | Twin network target tracking method and system with multi-feature fusion |
CN113744311A (en) * | 2021-09-02 | 2021-12-03 | 北京理工大学 | Twin neural network moving target tracking method based on full-connection attention module |
US20220172455A1 (en) * | 2020-12-01 | 2022-06-02 | Accenture Global Solutions Limited | Systems and methods for fractal-based visual searching |
CN114821390A (en) * | 2022-03-17 | 2022-07-29 | 齐鲁工业大学 | Twin network target tracking method and system based on attention and relationship detection |
CN115588030A (en) * | 2022-09-27 | 2023-01-10 | 湖北工业大学 | Visual target tracking method and device based on twin network |
US20230050679A1 (en) * | 2021-07-29 | 2023-02-16 | Novateur Research Solutions | System and method for rare object localization and search in overhead imagery |
Non-Patent Citations (4)
Title |
---|
ENG-JON ONG et al.: "Siamese Network of Deep Fisher-Vector Descriptors for Image Retrieval", Computer Vision and Pattern Recognition, pages 1-12 |
YUANYUN WANG et al.: "Depthwise Over-parameterized Siamese Network for Visual Tracking", 2021 International Conference on Information Technology and Biomedical Engineering (ICITBE), pages 58-62 |
ZHANG Jun: "Research on Single-Object Tracking Methods Based on Siamese Fully Convolutional Networks", China Master's Theses Full-text Database (Information Science and Technology), pages 138 - 1415 |
WANG Mengting et al.: "A Survey of Single-Object Tracking Algorithms Based on Siamese Networks", Journal of Computer Applications, pages 661-673 |
Also Published As
Publication number | Publication date |
---|---|
CN116188804B (en) | 2023-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101346730B1 (en) | System, apparatus, method, program and recording medium for processing image | |
US6246790B1 (en) | Image indexing using color correlograms | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
KR101417548B1 (en) | Method and system for generating and labeling events in photo collections | |
US8027541B2 (en) | Image organization based on image content | |
US7043474B2 (en) | System and method for measuring image similarity based on semantic meaning | |
JP5934653B2 (en) | Image classification device, image classification method, program, recording medium, integrated circuit, model creation device | |
US20080247610A1 (en) | Apparatus, Method and Computer Program for Processing Information | |
US20130121535A1 (en) | Detection device and method for transition area in space | |
KR20070079330A (en) | Display control apparatus, display control method, computer program, and recording medium | |
Meng et al. | Object instance search in videos via spatio-temporal trajectory discovery | |
US9665773B2 (en) | Searching for events by attendants | |
US20140193048A1 (en) | Retrieving Visual Media | |
WO2022127814A1 (en) | Method and apparatus for detecting salient object in image, and device and storage medium | |
CN112464775A (en) | Video target re-identification method based on multi-branch network | |
CN113723558A (en) | Remote sensing image small sample ship detection method based on attention mechanism | |
US10991085B2 (en) | Classifying panoramic images | |
CN116188804B (en) | Twin network target search system based on transformer | |
Piramanayagam et al. | Shot boundary detection and label propagation for spatio-temporal video segmentation | |
JP6778625B2 (en) | Image search system, image search method and image search program | |
JP2018194956A (en) | | Image recognition device, method and program |
Arnold et al. | Automatic Identification and Classification of Portraits in a Corpus of Historical Photographs | |
CN102436487B (en) | Optical flow method based on video retrieval system | |
WO2015185479A1 (en) | Method of and system for determining and selecting media representing event diversity | |
Khan et al. | A Fused LBP Texture Descriptor-Based Image Retrieval System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||