CN117975370A - Image-based target archiving method and device, storage medium and electronic equipment


Info

Publication number
CN117975370A
Authority
CN
China
Prior art keywords
image
distance
images
feature
neighborhood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410217562.5A
Other languages
Chinese (zh)
Inventor
侯冠群
陈鑫嘉
陆海先
车军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202410217562.5A priority Critical patent/CN117975370A/en
Publication of CN117975370A publication Critical patent/CN117975370A/en
Pending legal-status Critical Current


Abstract

The application discloses an image-based target archiving method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring all images to be archived; in each image, extracting a classification feature for each detected category of classification content, and generating a comprehensive perceptual feature based on the extracted classification features; constructing a graph structure based on the comprehensive perceptual features of all the images, wherein each image serves as a node of the graph structure and an edge exists between two images if the probability that their comprehensive perceptual features correspond to the same target is greater than a set requirement; performing feature fusion processing based on the comprehensive perceptual features of all the images and the graph structure to obtain an aggregated feature for each image; and clustering all the images based on the aggregated features, with the images in each cluster corresponding to the same target, to obtain a target archiving result. The method and apparatus can effectively improve the accuracy of target archiving.

Description

Image-based target archiving method and device, storage medium and electronic equipment
Technical Field
The present application relates to image processing technology, and in particular, to an image-based target archiving method, apparatus, storage medium, and electronic device.
Background
With the progress of image processing technology, video surveillance systems are being used ever more widely.
In an intelligent video surveillance system, target archiving is often required: pictures and videos belonging to the same target (such as a person) are grouped into the same archive using various kinds of information about the target. This is of great significance for anomaly analysis and target investigation in a user's system.
Existing target archiving methods generally archive targets using a single kind of feature information. In person archiving, for example, only facial image features are used for clustering, and the pictures, videos, and so on belonging to the same person are grouped into that person's archive according to the clustering result to generate a person archive; relying on a single kind of feature information in this way limits archiving accuracy. Other target archiving methods fuse multiple kinds of feature information of the target, but the fusion processing is often very simple, so the fused features do not reflect the characteristics of the target well, which likewise limits archiving accuracy.
Disclosure of Invention
The application provides an image-based target archiving method and apparatus, a storage medium, and an electronic device, which can effectively improve the accuracy of target archiving.
To achieve the above purpose, the application adopts the following technical solution:
An image-based target archiving method, comprising:
acquiring all images to be archived;
in each image, extracting a classification feature for each detected category of classification content, and generating a comprehensive perceptual feature based on the extracted classification features;
constructing a graph structure based on the comprehensive perceptual features of all the images, wherein each image serves as a node of the graph structure, and an edge exists between two images if the probability that their comprehensive perceptual features correspond to the same target is greater than a set requirement;
performing feature fusion processing based on the comprehensive perceptual features of all the images and the graph structure to obtain an aggregated feature for each image;
and clustering all the images based on the aggregated features, with the images in each cluster corresponding to the same target, to obtain a target archiving result.
Preferably, the generating the comprehensive perceptual feature comprises:
splicing the extracted classification features and taking the splicing result as the comprehensive perceptual feature;
or
performing feature conversion on each extracted classification feature to obtain converted features, performing a splicing or adding operation on the converted features of the categories, and taking the splicing result or adding result as the comprehensive perceptual feature, wherein the converted features of all categories have the same dimension.
Preferably, the constructing the graph structure comprises:
based on a KNN algorithm, determining, for each first image among all the images, the K images whose comprehensive perceptual features have the highest similarity to those of the first image;
and setting an edge between the node of the first image and the node of each of the K images.
Preferably, the constructing the graph structure comprises:
based on a KNN algorithm, determining, for each first image among all the images, the K images whose comprehensive perceptual features have the highest similarity to those of the first image, to form a candidate image set of the first image, wherein K is a preset positive integer;
inputting the comprehensive perceptual feature of the first image and the comprehensive perceptual feature of any image in the candidate image set of the first image into a pre-trained neighborhood-aware subgraph adjustment model, and obtaining, after processing by the neighborhood-aware subgraph adjustment model, the probability that the first image and the any image correspond to the same target;
and setting an edge between the first image and the any image if the probability that they correspond to the same target is greater than a set probability threshold.
Preferably, the processing of the neighborhood-aware subgraph adjustment model comprises:
for the first image and the any image, determining a neighborhood feature propagation result of the first image and a neighborhood feature propagation result of the any image respectively by a set propagation method, and connecting together the comprehensive perceptual feature of the first image, the neighborhood feature propagation result of the first image, the comprehensive perceptual feature of the any image, and the neighborhood feature propagation result of the any image, to form an enhanced perceptual feature of the center node pair formed by the first image and the any image;
for the center node pair formed by the first image and the any image, determining neighbor nodes of the center node pair based on the comprehensive perceptual features of the images, calculating the distance between each neighbor node and the center node pair, and sorting the distances to generate a distance list;
inputting the enhanced perceptual feature into a feature discriminator for processing to obtain a first embedding output;
inputting the distance list into a neighborhood discriminator for processing to obtain a second embedding output;
and splicing the first embedding output and the second embedding output together, inputting the splicing result into a fully connected layer for two-class classification, and predicting the probability that the first image and the any image are connected.
Preferably, the determining the neighborhood feature propagation result of the first image comprises:
selecting, from the candidate image set of the first image, one or more second images whose comprehensive perceptual features have a similarity to those of the first image greater than or equal to a set first threshold, to form a neighborhood image set of the first image;
and determining the mean of the comprehensive perceptual features of the second images in the neighborhood image set of the first image as the neighborhood feature propagation result of the first image;
and the determining the neighborhood feature propagation result of the any image comprises:
selecting, from the candidate image set of the any image, one or more third images whose comprehensive perceptual features have a similarity to those of the any image greater than or equal to the first threshold, to form a neighborhood image set of the any image;
and determining the mean of the comprehensive perceptual features of the third images in the neighborhood image set of the any image as the neighborhood feature propagation result of the any image.
Preferably, the determining the neighbor nodes of the center node pair comprises:
selecting, from the candidate image set of the first image, the images whose comprehensive perceptual features have a similarity to those of the first image greater than or equal to a set second threshold, selecting, from the candidate image set of the any image, the images whose comprehensive perceptual features have a similarity to those of the any image greater than or equal to the second threshold, and taking the selected images as first-order neighbor nodes of the center node pair;
for each first-order neighbor node, selecting, from the candidate image set of the first-order neighbor node, the images whose comprehensive perceptual features have a similarity to those of the first-order neighbor node greater than or equal to the second threshold, as second-order neighbor nodes of the center node pair;
and taking all the first-order neighbor nodes and all the second-order neighbor nodes as the neighbor nodes of the center node pair.
Preferably, the calculating the distance between each neighbor node and the center node pair comprises:
calculating a first distance between the neighbor node and the first image and a second distance between the neighbor node and the any image;
and determining the distance between the neighbor node and the center node pair based on the first distance, the second distance, and the order of the neighbor node.
Preferably, the calculating the first distance comprises:
calculating a distance value between the neighbor node and the first image based on their comprehensive perceptual features; if the distance value is greater than a set first distance threshold, setting the first distance to a set first maximum distance; if the distance value is less than or equal to the first distance threshold, setting the first distance to the distance value;
the calculating the second distance comprises:
calculating a distance value between the neighbor node and the any image based on their comprehensive perceptual features; if the distance value is greater than a set second distance threshold, setting the second distance to a set second maximum distance; if the distance value is less than or equal to the second distance threshold, setting the second distance to the distance value;
and the determining the distance between the neighbor node and the center node pair comprises:
summing the mean of the first distance and the second distance with the order of the neighbor node, and taking the summation result as the distance between the neighbor node and the center node pair.
An image-based target archiving apparatus, comprising: an acquisition unit, a comprehensive perceptual feature generation unit, a graph construction unit, a feature fusion unit, and an archiving unit;
the acquisition unit is configured to acquire all images to be archived;
the comprehensive perceptual feature generation unit is configured to, in each image, extract a classification feature for each detected category of classification content and generate a comprehensive perceptual feature based on the extracted classification features;
the graph construction unit is configured to construct a graph structure based on the comprehensive perceptual features of all the images, with each image serving as a node of the graph structure and an edge existing between two images if the probability that their comprehensive perceptual features correspond to the same target is greater than a set requirement;
the feature fusion unit is configured to perform feature fusion processing based on the comprehensive perceptual features of all the images and the graph structure to obtain an aggregated feature for each image;
and the archiving unit is configured to cluster all the images based on the aggregated features, with the images in each cluster corresponding to the same target, to obtain a target archiving result.
Preferably, in the comprehensive perceptual feature generation unit, the generating the comprehensive perceptual feature comprises:
splicing the extracted classification features and taking the splicing result as the comprehensive perceptual feature;
or
performing feature conversion on each extracted classification feature to obtain converted features, performing a splicing or adding operation on the converted features of the categories, and taking the splicing result or adding result as the comprehensive perceptual feature, wherein the converted features of all categories have the same dimension.
Preferably, the graph construction unit comprises a KNN algorithm processing subunit and an edge setting subunit, wherein
the KNN algorithm processing subunit is configured to determine, based on a KNN algorithm, for each first image among all the images, the K images whose comprehensive perceptual features have the highest similarity to those of the first image, wherein K is a preset positive integer;
and the edge setting subunit is configured to set an edge between the node of the first image and the node of each of the K images.
Preferably, the graph construction unit comprises a KNN algorithm processing subunit, a neighborhood-aware subgraph adjustment model processing subunit, and an edge setting subunit, wherein
the KNN algorithm processing subunit is configured to determine, based on a KNN algorithm, for each first image among all the images, the K images whose comprehensive perceptual features have the highest similarity to those of the first image, to form a candidate image set of the first image, wherein K is a preset positive integer;
the neighborhood-aware subgraph adjustment model processing subunit is configured to input the comprehensive perceptual feature of the first image and the comprehensive perceptual feature of any image in the candidate image set of the first image into a pre-trained neighborhood-aware subgraph adjustment model, and obtain, after processing by the neighborhood-aware subgraph adjustment model, the probability that the first image and the any image correspond to the same target;
and the edge setting subunit is configured to set an edge between the first image and the any image if the probability that they correspond to the same target is greater than a set probability threshold.
Preferably, the neighborhood-aware subgraph adjustment model processing subunit comprises a first preprocessing module, a second preprocessing module, an FD processing module, an ND processing module, and a fully connected layer processing module, wherein
the first preprocessing module is configured to, for the first image and the any image, determine a neighborhood feature propagation result of the first image and a neighborhood feature propagation result of the any image respectively by a set propagation method, and connect together the comprehensive perceptual feature of the first image, the neighborhood feature propagation result of the first image, the comprehensive perceptual feature of the any image, and the neighborhood feature propagation result of the any image, to form an enhanced perceptual feature of the center node pair formed by the first image and the any image;
the second preprocessing module is configured to, for the center node pair formed by the first image and the any image, determine neighbor nodes of the center node pair based on the comprehensive perceptual features of the images, calculate the distance between each neighbor node and the center node pair, and sort the distances to generate a distance list;
the FD processing module is configured to input the enhanced perceptual feature into a feature discriminator for processing to obtain a first embedding output;
the ND processing module is configured to input the distance list into a neighborhood discriminator for processing to obtain a second embedding output;
and the fully connected layer processing module is configured to splice the first embedding output and the second embedding output together, input the splicing result into a fully connected layer for two-class classification, and predict the probability that the first image and the any image are connected.
Preferably, in the first preprocessing module,
the determining the neighborhood feature propagation result of the first image comprises:
selecting, from the candidate image set of the first image, one or more second images whose comprehensive perceptual features have a similarity to those of the first image greater than or equal to a set first threshold, to form a neighborhood image set of the first image;
and determining the mean of the comprehensive perceptual features of the second images in the neighborhood image set of the first image as the neighborhood feature propagation result of the first image;
and the determining the neighborhood feature propagation result of the any image comprises:
selecting, from the candidate image set of the any image, one or more third images whose comprehensive perceptual features have a similarity to those of the any image greater than or equal to the first threshold, to form a neighborhood image set of the any image;
and determining the mean of the comprehensive perceptual features of the third images in the neighborhood image set of the any image as the neighborhood feature propagation result of the any image.
Preferably, in the second preprocessing module, the determining the neighbor nodes of the center node pair comprises:
selecting, from the candidate image set of the first image, the images whose comprehensive perceptual features have a similarity to those of the first image greater than or equal to a set second threshold, selecting, from the candidate image set of the any image, the images whose comprehensive perceptual features have a similarity to those of the any image greater than or equal to the second threshold, and taking the selected images as first-order neighbor nodes of the center node pair;
for each first-order neighbor node, selecting, from the candidate image set of the first-order neighbor node, the images whose comprehensive perceptual features have a similarity to those of the first-order neighbor node greater than or equal to the second threshold, as second-order neighbor nodes of the center node pair;
and taking all the first-order neighbor nodes and all the second-order neighbor nodes as the neighbor nodes of the center node pair.
Preferably, in the second preprocessing module, the calculating the distance between each neighbor node and the center node pair comprises:
calculating a first distance between the neighbor node and the first image and a second distance between the neighbor node and the any image;
and determining the distance between the neighbor node and the center node pair based on the first distance, the second distance, and the order of the neighbor node.
Preferably, in the second preprocessing module,
the calculating the first distance comprises:
calculating a distance value between the neighbor node and the first image based on their comprehensive perceptual features; if the distance value is greater than a set first distance threshold, setting the first distance to a set first maximum distance; if the distance value is less than or equal to the first distance threshold, setting the first distance to the distance value;
the calculating the second distance comprises:
calculating a distance value between the neighbor node and the any image based on their comprehensive perceptual features; if the distance value is greater than a set second distance threshold, setting the second distance to a set second maximum distance; if the distance value is less than or equal to the second distance threshold, setting the second distance to the distance value;
and the determining the distance between the neighbor node and the center node pair comprises:
summing the mean of the first distance and the second distance with the order of the neighbor node, and taking the summation result as the distance between the neighbor node and the center node pair.
A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the image-based target archiving method of any one of the above.
An electronic device, comprising at least a computer readable storage medium and a processor;
the processor is configured to read executable instructions from the computer readable storage medium and execute the instructions to implement the image-based target archiving method of any one of the above.
As can be seen from the above technical solution, in the present application, first, in each acquired image to be archived, a classification feature is extracted for each detected category of classification content, and a comprehensive perceptual feature is generated based on the extracted classification features; in this way, the various kinds of feature information in an image are organized together into a comprehensive perceptual feature for subsequent archiving processing. Then, a graph structure is constructed using the comprehensive perceptual features of all the images, with each image serving as a node of the graph structure and an edge existing between two images if the probability that their comprehensive perceptual features correspond to the same target meets the set requirement; in the graph structure thus constructed, two nodes connected by an edge are more likely to correspond to the same target. Next, based on the comprehensive perceptual features of all the images and the constructed graph structure, feature fusion processing is performed using a graph algorithm to obtain an aggregated feature for each image; because a graph algorithm can efficiently aggregate many different kinds of features, the aggregated features can replace the original features, and two nodes connected by an edge are more likely to belong to the same target, the application can use the constructed graph structure and a graph algorithm to aggregate features efficiently, obtaining improved aggregated features that better reflect the overall characteristics of a target. Finally, the images are clustered based on the aggregated features, with the images in each cluster corresponding to the same target, to obtain the target archiving result. Because the aggregated features are an improved fusion of many kinds of feature information that better reflects the overall characteristics of a target, clustering based on the aggregated features is more accurate, and the accuracy of target archiving is thereby effectively improved.
Drawings
FIG. 1 is a basic flow diagram of an image-based target archiving method in the present application;
FIG. 2 is a schematic flow chart of the image-based target archiving method in an embodiment of the present application;
FIG. 3 is a schematic diagram of the neighborhood-aware subgraph adjustment model;
FIG. 4 is a schematic diagram of the basic structure of a GCN;
FIG. 5 is a schematic diagram of the basic structure of the image-based target archiving apparatus of the present application;
Fig. 6 is a schematic diagram of a basic structure of an electronic device provided in the present application.
Detailed Description
To make the objects, technical means, and advantages of the present application more apparent, the application is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a basic flow diagram of an image-based target archiving method in the present application. As shown in fig. 1, the method includes:
Step 101, all images to be archived are acquired.
This step acquires all the images that need to be archived in the current archiving run.
Step 102, in each image, a classification feature is extracted for each detected category of classification content, and a comprehensive perceptual feature is generated based on the extracted classification features.
For each image, many different categories of classification content may be detected; in person archiving, for example, face information, human body information, vehicle information, spatiotemporal information, and so on may be detected in the image. Feature extraction may be performed separately for each detected category of classification content, for example face feature extraction, body feature extraction, and spatiotemporal feature extraction. The feature information of the various categories is then organized into a comprehensive perceptual feature that represents all the feature information of the image.
Step 103, a graph structure is constructed based on the comprehensive perceptual features of all the images.
When the graph structure is constructed, each image serves as a node of the graph structure, and an edge exists between two images if the probability that their comprehensive perceptual features correspond to the same target is greater than the set requirement.
Considering that a graph algorithm can perform efficient feature fusion processing, the application uses a graph algorithm to fuse the various kinds of feature information in the images. However, since the images to be archived do not have a natural graph structure of their own, this step first constructs a graph structure from the comprehensive perceptual features of all the images.
Specifically, a graph structure consists of nodes and of edges between certain nodes, where an edge indicates that an association exists between the two nodes it connects. There is no explicit association between the different images to be archived, so hidden relationships between the images must be mined to generate the graph structure before feature fusion can be performed with a graph algorithm.
What the present application performs is target archiving, that is, grouping together the different images that belong to the same target. When constructing the graph structure, correspondence to the same target is used as the association between images: if two images correspond to the same target, an association exists between them and an edge connects their two nodes; if they do not correspond to the same target, no association exists between them and no edge connects their two nodes.
When the graph structure is constructed in this step, it is not yet known with certainty whether two images correspond to the same target. In the specific processing, therefore, the probability that two images correspond to the same target is determined based on their comprehensive perceptual features, and when this probability is greater than the set requirement, the two images are considered likely to correspond to the same target and an edge is set between their nodes. In this manner, one graph structure can be constructed using all the images acquired in step 101.
In more detail, when deciding whether to set an edge between the nodes of two images, the simplest approach is to determine, for each image A among all the images and based on the KNN algorithm, the K images whose comprehensive perceptual features have the highest similarity to those of image A; these K images are called the candidate image set of image A. Because the feature similarity between each image in the candidate image set and image A is high, the probability that each such image corresponds to the same target as image A is considered greater than the set requirement, and an edge is set between each image in the candidate image set and image A. Performing the corresponding operation for every image A generates a graph structure. The KNN algorithm can determine the candidate image sets of all images A at once, and the feature similarity between images can be determined from the distance between their features, such as the cosine distance or Euclidean distance; in general, the larger the feature distance between images, the lower the feature similarity, and the smaller the feature distance, the higher the feature similarity.
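As a minimal illustrative sketch of this KNN-based construction (the function name build_knn_graph, the use of NumPy, and cosine similarity are assumptions for illustration, not fixed by the application):

import numpy as np

def build_knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    # features: (N, D) matrix of comprehensive perceptual features, one row per image
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T                  # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)           # an image is not its own neighbor
    adj = np.zeros_like(sim, dtype=np.int8)
    for i in range(sim.shape[0]):
        neighbors = np.argsort(sim[i])[-k:]  # candidate image set: K most similar images
        adj[i, neighbors] = 1                # edge between image i and each candidate
    return np.maximum(adj, adj.T)            # treat edges as undirected

Here build_knn_graph returns an N by N 0/1 adjacency matrix, and k corresponds to the preset positive integer K.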
Generating the graph structure directly from feature similarity and the KNN algorithm as above works, but it depends heavily on how accurately the image features express the images: if that accuracy is limited, the quality of the generated graph structure, that is, the reliability of its edges, is limited, and the accuracy of the feature fusion subsequently performed by the graph algorithm is correspondingly limited. The application therefore also provides another way of generating the graph structure, which encodes the comprehensive perceptual features within the graph to obtain a more reliable graph structure, that is, to improve the reliability of its edges. In this way, the candidate image set serves as an initial image set, further screening is performed on its basis, and it is then decided whether edges exist between the images in the set and image A.
Specifically, the comprehensive perceptual feature of each image A and the comprehensive perceptual feature of any image B in the candidate image set of image A can be input into a pre-trained neighborhood-aware subgraph adjustment model, which processes the input features and determines the probability that image A and image B correspond to the same target. It is then judged whether this probability is greater than a set probability threshold, and for any pair of images A and B whose probability exceeds the threshold, an edge is set between their nodes. Performing the corresponding operation for every image A generates a graph structure. Here the probability that two images correspond to the same target is determined by a neural network model, and rigorous training can give the model the ability to produce more reliable probability estimates, thereby improving the reliability of the edges in the graph structure.
Step 104, feature fusion processing is performed based on the comprehensive perceptual features of all the images and the graph structure, to obtain an aggregated feature for each image.
The comprehensive perceptual features determined in step 102 and the graph structure determined in step 103 are input together into a graph algorithm for feature fusion processing. The graph algorithm may be implemented in various existing ways, such as a GCN or a Transformer structure. Through the layer-by-layer aggregation and information propagation of the graph algorithm among the comprehensive perceptual features of the images, an aggregated feature is finally obtained for each image, such that in the given graph structure the aggregated features of two nodes connected by an edge become more similar, while those of two nodes not connected by an edge differ more. The aggregated features thus reflect the overall characteristics of each image for recognizing the corresponding target, fuse the feature information of the various categories more fully, and express the images more accurately.
And 105, clustering all the images based on the aggregation characteristics, and enabling the clustered images to correspond to the same target to obtain a target archiving result.
Image clustering processing is carried out based on the aggregation features, and the aggregation features are more accurate in image expression, so that the image clustering result based on the aggregation features is more accurate, the target archiving accuracy can be effectively improved, the archiving effect is improved, the processing of various different types of features is not needed in the subsequent retrieval stages, and the archiving efficiency is improved.
This concludes the basic flow of the image-based target archiving method of the present application.
The following describes a specific implementation of the target archiving method of the present application by way of a specific embodiment. In this embodiment, person archiving is taken as an example, and a neural network model is used when constructing the graph structure. FIG. 2 is a flow chart of the method according to this embodiment. As shown in fig. 2, the method includes:
Step 201, real-time snapshot images to be archived are acquired, and all classification features are extracted from each snapshot image.
Person archiving in this embodiment means archiving the real-time snapshot images captured within a set time period.
Detection processing is performed on each real-time snapshot image within the set time period, and feature extraction is performed separately for each detected category of classification content to obtain the corresponding classification features. For example, when a human body, a face, a vehicle, or the like is detected, the corresponding modeling, attribute, and scoring modules are triggered: when a face is detected in snapshot image V, the face modeling and face attribute scoring modules are triggered to generate the face modeling feature F_f, face attribute information A_f, and face score information I_f; when a human body is detected in snapshot image V, the body modeling and body attribute scoring modules are triggered to generate the body modeling feature F_b, body attribute information A_b, and body score information I_b; and when a vehicle is present, the vehicle modeling module is triggered to obtain the vehicle modeling feature F_c. In addition, the spatiotemporal data of the target is obtained from the location and time of the snapshot and preprocessed uniformly: one preprocessing approach combines the time and location information with a spatiotemporal encoding technique to generate F_t; another represents the location by longitude and latitude and the time by a timestamp, and then combines the two into F_t.
Step 202, for each image, all the classification features are organized to generate the comprehensive perceptual feature.
For any image A, all its classification features are organized together to generate the comprehensive perceptual feature. In the simplest way of generating it, all the extracted classification features are directly spliced (concatenated) together, for example F = F_f ∘ F_b ∘ A_f ∘ I_f ∘ A_b ∘ I_b ∘ F_c ∘ F_t, where ∘ denotes splicing. When one of the classification features is missing, an all-zero or all-one vector of the matching dimension can be filled in at the corresponding position.
Alternatively, the comprehensive perceptual feature may be generated as follows: feature conversion is performed on each extracted classification feature to obtain converted features, a splicing or adding operation is performed on the converted features of the categories, and the splicing result or adding result is taken as the comprehensive perceptual feature, where the converted features of all categories have the same dimension. The feature conversion of a classification feature can be performed by a fully connected (FC) layer, and different FC layer parameters can be used for different categories of classification features so as to match the characteristics of each category.
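As a sketch of the two variants just described (shapes, the common dimension of 256, and the use of PyTorch are illustrative assumptions):

import torch
import torch.nn as nn

def concat_features(parts, dims):
    # Variant 1: splice per-category features; a missing category becomes a zero vector
    filled = [p if p is not None else torch.zeros(d) for p, d in zip(parts, dims)]
    return torch.cat(filled)

class TransformAndAdd(nn.Module):
    # Variant 2: a per-category FC layer maps each feature to a common dimension;
    # the converted features are then added (they could equally be spliced)
    def __init__(self, dims, common_dim=256):
        super().__init__()
        self.fcs = nn.ModuleList([nn.Linear(d, common_dim) for d in dims])

    def forward(self, parts):
        return sum(fc(p) for fc, p in zip(self.fcs, parts))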
After the comprehensive perceptual features of all the images have been generated in step 202, the graph structure is constructed through the processing of steps 203 to 210.
Step 203, based on the KNN algorithm, for each image among all the images, the K images whose comprehensive perceptual features have the highest similarity to those of the image are determined, forming the candidate image set of the image.
When the graph structure is constructed in this embodiment, the candidate image set of each image is first determined by the KNN algorithm. The KNN processing can be performed in an existing manner, which determines the candidate image sets of all the images at once.
In the construction of the graph structure below, the processing is the same for every image; the processing of a single image is described below taking image A as an example.
Step 204, the comprehensive perceptual feature of image A and the comprehensive perceptual feature of any image B in the candidate image set of image A are input into the pre-trained neighborhood-aware subgraph adjustment model.
The pre-trained neighborhood-aware subgraph adjustment model processes the comprehensive perceptual features of the two input images and determines the probability that the two images correspond to the same target.
The structure of the neighborhood-aware subgraph adjustment model is shown in fig. 3. It comprises a data preprocessing section (the processing outlined by the dashed boxes in fig. 3), a Feature Discriminator (FD), a Neighborhood Discriminator (ND), and an FC layer. The data preprocessing section is further divided into the preprocessing of the FD input data (upper dashed box in fig. 3) and the preprocessing of the ND input data (lower dashed box in fig. 3), and the FC layer implements the classification.
A large body of earlier research shows that local neighborhood structure can improve embedding quality, which suits pairwise classification well. In this embodiment, therefore, different neighborhood structure information is applied to the FD and the ND respectively to improve the quality of the embeddings input to the FC layer. The preprocessing of the FD input data and of the ND input data is described in steps 205 and 206 respectively. Image A and image B are called the center node pair, corresponding to nodes i and j in the graph structure.
Step 205, for image A and image B, the neighborhood feature propagation result of image A and the neighborhood feature propagation result of image B are determined by a set propagation method, and the comprehensive perceptual feature of image A, the neighborhood feature propagation result of image A, the comprehensive perceptual feature of image B, and the neighborhood feature propagation result of image B are connected together to form the enhanced perceptual feature that is input to the FD for processing.
In the preprocessing of the FD input data, the features of the two nodes i and j are spliced together. Research has shown that the feature of the current node can be improved by propagating neighborhood features. Since the candidate image set is determined by a K-nearest-neighbor algorithm, the candidate image set of image A can be regarded as the neighborhood of image A, and a propagation method is used to obtain, from these neighborhood features, the propagation result of the neighborhood to image A, called the neighborhood feature propagation result of image A. The propagation method may be any of several existing methods, such as a Transformer, a GCN, or mean aggregation; these methods compute the propagation result F'_i of image A from the comprehensive perceptual features F of the neighborhood images. The neighborhood feature propagation result F'_i of image A is spliced with the comprehensive perceptual feature F_i of image A to generate the improved feature of image A. Image B is processed in the same way to obtain its neighborhood feature propagation result F'_j, which is spliced with the comprehensive perceptual feature F_j of image B to generate the improved feature of image B. Finally, the improved features of image A and image B are spliced together, that is, F_i, F'_i, F_j, and F'_j are spliced together to form the enhanced perceptual feature of the center node pair formed by image A and image B (that is, the splice F_i ∘ F'_i ∘ F_j ∘ F'_j). This completes the preprocessing of the FD input data.
Although an existing propagation method can be used to determine the neighborhood feature propagation result of image A, to simplify the processing and improve FD efficiency this embodiment provides a simple mean aggregation method that can also be used. Specifically, one or more images whose comprehensive perceptual features have a similarity s_im to those of image A greater than or equal to a set first threshold t_1 are selected from the candidate image set K(v_i) of image A, forming the neighborhood image set of image A, K'(v_i) = {v_m | s_im ≥ t_1, v_m ∈ K(v_i)}. The mean of the comprehensive perceptual features of the images v_m in K'(v_i) is then determined as the neighborhood feature propagation result of image A, F'_i = (1/M) Σ_m F_m, where v_i denotes image A, m is the node index within the neighborhood image set of image A, and M is the total number of nodes in that set. In the processing of the neighborhood-aware subgraph adjustment model of the present application, the descriptions "image" and "node" are interchangeable, because each image is itself a node in the graph structure. As described above, in the processing of this embodiment, not all images of the candidate image set of image A participate in propagation; only the images whose similarity satisfies the requirement are selected, which improves processing efficiency.
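A sketch of this mean aggregation under the same illustrative naming (candidate_idx holding the indices of K(v_i) and cosine similarity are assumptions):

import numpy as np

def neighborhood_propagation(features, i, candidate_idx, t1):
    fi = features[i] / np.linalg.norm(features[i])
    cand = features[candidate_idx]
    sims = (cand / np.linalg.norm(cand, axis=1, keepdims=True)) @ fi
    kept = cand[sims >= t1]        # neighborhood image set K'(v_i)
    if kept.shape[0] == 0:
        return features[i]         # no qualifying neighbor: fall back to F_i itself
    return kept.mean(axis=0)       # F'_i, the neighborhood feature propagation result

The fallback for an empty neighborhood set is an added assumption; the application does not specify this case.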
Step 206, for the center node pair formed by image A and image B, the neighbor nodes of the center node pair are determined based on the comprehensive perceptual features of the images, the distance between each neighbor node and the center node pair is calculated, and the distances are sorted to generate the distance list that is input to the ND for processing.
In the preprocessing of the ND input data, for the center node pair formed by image A and image B, the neighbor nodes of the center node pair are first determined based on the comprehensive perceptual features of the relevant images; the distance between each neighbor node and the center node pair is then calculated, and the sorted distances form the distance list used as the preprocessing result of the ND input data.
In more detail, the ND uses structural features to represent the closed subgraph between the center node pair, and this closed subgraph expresses the probability of a connection between the two nodes. To improve the precision of the closed subgraph, in this embodiment the neighbor nodes selected for the center node pair include first-order and second-order neighbor nodes. In practical application, the neighbor nodes can be chosen according to the precision requirement and the available processing resources: the more neighbor nodes, the higher the precision of the closed subgraph and the more reliable the resulting graph structure, but the more processing resources and time are consumed. In this embodiment, the first-order and second-order neighbor nodes are selected as follows:
The images whose comprehensive perceptual features have a similarity s_ic to those of image A greater than or equal to a set second threshold t_2 are selected from the candidate image set K(v_i) of image A, and the images whose similarity s_jc to image B is greater than or equal to t_2 are selected from the candidate image set K(v_j) of image B; the selected images are taken as the first-order neighbor nodes of the center node pair, and all the first-order neighbor nodes form the first-order neighbor node set N^1_ij = {v_c | s_ic ≥ t_2, v_c ∈ K(v_i)} ∪ {v_c | s_jc ≥ t_2, v_c ∈ K(v_j)}.
For each first-order neighbor node v_c, the images whose comprehensive perceptual features have a similarity s_cd to those of v_c greater than or equal to the second threshold t_2 are selected from the candidate image set K(v_c) of v_c as second-order neighbor nodes of the center node pair, and all the second-order neighbor nodes form the second-order neighbor node set N^2_ij = {v_d | s_cd ≥ t_2, v_d ∈ K(v_c), v_c ∈ N^1_ij}.
The union of the first-order and second-order neighbor node sets gives the neighbor node set of the center node pair, N_ij = N^1_ij ∪ N^2_ij, that is, the set of nodes of the closed subgraph between node i and node j. The first and second thresholds are chosen empirically; the second threshold may equal the first threshold, and it may also be 0.
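A sketch of this neighbor selection (candidates(v) returning the candidate index set of node v and sim(u, v) returning feature similarity are assumed helpers, not interfaces defined by the application):

def closed_subgraph_neighbors(i, j, candidates, sim, t2):
    first = {c for c in candidates(i) if sim(i, c) >= t2}   # from K(v_i)
    first |= {c for c in candidates(j) if sim(j, c) >= t2}  # from K(v_j)
    second = set()
    for c in first:
        second |= {d for d in candidates(c) if sim(c, d) >= t2}
    return (first | second) - {i, j}   # N_ij: nodes of the closed subgraph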
The distance between a neighbor node and the center node pair reflects the contribution of that neighbor node to the connection probability between the center node pair. It can be calculated as follows:
For any neighbor node n, a first distance dist_in between neighbor node n and image A and a second distance dist_jn between neighbor node n and image B are calculated; dist_in and dist_jn are calculated in the same way. Taking dist_in as an example:
dist_in = dist_Max if s'_in > t'_2, and dist_in = s'_in if s'_in ≤ t'_2,
where s'_in is the distance between neighbor node n and node i of image A, which may be a cosine distance, a Euclidean distance, a shortest-path distance, or the like; t'_2 is a preset distance threshold, which may be the distance value corresponding to the similarity represented by the second threshold; and dist_Max is the set maximum distance value between a neighbor node and a center node.
The distance dist_n between neighbor node n and the center node pair is then determined based on the first distance, the second distance, and the order O_n of neighbor node n:
dist_n = (dist_in + dist_jn) / 2 + O_n,
where the order of a center node is 0. The distances between all the neighbor nodes and the center node pair are calculated in this way and then sorted by size to obtain the distance list of the center node pair, which is the preprocessing result of the ND input data.
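A sketch of the distance computation and the distance list (names are illustrative; dist_to_i and dist_to_j map a neighbor to its raw feature distance from node i or j, and order gives 1 or 2):

def truncated_distance(d, t, d_max):
    return d_max if d > t else d   # distances beyond the threshold saturate at dist_Max

def distance_list(neighbors, dist_to_i, dist_to_j, order, t2_prime, dist_max):
    dists = []
    for n in neighbors:
        d_in = truncated_distance(dist_to_i[n], t2_prime, dist_max)  # first distance
        d_jn = truncated_distance(dist_to_j[n], t2_prime, dist_max)  # second distance
        dists.append((d_in + d_jn) / 2 + order[n])                   # dist_n
    return sorted(dists)   # the sorted distances form the ND input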
Step 207, the enhanced perceptual feature determined in step 205 is input into the feature discriminator for processing to obtain the first embedding output.
The Feature Discriminator (FD) may be implemented by a multi-layer perceptron, as shown in fig. 3, whose parameters are obtained by training. The output obtained by processing the input enhanced perceptual feature through the FD is called the first embedding output.
Step 208, the distance list determined in step 206 is input into the neighborhood discriminator for processing to obtain the second embedding output.
The Neighborhood Discriminator (ND) may likewise be implemented by a multi-layer perceptron, as shown in fig. 3, whose parameters are obtained by training. The output obtained by processing the input distance list through the ND is called the second embedding output.
The parameters of the multi-layer perceptron differ between the FD and the ND.
Step 209, the first embedding output determined in step 207 and the second embedding output determined in step 208 are spliced together, the splicing result is input into the FC layer for processing, and the probability that image A and image B correspond to the same target is output.
As shown in fig. 3, the input of the FC layer consists of two parts, the first embedding output and the second embedding output. After the two parts are spliced together and input into the FC layer, the FC layer performs a two-class classification whose two outcomes are, respectively, that an association exists between image A and image B and that no association exists between them; accordingly, the probability that an association exists between image A and image B, that is, the probability that the image content of image A and image B points to the same target (for example, the same person appearing in both images), is also obtained. The parameters of the FC layer are obtained by training.
Through the processing of steps 205 to 209, all the processing of the neighborhood-aware subgraph adjustment model is completed, and the probability that an association exists between image A and image B is obtained. The parameters of the FD, the ND, and the fully connected layer in the neighborhood-aware subgraph adjustment model must be trained in advance, and the whole model can be trained end to end: each training round yields a classification result of whether a pair of images corresponds to the same target, a loss function is computed from it, the parameters of the FD, the ND, and the fully connected layer are adjusted according to the loss value, and the model parameters are updated over many training iterations until the training termination condition is met, producing the trained neighborhood-aware subgraph adjustment model.
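A minimal PyTorch sketch of the FD, the ND, and the FC classification head (layer sizes and the fixed distance-list length are assumptions; in practice the distance list would be padded or truncated to list_len):

import torch
import torch.nn as nn

class NeighborhoodAwareAdjuster(nn.Module):
    def __init__(self, feat_dim, list_len, hidden=128):
        super().__init__()
        # FD: multi-layer perceptron over the enhanced perceptual feature (F_i, F'_i, F_j, F'_j)
        self.fd = nn.Sequential(nn.Linear(4 * feat_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, hidden))
        # ND: multi-layer perceptron over the sorted distance list
        self.nd = nn.Sequential(nn.Linear(list_len, hidden), nn.ReLU(),
                                nn.Linear(hidden, hidden))
        self.fc = nn.Linear(2 * hidden, 2)   # two-class head: edge / no edge

    def forward(self, enhanced_feat, dist_list):
        e_fd = self.fd(enhanced_feat)        # first embedding output
        e_nd = self.nd(dist_list)            # second embedding output
        logits = self.fc(torch.cat([e_fd, e_nd], dim=-1))
        return torch.softmax(logits, dim=-1)[..., 1]  # probability that an edge exists

Such a model could be trained end to end with a cross-entropy loss on labeled same-target and different-target image pairs, consistent with the training procedure described above.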
Step 210, whether an edge exists between image A and image B is determined based on the probability obtained in step 209.
When the probability obtained in step 209 is greater than or equal to a set third threshold t_3, the likelihood that an association exists between image A and image B, that is, that they correspond to the same target, is relatively high, and an edge is set between image A and image B. This can be expressed as
A_ij = 1 if the probability is greater than or equal to t_3, and A_ij = 0 otherwise,
where A_ij = 0 indicates that there is no edge between node i and node j, and A_ij = 1 indicates that there is an edge between node i and node j.
The above is the whole process of determining, for the center node pair formed by image A and image B, whether an edge exists between the two in the graph structure. Each image in the candidate image set of image A forms a center node pair with image A, and whether an edge exists between the two is determined in the manner above; this yields all the images connected to image A in the graph structure (that is, sharing an edge with image A), in other words, all the edges involving image A are found.
Processing each of the images acquired in step 201 in the same manner as image A finds, for every image, the edges associated with it in the graph structure, and the whole graph structure is thereby constructed.
Step 211, feature fusion processing is performed based on the comprehensive perceptual features of all the images and the graph structure, to obtain an aggregated feature for each image.
In this embodiment, the feature fusion processing is performed by a GCN-based graph algorithm, specifically as follows:
First, the adjacency matrix A is obtained from the graph structure G constructed in step 210; then the comprehensive perceptual features F = [F_1, F_2, ..., F_n] ∈ R^{N×D} of all the images obtained in step 202 are input into the GCN together with the adjacency matrix A. Through several layers of GCN aggregation and information propagation, the aggregated features F' = [F'_1, F'_2, ..., F'_n] ∈ R^{N×D} are obtained, where F'_1, F'_2, ..., F'_n are the aggregated features corresponding to the individual images.
The feature aggregation function of each GCN layer can be expressed as F^{l+1} = σ(Â F^l W^l). The GCN structure is shown schematically in fig. 4; F^0 is the initial node features, that is, the comprehensive perceptual features F = [F_1, F_2, ..., F_n] ∈ R^{N×D} of all the images, Â (that is, the adjacency matrix A) is the adjacency matrix, W^l is the trainable weight matrix and σ the activation function of the layer, and F^{l+1} is the output of the l-th GCN layer. The element a_ij in row i and column j of the adjacency matrix A indicates whether an edge exists between the center node pair formed by node i and node j, that is, the value of A_ij determined in step 210.
Through aggregation and information propagation across the multiple GCN layers, features belonging to the same target become more similar, while features of different targets become more clearly separated.
Step 212, clustering all the images based on the aggregated features; the clustered images correspond to the same target, yielding the target archiving result.
The aggregated features corresponding to all the images are obtained in step 211. In this step, all the images are clustered based on these aggregated features, and the images in each cluster correspond to the same target, producing the target archiving result. The specific clustering process may include, but is not limited to, conventional clustering algorithms such as DBSCAN, K-Means, and Infomap, as well as Transformer-based clustering algorithms.
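A clustering sketch using scikit-learn's DBSCAN is shown below; the eps/min_samples values and the cosine metric over L2-normalized aggregated features are illustrative assumptions, not parameters from the disclosure:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

def archive_targets(agg_features: np.ndarray) -> np.ndarray:
    """Returns a cluster label per image; images sharing a label
    form one target archive (-1 marks unarchived outliers)."""
    feats = normalize(agg_features)                  # unit-norm rows
    return DBSCAN(eps=0.3, min_samples=2,
                  metric="cosine").fit_predict(feats)
```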
Thus, the flow of the target archiving method in this embodiment of the application is complete. In this method, a reliable graph structure is constructed and a graph algorithm is used to fuse multiple different types of features, so that adaptive feature fusion is achieved and the fusion is more thorough and reasonable, effectively improving the reliability of the feature fusion. Archiving with the fused features, instead of with the original assortment of different types of information, effectively improves the archiving effect and accuracy, requires no complex post-processing, and improves archiving efficiency.
The above is a specific implementation of the image-based target archiving method. The application also provides an image-based target archiving apparatus, which can be used to implement the above target archiving method. Fig. 5 is a schematic diagram of the basic structure of the image-based target archiving apparatus of the present application. As shown in Fig. 5, the apparatus includes: an acquisition unit, a comprehensive perception feature generation unit, a graph construction unit, a feature fusion unit, and an archiving unit.
The acquisition unit is used for acquiring all images to be archived;
The comprehensive perception feature generation unit is used for respectively extracting classification features aiming at various detected classification contents in each image and generating comprehensive perception features based on the extracted classification features;
A graph construction unit for constructing a graph structure based on the overall perceptual features of all the images; taking each image as a node of a graph structure, and if the probability that the comprehensive perception features of the two images correspond to the same target is greater than a set requirement, edges exist between the two corresponding images;
The feature fusion unit is used for carrying out feature fusion processing based on the overall perception features and the graph structures of all the images to obtain the aggregation features of all the images;
And the archiving unit is used for clustering all the images based on the aggregation characteristics, and the clustered images correspond to the same target to obtain a target archiving result.
Optionally, in the overall perceptual feature generating unit, the specific process of generating the overall perceptual feature may include:
Splicing the extracted classification features, and taking the spliced result as a comprehensive perception feature;
Or alternatively
Respectively carrying out feature conversion on each extracted classified feature to obtain converted features, carrying out splicing or adding operation on each classified converted feature, and taking a splicing result or adding result as the comprehensive perception feature; wherein the features after each class transformation have the same dimensions.
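The two options above can be illustrated with a small sketch; the common dimension D and the random projection matrices (stand-ins for learned per-class transforms) are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_by_concat(class_feats: list[np.ndarray]) -> np.ndarray:
    """Option 1: concatenate the extracted class features directly."""
    return np.concatenate(class_feats)

def fuse_by_projection(class_feats: list[np.ndarray], D: int = 256) -> np.ndarray:
    """Option 2: project each class feature to a common dimension D,
    then add the projected features (concatenation works equally)."""
    projected = [rng.standard_normal((D, f.shape[0])) @ f for f in class_feats]
    return np.sum(projected, axis=0)
```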
In a basic implementation, the graph construction unit may include a KNN algorithm processing subunit and an edge setting subunit,
The KNN algorithm processing subunit is used for determining, for each first image in all the images and based on a KNN algorithm, the K images with the highest similarity to the overall perception feature of the first image; wherein K is a preset positive integer;
an edge setting subunit for setting edges between the first image and nodes of each of the K images, respectively.
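A sketch of the KNN candidate selection is given below; cosine similarity over the overall perceptual features is an assumed choice, as the disclosure does not fix the similarity measure:

```python
import numpy as np

def knn_candidates(features: np.ndarray, K: int) -> np.ndarray:
    """features: (N, D) overall perceptual features; returns, per image,
    the indices of the K most similar other images."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)           # exclude the image itself
    return np.argsort(-sim, axis=1)[:, :K]
```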
In order to further improve the reliability of the graph structure, the graph construction unit may include a KNN algorithm processing subunit, a neighborhood aware subgraph adjustment model processing subunit and an edge setting subunit,
The KNN algorithm processing subunit is used for determining K images with highest similarity with the overall perception features of the first images aiming at each first image in all the images based on the KNN algorithm to form a candidate image set of the first images; wherein K is a preset positive integer;
The neighborhood perception sub-graph adjustment model processing subunit is used for inputting the comprehensive perception characteristics of the first image and the comprehensive perception characteristics of any image in the candidate image set of the first image into a neighborhood perception sub-graph adjustment model trained in advance, and obtaining the probability that the first image and any image correspond to the same target after the neighborhood perception sub-graph adjustment model is processed;
and the edge setting subunit is used for setting an edge between the first image and the any image when the probability that the first image and the any image correspond to the same target is greater than a set probability threshold.
Optionally, the neighborhood aware subgraph adjustment model processing subunit may include a first preprocessing module, a second preprocessing module, an FD processing module, an ND processing module and a full-connection layer processing module,
The first preprocessing module is used for preprocessing FD input data, and specifically comprises the following steps: for a first image and any image, respectively determining a neighborhood feature propagation result of the first image and a neighborhood feature propagation result of any image by a set propagation method, and connecting the overall perception feature of the first image, the neighborhood feature propagation result of the first image, the overall perception feature of any image and the neighborhood feature propagation result of any image together to form an enhanced perception feature of a central node pair formed by the first image and any image;
The second preprocessing module is configured to implement preprocessing of ND input data, and specifically includes: for a center node pair formed by the first image and any one of the images, determining neighbor nodes of the center node pair based on the overall perception characteristics of the images, respectively calculating the distance between each neighbor node and the center node pair, and sequencing each distance to generate a distance list;
the FD processing module is used for inputting the enhanced perception characteristics into the characteristic discriminator to be processed so as to obtain a first embedding output;
The ND processing module is used for inputting the distance list into the neighborhood discriminator for processing to obtain a second embedding output;
And the full-connection layer processing module is used for splicing the first embedding output and the second embedding output together, inputting the splicing result into the full-connection layer for performing two-classification processing, and predicting to obtain probability information of connection between the first image and any image.
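A structural sketch of these modules is shown below; the layer widths and MLP forms of the FD and ND are illustrative assumptions, with only the overall wiring (FD embedding, ND embedding, concatenation, fully connected binary head) taken from the description:

```python
import torch
import torch.nn as nn

class PairModel(nn.Module):
    """Illustrative pair classifier: FD + ND + fully connected head."""
    def __init__(self, feat_dim: int, dist_len: int, emb: int = 128):
        super().__init__()
        self.fd = nn.Sequential(nn.Linear(feat_dim, emb), nn.ReLU())  # FD
        self.nd = nn.Sequential(nn.Linear(dist_len, emb), nn.ReLU())  # ND
        self.fc = nn.Linear(2 * emb, 1)       # two-class (edge / no edge)

    def forward(self, enhanced_feat: torch.Tensor,
                dist_list: torch.Tensor) -> torch.Tensor:
        e1 = self.fd(enhanced_feat)           # first embedding output
        e2 = self.nd(dist_list)               # second embedding output
        return self.fc(torch.cat([e1, e2], dim=-1)).squeeze(-1)  # logit
```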
Optionally, in the first preprocessing module,
The process of determining the neighborhood feature propagation result of the first image may specifically include:
Selecting one or more second images with the similarity with the overall perception characteristics of the first image being greater than or equal to a set first threshold value from a candidate image set of the first image to form a neighborhood image set of the first image;
determining the average value of the overall perception characteristics of each second image in the neighborhood image set of the first image, and taking the average value as the neighborhood characteristic propagation result of the first image;
the process of determining the neighborhood feature propagation result of any image may specifically include:
selecting one or more third images with the similarity to the overall perception characteristics of any image being greater than or equal to a first threshold value from the candidate image set of any image to form a neighborhood image set of any image;
And determining the average value of the overall perception characteristics of each third image in the neighborhood image set of any image, and taking the average value as the neighborhood characteristic propagation result of any image.
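The neighborhood feature propagation described above can be sketched as follows; the cosine similarity and the fallback to the image's own feature when no candidate passes the first threshold are assumptions not fixed by the text:

```python
import numpy as np

def propagate(query_feat: np.ndarray, cand_feats: np.ndarray,
              t1: float) -> np.ndarray:
    """Mean of candidate features whose similarity to the query image
    is at least t1 (the neighborhood image set)."""
    q = query_feat / np.linalg.norm(query_feat)
    c = cand_feats / np.linalg.norm(cand_feats, axis=1, keepdims=True)
    mask = c @ q >= t1
    return cand_feats[mask].mean(axis=0) if mask.any() else query_feat
```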
Optionally, in the second preprocessing module, the determining a processing of the neighboring node of the central node pair may specifically include:
Selecting images with the similarity of the overall perception features of the first image being greater than or equal to a set second threshold value from the candidate image set of the first image, selecting images with the similarity of the overall perception features of any image being greater than or equal to the second threshold value from the candidate image set of any image, and taking the selected images as first-order neighbor nodes of the center node pair;
For each first-order neighbor node, selecting an image with the similarity of the overall perception characteristics of the first-order neighbor node being greater than or equal to a second threshold value from a candidate image set of the first-order neighbor node as a second-order neighbor node of the center node pair;
All first-order neighbor nodes and all second-order neighbor nodes are used as neighbor nodes of the center node pair.
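A sketch of this two-hop neighbor selection follows; `candidates(x)` and `sim(x, y)` are assumed helpers returning an image's KNN candidate set and the similarity of two images' overall perceptual features, and excluding the two center nodes themselves is likewise an assumption:

```python
def neighbor_nodes(i, j, candidates, sim, t2: float) -> set:
    # First-order: sufficiently similar candidates of either center node.
    first = {n for n in candidates(i) if sim(n, i) >= t2}
    first |= {n for n in candidates(j) if sim(n, j) >= t2}
    # Second-order: sufficiently similar candidates of first-order nodes.
    second = set()
    for f in first:
        second |= {n for n in candidates(f) if sim(n, f) >= t2}
    return (first | second) - {i, j}
```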
Optionally, in the second preprocessing module, the process of calculating the distance between each neighboring node and the central node pair may specifically include:
Calculating a first distance between each neighbor node and the first image and a second distance between each neighbor node and any image;
the distance of the neighbor node to the center node pair is determined based on the first distance, the second distance, and the order of the neighbor node.
Optionally, in the second preprocessing module,
The way of calculating the first distance may specifically include:
calculating a distance value between the neighbor node and the first image based on the comprehensive perception characteristics of the neighbor node and the first image, and setting the value of the first distance as a set first maximum distance if the distance value is larger than a set first distance threshold; if the distance value is smaller than or equal to a first distance threshold value, setting the value of the first distance as the distance value;
The manner of calculating the second distance includes:
Calculating a distance value between the neighbor node and any image based on the comprehensive perception characteristics of the neighbor node and any image, and setting the value of the second distance as a set second maximum distance if the distance value is larger than a set second distance threshold; if the distance value is smaller than or equal to the second distance threshold value, setting the value of the second distance as a distance value;
the process of determining the distance between the neighbor node and the center node pair may specifically include:
And summing the average value of the first distance and the second distance with the order of the neighbor node, and taking the summation result as the distance between the neighbor node and the center node pair.
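This distance rule can be sketched as follows; for brevity a single clipping threshold and maximum distance stand in for the first/second thresholds and maxima, and `d(x, y)` is an assumed pairwise-distance helper over the overall perceptual features:

```python
def pair_distance(n, i, j, order: int, d,
                  thr: float = 0.5, d_max: float = 1.0) -> float:
    """Distance of neighbor n to the center node pair (i, j);
    order is 1 for first-order neighbors, 2 for second-order."""
    di, dj = d(n, i), d(n, j)
    d1 = di if di <= thr else d_max          # clipped first distance
    d2 = dj if dj <= thr else d_max          # clipped second distance
    return (d1 + d2) / 2.0 + order           # mean plus neighbor order
```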
The present application also provides a computer readable storage medium storing instructions that, when executed by a processor, perform the steps of the image-based target archiving method described above. In practice, the computer readable medium may be included in the apparatus/device/system of the above embodiments, or may exist separately without being incorporated into that apparatus/device/system.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the application. In the disclosed embodiments, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Specifically:
The electronic device can include a processor 601 of one or more processing cores, a memory 602 of one or more computer readable storage media, and a computer program stored on the memory and executable on the processor. The image-based target archiving method can be implemented when the program in the memory 602 is executed.
Specifically, in practical applications, the electronic device may further include a power supply 603, an input/output unit 604, and other components. It will be appreciated by those skilled in the art that the structure of the electronic device shown in fig. 6 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 601 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of a server and processes data by running or executing software programs and/or modules stored in the memory 602, and calling data stored in the memory 602, thereby performing overall monitoring of the electronic device.
The memory 602 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function, and the like, and the data storage area may store data created according to the use of the server, etc. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 601 with access to the memory 602.
The electronic device further comprises a power supply 603 for supplying power to the various components, which may be logically connected to the processor 601 via a power management system, so that functions of managing charging, discharging, power consumption management, etc. are achieved via the power management system. The power supply 603 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may also include an input/output unit 604, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, or optical signal inputs related to user settings and function control. The input/output unit 604 may also be used to display information entered by the user or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims (20)

1. An image-based target archiving method, comprising:
acquiring all images to be archived;
In each image, aiming at various detected classified contents, respectively extracting classified features, and generating comprehensive perception features based on the extracted classified features;
constructing a graph structure based on the overall perception characteristics of all the images; taking each image as a node of the graph structure, and if the probability that the comprehensive perception features of the two images correspond to the same target is greater than a set requirement, edges exist between the two corresponding images;
based on the overall perception characteristics of all the images and the graph structure, carrying out characteristic fusion processing to obtain the aggregation characteristics of all the images;
And clustering all the images based on the aggregation characteristics, and enabling the clustered images to correspond to the same target to obtain a target archiving result.
2. The method of claim 1, wherein the generating the full perception feature comprises:
Splicing the extracted classification features, and taking the spliced result as the comprehensive perception feature;
Or alternatively
Respectively carrying out feature conversion on each extracted classified feature to obtain converted features, carrying out splicing or adding operation on each classified converted feature, and taking a splicing result or adding result as the comprehensive perception feature; wherein the features after each class transformation have the same dimensions.
3. The method of claim 1, wherein constructing the graph structure comprises:
based on a KNN algorithm, determining, for each first image in all the images, K images with the highest similarity to the overall perception feature of the first image;
And setting edges between the first image and nodes of each image in the K images respectively.
4. The method of claim 1, wherein constructing the graph structure comprises:
based on a KNN algorithm, determining, for each first image in all the images, K images with the highest similarity to the overall perception feature of the first image, to form a candidate image set of the first image; wherein K is a preset positive integer;
inputting the overall perception feature of the first image and the overall perception feature of any image in the candidate image set of the first image into a pre-trained neighborhood perception sub-graph adjustment model, and obtaining, after processing by the neighborhood perception sub-graph adjustment model, the probability that the first image and the any image correspond to the same target;
and setting an edge between the first image and the any image when the probability that the first image and the any image correspond to the same target is greater than a set probability threshold.
5. The method of claim 4, wherein the processing of the neighborhood aware subgraph adjustment model comprises:
For the first image and any image, respectively determining a neighborhood feature propagation result of the first image and a neighborhood feature propagation result of any image by a set propagation method, and connecting the overall perception feature of the first image, the neighborhood feature propagation result of the first image, the overall perception feature of any image and the neighborhood feature propagation result of any image together to form an enhanced perception feature of a central node pair formed by the first image and any image;
For a center node pair formed by the first image and any one of the images, determining neighbor nodes of the center node pair based on the overall perception characteristics of the images, respectively calculating the distance between each neighbor node and the center node pair, and sequencing each distance to generate a distance list;
processing the enhanced perceptual features input to a feature discriminator to obtain a first embedding output;
inputting the distance list into a neighborhood discriminator for processing to obtain a second embedding output;
and splicing the first embedding output and the second embedding output together, inputting a splicing result into a full-connection layer for performing two-classification processing, and predicting to obtain probability information of connection between the first image and any image.
6. The method of claim 5, wherein the determining the neighborhood feature propagation result for the first image comprises:
Selecting one or more second images with the similarity with the overall perception characteristics of the first image being greater than or equal to a set first threshold value from the candidate image set of the first image to form a neighborhood image set of the first image;
Determining the average value of the overall perception characteristics of each second image in the neighborhood image set of the first image as a neighborhood characteristic propagation result of the first image;
the determining the neighborhood feature propagation result of any image comprises the following steps:
Selecting one or more third images with the similarity to the overall perception characteristics of any image greater than or equal to the first threshold value from the candidate image set of any image to form a neighborhood image set of any image;
and determining the average value of the overall perception characteristics of each third image in the neighborhood image set of any image, and taking the average value as the neighborhood characteristic propagation result of any image.
7. The method of claim 5, wherein the determining the neighbor nodes of the pair of center nodes comprises:
Selecting images with the similarity to the overall perception feature of the first image being greater than or equal to a set second threshold value from the candidate image set of the first image, selecting images with the similarity to the overall perception feature of any image being greater than or equal to the second threshold value from the candidate image set of any image, and taking the selected images as first-order neighbor nodes of the center node pair;
For each first-order neighbor node, selecting an image with the similarity to the overall perception characteristics of the first-order neighbor node being greater than or equal to the second threshold value from the candidate image set of the first-order neighbor node as a second-order neighbor node of the center node pair;
And taking all the first-order neighbor nodes and all the second-order neighbor nodes as neighbor nodes of the center node pair.
8. The method of claim 7, wherein said calculating the distance of each of said neighbor nodes from said pair of center nodes comprises:
Calculating a first distance between the neighbor node and the first image and a second distance between the neighbor node and any image respectively;
and determining the distance between the neighbor node and the center node pair based on the first distance, the second distance and the order of the neighbor node.
9. The method of claim 8, wherein calculating the first distance comprises:
calculating a distance value between the neighbor node and the first image based on the comprehensive perception characteristics of the neighbor node and the first image, and setting the value of the first distance as a set first maximum distance if the distance value is larger than a set first distance threshold; if the distance value is smaller than or equal to the first distance threshold value, setting the value of the first distance as the distance value;
the means for calculating the second distance includes:
calculating a distance value between the neighbor node and any image based on the overall perception characteristics of the neighbor node and any image, and setting the value of the second distance as a set second maximum distance if the distance value is larger than a set second distance threshold; if the distance value is smaller than or equal to the second distance threshold value, setting the value of the second distance as the distance value;
The determining the distance between the neighbor node and the center node pair includes:
And summing the average value of the first distance and the second distance with the order of the neighbor node, and taking the summation result as the distance between the neighbor node and the center node pair.
10. An image-based target archiving apparatus, comprising: the system comprises an acquisition unit, a comprehensive perception feature generation unit, a graph construction unit, a feature fusion unit and an archiving unit;
The acquisition unit is used for acquiring all images to be archived;
The comprehensive perception feature generation unit is used for respectively extracting classification features aiming at various detected classification contents in each image and generating comprehensive perception features based on the extracted classification features;
The image construction unit is used for constructing an image structure based on the overall perception characteristics of all the images; taking each image as a node of the graph structure, and if the probability that the comprehensive perception features of the two images correspond to the same target is greater than a set requirement, edges exist between the two corresponding images;
The feature fusion unit is used for carrying out feature fusion processing based on the overall perception features of all the images and the graph structure to obtain the aggregation features of all the images;
and the archiving unit is used for clustering all the images based on the aggregation characteristics, and the clustered images correspond to the same target to obtain a target archiving result.
11. The apparatus according to claim 10, wherein in the full perception feature generating unit, the generating full perception feature includes:
Splicing the extracted classification features, and taking the spliced result as the comprehensive perception feature;
Or alternatively
Respectively carrying out feature conversion on each extracted classified feature to obtain converted features, carrying out splicing or adding operation on each classified converted feature, and taking a splicing result or adding result as the comprehensive perception feature; wherein the features after each class transformation have the same dimensions.
12. The apparatus of claim 10, wherein the graph construction unit includes a KNN algorithm processing subunit and an edge setting subunit,
The KNN algorithm processing subunit is used for determining, for each first image in all the images and based on a KNN algorithm, K images with the highest similarity to the overall perception feature of the first image; wherein K is a preset positive integer;
The edge setting subunit is configured to set an edge between the first image and a node of each of the K images.
13. The apparatus of claim 10, wherein the graph construction unit comprises a KNN algorithm processing subunit, a neighborhood aware subgraph adjustment model processing subunit, and an edge setting subunit,
The KNN algorithm processing subunit is configured to determine, for each first image in all the images, K images with highest similarity to the overall perceptual features of the first image, based on a KNN algorithm, and form a candidate image set of the first image; wherein K is a preset positive integer;
The neighborhood perception sub-graph adjustment model processing subunit is configured to input the overall perception feature of the first image and the overall perception feature of any image in the candidate image set of the first image into a pre-trained neighborhood perception sub-graph adjustment model, and obtain, after processing by the neighborhood perception sub-graph adjustment model, the probability that the first image and the any image correspond to the same target;
The edge setting subunit is configured to set an edge between the first image and the any image when the probability that the first image and the any image correspond to the same target is greater than a set probability threshold.
14. The apparatus of claim 13, wherein the neighborhood aware subgraph adjustment model processing subunit includes a first preprocessing module, a second preprocessing module, an FD processing module, an ND processing module, and a full connection layer processing module,
The first preprocessing module is configured to determine, for the first image and the any image, a neighborhood feature propagation result of the first image and a neighborhood feature propagation result of the any image through a set propagation method, and connect together a full perception feature of the first image, the neighborhood feature propagation result of the first image, the full perception feature of the any image, and the neighborhood feature propagation result of the any image, so as to form an enhanced perception feature of a central node pair formed by the first image and the any image;
The second preprocessing module is used for determining neighbor nodes of a center node pair formed by the first image and any image based on the overall perception characteristics of the image, respectively calculating the distance between each neighbor node and the center node pair, and sequencing each distance to generate a distance list;
The FD processing module is used for inputting the enhanced perception feature into a feature discriminator to process to obtain a first embedding output;
The ND processing module is used for inputting the distance list into a neighborhood discriminator for processing to obtain a second embedding output;
The full-connection layer processing module is configured to splice the first embedding output and the second embedding output together, input a splicing result into a full-connection layer for performing two-classification processing, and predict to obtain probability information that the first image and any image are connected.
15. The apparatus of claim 14, wherein, in the first preprocessing module,
The determining the neighborhood feature propagation result of the first image comprises:
Selecting one or more second images with the similarity with the overall perception characteristics of the first image being greater than or equal to a set first threshold value from the candidate image set of the first image to form a neighborhood image set of the first image;
Determining the average value of the overall perception characteristics of each second image in the neighborhood image set of the first image as a neighborhood characteristic propagation result of the first image;
the determining the neighborhood feature propagation result of any image comprises the following steps:
Selecting one or more third images with the similarity to the overall perception characteristics of any image greater than or equal to the first threshold value from the candidate image set of any image to form a neighborhood image set of any image;
and determining the average value of the overall perception characteristics of each third image in the neighborhood image set of any image, and taking the average value as the neighborhood characteristic propagation result of any image.
16. The apparatus of claim 14, wherein in the second preprocessing module, the determining the neighbor node of the center node pair comprises:
Selecting images with the similarity to the overall perception feature of the first image being greater than or equal to a set second threshold value from the candidate image set of the first image, selecting images with the similarity to the overall perception feature of any image being greater than or equal to the second threshold value from the candidate image set of any image, and taking the selected images as first-order neighbor nodes of the center node pair;
For each first-order neighbor node, selecting an image with the similarity to the overall perception characteristics of the first-order neighbor node being greater than or equal to the second threshold value from the candidate image set of the first-order neighbor node as a second-order neighbor node of the center node pair;
And taking all the first-order neighbor nodes and all the second-order neighbor nodes as neighbor nodes of the center node pair.
17. The apparatus of claim 16, wherein in the second preprocessing module, the calculating the distance of each of the neighbor nodes from the pair of center nodes comprises:
Calculating a first distance between the neighbor node and the first image and a second distance between the neighbor node and any image respectively;
and determining the distance between the neighbor node and the center node pair based on the first distance, the second distance and the order of the neighbor node.
18. The apparatus of claim 17, wherein, in the second pre-processing module,
The means for calculating the first distance includes:
calculating a distance value between the neighbor node and the first image based on the comprehensive perception characteristics of the neighbor node and the first image, and setting the value of the first distance as a set first maximum distance if the distance value is larger than a set first distance threshold; if the distance value is smaller than or equal to the first distance threshold value, setting the value of the first distance as the distance value;
the means for calculating the second distance includes:
calculating a distance value between the neighbor node and any image based on the overall perception characteristics of the neighbor node and any image, and setting the value of the second distance as a set second maximum distance if the distance value is larger than a set second distance threshold; if the distance value is smaller than or equal to the second distance threshold value, setting the value of the second distance as the distance value;
The determining the distance between the neighbor node and the center node pair includes:
And summing the average value of the first distance and the second distance with the order of the neighbor node, and taking the summation result as the distance between the neighbor node and the center node pair.
19. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the image-based target archiving method of any one of claims 1 to 9.
20. An electronic device comprising at least a computer-readable storage medium and a processor;
the processor configured to read executable instructions from the computer readable storage medium and execute the instructions to implement the image-based target archiving method of any one of claims 1-9.