CN115878823B - Deep hash method and traffic data retrieval method based on graph convolution network - Google Patents

Info

Publication number
CN115878823B
Authority
CN
China
Prior art keywords
image
graph
hash
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310195620.4A
Other languages
Chinese (zh)
Other versions
CN115878823A (en)
Inventor
胡超
夏方尚元
施鹤远
刘荣凯
梁锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202310195620.4A
Publication of CN115878823A
Application granted
Publication of CN115878823B
Legal status: Active

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a deep hash method based on a graph convolution network, which comprises the steps of: obtaining training images; performing data enhancement on the image data; constructing a vision transformer module; inputting the output data of the vision transformer module into a graph convolution network for correlation optimization; mapping the output of the graph convolution network through a fully connected layer and an activation function to obtain hash codes; constructing a comprehensive loss function to optimize the hashing process; and completing the actual deep hashing process according to the final optimization result. The invention also discloses a traffic data retrieval method comprising the deep hash method based on the graph convolution network. The invention ensures that the correlation relations in the low-dimensional Hamming space are consistent with those in the high-dimensional space of the original images, generates more efficient and compact binary hash codes, improves the effectiveness of large-scale picture retrieval, and has high reliability, good effectiveness, and simplicity.

Description

Deep hash method and traffic data retrieval method based on graph convolution network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a deep hash method and a traffic data retrieval method based on a graph convolution network.
Background
With the development of the economy and technology and the improvement of living standards, compression mapping techniques have been widely applied in people's production and daily life, bringing great convenience. Deep hashing is such a compression mapping technique; its core idea is to map high-dimensional image information into low-dimensional binary codes by learning a hash function, while maintaining the semantic information and similarity relations of the images. Deep hashing is widely applied to large-scale image retrieval tasks in fields such as intelligent transportation and education big data.
Currently, the mainstream deep hash methods extract image features with a vision transformer (ViT) and map the image features to a low-dimensional Hamming space through a fully connected layer, constructing a loss function to optimize the hash model so that the generated hash codes maintain the semantic information and similarity relations of the original images.
However, current deep hash methods only consider enhancing the correlation of images through a correlation loss function so that the generated hash codes maintain the similarity relations well. The effect of this approach is highly dependent on the effectiveness of the loss function, and designing a highly effective loss function is very difficult; moreover, if the loss function is not carefully designed, the hash method performs unsatisfactorily.
In addition, the traffic data retrieval process based on the existing hash method has the defects of poor reliability, low efficiency and extremely complex retrieval algorithm.
Disclosure of Invention
The invention aims to provide a deep hash method based on a graph convolution network, which has high reliability, good effectiveness, and simplicity.
Another object of the present invention is to provide a traffic data retrieval method comprising the deep hash method based on the graph convolution network.
The deep hash method based on the graph convolution network provided by the invention comprises the following steps:
S1, acquiring training images;
S2, randomly cropping the images obtained in the step S1 to complete data enhancement of the image data;
S3, constructing a vision transformer module based on block embedding, position embedding and an encoder;
S4, inputting the output data of the vision transformer module constructed in the step S3 into a graph convolution network for correlation optimization;
S5, mapping the output of the graph convolution network obtained in the step S4 through a fully connected layer and an activation function to obtain hash codes;
S6, constructing a comprehensive loss function based on similarity loss and semantic loss, and optimizing the hash processes of the steps S3-S5;
S7, completing the actual deep hashing process according to the final optimization result.
The step S2 specifically comprises the following steps:
unifying the images acquired in the step S1 into 256×256 square images;
and randomly cropping the unified images with a 224×224 cropping frame, thereby completing the data enhancement of the image data.
The step S3 specifically comprises the following steps:
the vision transformer comprises a block embedding module, a position embedding module and an encoder module connected in series in sequence;
the block embedding module is used for dividing an input image into several blocks and adding a class token to be learned, obtaining the image blocks and the embedding vector, which are input together into the position embedding module;
the position embedding module is used for adding sequence information to the input image blocks so as to generate a vector for classification;
the encoder module is used for extracting image features from the vector output by the position embedding module.
The block embedding module cuts the input image into $p$ blocks, where
$$p = \frac{H \times W}{P^2}$$
in which $H$ is the length of the input image, $W$ is the width of the input image, and $P$ is the length or width of each segmented block; then a class token to be learned, $x_{cls}$, is added, obtaining the embedding vector $X_{emd}$ as
$$X_{emd} = [x_{cls};\, x_i^1;\, x_i^2;\, \ldots;\, x_i^p]$$
where $x_i^p$ is the $p$-th block of the $i$-th image.
The position embedding module adds sequence information $PE$ to the input image blocks, thereby generating the vector for classification $z_0$ as
$$z_0 = X_{emd} + PE$$
The encoder module comprises $m$ blocks; each block comprises a first layer-normalization sub-module, a multi-head self-attention sub-module, a second layer-normalization sub-module and a multi-layer perceptron sub-module; the calculation process of each block is represented by the following formulas:
$$z'_m = MSA(LN_1(z_{m-1})) + z_{m-1}$$
$$z_m = MLP(LN_2(z'_m)) + z'_m$$
in which $z_m$ is the output feature of the $m$-th block; $MLP(\cdot)$ is the processing function of the multi-layer perceptron sub-module; $LN_2(\cdot)$ is the processing function of the second layer-normalization sub-module; $z'_m$ is an intermediate variable; $MSA(\cdot)$ is the processing function of the multi-head self-attention sub-module; $LN_1(\cdot)$ is the processing function of the first layer-normalization sub-module.
The step S4 specifically comprises the following steps:
based on the class tokens $x_{cls}$, the cosine similarity between image pairs is calculated to obtain the similarity relation matrix $V$ as
$$V_{ij} = \frac{x_{cls}^i \cdot x_{cls}^j}{\|x_{cls}^i\|\,\|x_{cls}^j\|}$$
where $x_{cls}^i$ is the class token of the $i$-th image and $\|x_{cls}^i\|$ is the modulus of its vector;
the image data features output by the vision transformer module are taken as nodes and the similarity relation matrix $V$ is taken as the edge relation, and these are input into the graph convolution network for correlation optimization; the following formula is used as the propagation rule of the graph convolution network:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$
in which $H^{(l+1)}$ is the output feature matrix of the $l$-th layer; $H^{(l)}$ is the input feature matrix of the $l$-th layer; $\sigma(\cdot)$ is an activation function; $\tilde{D}$ is the degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, where $\tilde{D}_{ii}$ is the element in row $i$ and column $i$ of $\tilde{D}$ and $\tilde{A}_{ij}$ is the element in row $i$ and column $j$ of the matrix $\tilde{A}$; $\tilde{A}$ is the adjacency matrix of the undirected graph composed of the images, with $\tilde{A} = A + I_n$, where $I_n$ is the identity matrix; and $W^{(l)}$ is the weight parameter of the $l$-th layer.
Step S5 specifically maps the output of the graph convolution network obtained in step S4 through a fully connected layer and a sign activation function to obtain the K-bit hash code $H$ as
$$H = \mathrm{sign}(FC(Z))$$
where $Z$ is the output feature of the graph convolution network, $FC(\cdot)$ is the fully connected layer mapping, and $H \in \{-1, +1\}^{K}$.
The step S6 constructs a comprehensive loss function based on similarity loss and semantic loss, and specifically comprises the following steps:
the following equation is used as the similarity loss function $L_{sim}$:
$$L_{sim} = -\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\left[ s_{ij}\log q_{ij} + (1 - s_{ij})\log(1 - q_{ij}) \right]$$
in which $w_{ij}$ is the weight of the training pair $x_i$ and $x_j$, with $w_{ij} = s_{ij}\,S/S_1 + (1 - s_{ij})\,S/S_0$; $S_1$ is the number of similar pairs in the dataset, $S_0$ is the number of dissimilar pairs in the dataset, and $S$ is the total number of pairs in the dataset; $s_{ij}$ is the similarity label of the $i$-th image and the $j$-th image; $h_i$ is the $K$-bit hash code obtained by mapping $x_i$; $q_{ij}$ is an intermediate variable calculated from the cosine similarity between the binary hash codes $h_i$ and $h_j$, with
$$q_{ij} = \frac{1}{2}\left(1 + \frac{h_i \cdot h_j}{\|h_i\|\,\|h_j\|}\right);$$
the following formula is adopted as the semantic loss function $L_{sem}$:
$$L_{sem} = -\sum_{i=1}^{n} w_i \sum_{j=1}^{c} y_{ij}\log l_{ij}$$
in which $w_i$ is a weight parameter determined from $c_t$ and $c_{tp}$; $c_t$ is the number of pictures of the category to which the $i$-th picture belongs, and $c_{tp}$ is the number of correctly classified pictures of that category; $y_{ij}$ is the value of the $j$-th bit of the true category label of the $i$-th picture; $l_{ij}$ is the value of the $j$-th bit of the predicted category label of the $i$-th picture;
combining the similarity loss and the semantic loss, the comprehensive loss function $L_{total}$ is constructed:
$$L_{total} = L_{sim} + \alpha L_{sem} + \eta\|\theta\|_2$$
in which $\alpha$ and $\eta$ are set hyperparameters, and $\|\theta\|_2$ is the L2 norm of the model parameters, used to prevent overfitting during model training.
The invention also provides a traffic data retrieval method comprising the deep hash method based on the graph convolution network, which comprises the following steps:
A. acquiring the traffic raw data to be retrieved and the traffic raw data in a database;
B. adopting the deep hash method based on the graph convolution network to generate the hash codes of the pictures to be retrieved and the hash codes of the pictures in the database respectively, comparing and sorting them according to Hamming distance, and returning the retrieval result;
C. taking the processing result obtained in step B as the final traffic data retrieval result.
According to the deep hash method and the traffic data retrieval method based on the graph convolution network, the maintenance of the similarity relations between images is enhanced by introducing the graph convolution network, and the correlation relations between images in the original image space are utilized to promote feature flow between image features, so that similar images are drawn closer together. This ensures that the correlation relations in the low-dimensional Hamming space are consistent with those in the high-dimensional space of the original images, generates more efficient and compact binary hash codes, and improves the effectiveness of large-scale image retrieval; the invention has high reliability, good effectiveness, and simplicity.
Drawings
Fig. 1 is a flow chart of the hash method of the present invention.
Fig. 2 is a flow chart of a traffic data retrieving method according to the present invention.
Detailed Description
Fig. 1 is a schematic flow chart of the hash method according to the present invention. The deep hash method based on the graph convolution network provided by the invention comprises the following steps:
S1, acquiring training images;
in practice, it is assumed that there is a training set $X = \{x_1, x_2, \ldots, x_n\}$ comprising $n$ training images and a corresponding set of labels $Y = \{y_1, y_2, \ldots, y_n\}$, where each label $y_i$ is a $c$-dimensional vector and $c$ represents the number of picture categories in the dataset;
for any two pictures in the dataset, a similarity label $s_{ij}$ can be generated: if $y_i$ and $y_j$ are similar, then $s_{ij} = 1$; otherwise $s_{ij} = 0$; deep hashing learns a nonlinear hash function $F$ that maps an image from the high-dimensional space to the low-dimensional space, i.e., each image $x_i$ is mapped to a $K$-bit hash code $h_i \in \{-1, +1\}^K$, while the similarity information of the pictures is maintained according to the similarity matrix $S$; that is to say, if $s_{ij} = 1$, the Hamming distance between $h_i$ and $h_j$ should be smaller, and vice versa;
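For illustration only (not part of the patent text), a minimal sketch of how the pairwise similarity labels $s_{ij}$ could be derived from one-hot label vectors, assuming two images are similar when they share at least one category; the function name `similarity_labels` is hypothetical:

```python
import numpy as np

def similarity_labels(Y: np.ndarray) -> np.ndarray:
    # Y: (n, c) one-hot or multi-hot label matrix.
    # s_ij = 1 when images i and j share at least one category, else 0.
    shared = Y @ Y.T                      # pairwise counts of shared labels
    return (shared > 0).astype(np.int8)   # (n, n) similarity matrix S

# Toy usage: three images with categories 0, 0 and 2.
Y = np.eye(3)[[0, 0, 2]]
S = similarity_labels(Y)                  # S[0, 1] == 1, S[0, 2] == 0
```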
S2, randomly cropping the images obtained in the step S1 to complete data enhancement of the image data; the method specifically comprises the following steps:
unifying the images acquired in the step S1 into 256×256 square images;
randomly cropping the unified images with a 224×224 cropping frame, thereby completing the data enhancement of the image data;
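A minimal sketch of this enhancement using torchvision (illustrative; the patent does not name an implementation library):

```python
from torchvision import transforms

# Resize every training image to a 256x256 square, then take a
# random 224x224 crop, as described in step S2.
augment = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
```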
S3, constructing a vision transformer module based on block embedding, position embedding and an encoder; the method specifically comprises the following steps:
the vision transformer comprises a block embedding module, a position embedding module and an encoder module connected in series in sequence;
the block embedding module is used for dividing an input image into several blocks and adding a class token to be learned, obtaining the image blocks and the embedding vector, which are input together into the position embedding module; in specific implementation, the block embedding module divides the input image into $p$ blocks, where
$$p = \frac{H \times W}{P^2}$$
in which $H$ is the length of the input image, $W$ is the width of the input image, and $P$ is the length or width of each segmented block; then a class token to be learned, $x_{cls}$, is added, obtaining the embedding vector $X_{emd}$ as
$$X_{emd} = [x_{cls};\, x_i^1;\, x_i^2;\, \ldots;\, x_i^p]$$
where $x_i^p$ is the $p$-th block of the $i$-th image;
the position embedding module is used for adding sequence information to the input image blocks so as to generate a vector for classification; in specific implementation, the position embedding module adds sequence information $PE$ to the input image blocks, thereby generating the vector for classification $z_0$ as
$$z_0 = X_{emd} + PE$$
the encoder module is used for extracting image features from the vector output by the position embedding module; in specific implementation, the encoder module comprises $m$ blocks; each block comprises a first layer-normalization sub-module, a multi-head self-attention sub-module, a second layer-normalization sub-module and a multi-layer perceptron sub-module; the calculation process of each block is represented by the following formulas:
$$z'_m = MSA(LN_1(z_{m-1})) + z_{m-1}$$
$$z_m = MLP(LN_2(z'_m)) + z'_m$$
in which $z_m$ is the output feature of the $m$-th block; $MLP(\cdot)$ is the processing function of the multi-layer perceptron sub-module; $LN_2(\cdot)$ is the processing function of the second layer-normalization sub-module; $z'_m$ is an intermediate variable; $MSA(\cdot)$ is the processing function of the multi-head self-attention sub-module; $LN_1(\cdot)$ is the processing function of the first layer-normalization sub-module;
S4, inputting the output data of the vision transformer module constructed in the step S3 into a graph convolution network for correlation optimization; the method specifically comprises the following steps:
based on the class tokens $x_{cls}$, the cosine similarity between image pairs is calculated to obtain the similarity relation matrix $V$ as
$$V_{ij} = \frac{x_{cls}^i \cdot x_{cls}^j}{\|x_{cls}^i\|\,\|x_{cls}^j\|}$$
where $x_{cls}^i$ is the class token of the $i$-th image and $\|x_{cls}^i\|$ is the modulus of its vector;
the image data features output by the vision transformer module are taken as nodes and the similarity relation matrix $V$ is taken as the edge relation, and these are input into the graph convolution network for correlation optimization; the following formula is used as the propagation rule of the graph convolution network:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$
in which $H^{(l+1)}$ is the output feature matrix of the $l$-th layer; $H^{(l)}$ is the input feature matrix of the $l$-th layer; $\sigma(\cdot)$ is an activation function; $\tilde{D}$ is the degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, where $\tilde{D}_{ii}$ is the element in row $i$ and column $i$ of $\tilde{D}$ and $\tilde{A}_{ij}$ is the element in row $i$ and column $j$ of the matrix $\tilde{A}$; $\tilde{A}$ is the adjacency matrix of the undirected graph composed of the images, with $\tilde{A} = A + I_n$, where $I_n$ is the identity matrix; $W^{(l)}$ is the weight parameter of the $l$-th layer;
S5, mapping the output of the graph convolution network obtained in the step S4 through a fully connected layer and an activation function to obtain hash codes; specifically, the output of the graph convolution network obtained in the step S4 is mapped through a fully connected layer and a sign activation function to obtain the K-bit hash code $H$ as
$$H = \mathrm{sign}(FC(Z))$$
where $Z$ is the output feature of the graph convolution network, $FC(\cdot)$ is the fully connected layer mapping, and $H \in \{-1, +1\}^{K}$;
S6, constructing a comprehensive loss function based on similarity loss and semantic loss, and optimizing the hash processes of the steps S3-S5; the method specifically comprises the following steps:
assuming that P represents joint probability distribution in an original high-dimensional space of the image, and Q represents joint probability distribution of the hash code in a low-dimensional Hamming space; in order for the learned hash code to retain similar information to the original picture, the distribution Q should be made as similar as possible to the distribution P; here, JS divergence is adopted as the measurement method;
Figure SMS_94
wherein the P distribution is fixed bys ij Instead ofp ij The method comprises the steps of carrying out a first treatment on the surface of the This means that the distribution Q should be as close as possible to the distribution P; that is to sayp ij When the number of the codes is =1,q ij should approach 1 as closely as possible; otherwise the first set of parameters is selected,q ij should approach zero; here, according to binary hash codesh i Andh j cosine similarity betweenq ij :/>
Figure SMS_96
Therefore, the following formula is finally adopted as the similarity loss functionL sim :/>
Figure SMS_98
In the middle ofw ij For training pairsx i Andx j weights of (2), and->
Figure SMS_99
S 1 For the number of similar items in the dataset,S 0 to refer to the number of dissimilar terms in the dataset,Sfor the total number in the dataset +.>
Figure SMS_100
Is the firstiImage and the firstjSimilarity labels for the individual images; />
Figure SMS_101
Is thatx i Obtained by mappingKBit hash codes; />
Figure SMS_102
For +/based on binary hash code>
Figure SMS_93
And->
Figure SMS_95
Intermediate variable calculated by cosine similarity between them, and +.>
Figure SMS_97
In order to ensure that the hash code finally generated by the model still can well keep the semantic information of the original image, classifying the finally generated hash code to obtain a classification vector
Figure SMS_103
Each of which isl i Are all made ofcThe dimension vector is represented by a vector of dimensions,
Figure SMS_104
the method comprises the steps of carrying out a first treatment on the surface of the Therefore, the following expression is adopted as the semantic loss functionL sem :/>
Figure SMS_105
In the middle ofw i Is a weight parameter, and->
Figure SMS_106
c t Is the firstiNumber of pictures of the category to which the piece of picture belongs, +.>
Figure SMS_107
Is the firstiThe category to which the picture belongs is correctly classifiedThe number of (3); />
Figure SMS_108
Is the firstiGenuine category label of picturejThe value of the bit; />
Figure SMS_109
Is the firstiPrediction category label of a picturejThe value of the bit;
combining similarity loss and semantic loss to construct a comprehensive loss function
Figure SMS_110
:/>
Figure SMS_111
In->
Figure SMS_112
Is a set super parameter; />
Figure SMS_113
The model is L2 norm of model parameters and is used for preventing the phenomenon of over fitting in the model training process; by constantly iterating the optimization +.>
Figure SMS_114
An efficient hash model can be learned;
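A minimal sketch of these losses (illustrative: the pair weighting follows the $S/S_1$, $S/S_0$ scheme described above, the per-image weights $w_i$ are omitted, and relaxed real-valued codes such as tanh outputs stand in for sign during training; all of these are assumptions made for the sketch):

```python
import torch
import torch.nn.functional as F

def similarity_loss(H, S, eps=1e-6):
    # H: (n, K) relaxed codes (e.g. tanh outputs); S: (n, n) float labels in {0., 1.}.
    Hn = F.normalize(H, dim=1)
    q = ((1 + Hn @ Hn.T) / 2).clamp(eps, 1 - eps)    # q_ij in (0, 1)
    n_sim = S.sum().clamp(min=1)
    n_dis = (S.numel() - S.sum()).clamp(min=1)
    w = S * (S.numel() / n_sim) + (1 - S) * (S.numel() / n_dis)
    return -(w * (S * q.log() + (1 - S) * (1 - q).log())).mean()

def semantic_loss(logits, labels):
    # Cross-entropy on the classification head over the hash codes.
    return F.cross_entropy(logits, labels)

def total_loss(H, S, logits, labels, params, alpha=0.1, eta=1e-4):
    reg = eta * sum(p.pow(2).sum() for p in params)  # L2 regularization term
    return similarity_loss(H, S) + alpha * semantic_loss(logits, labels) + reg
```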
S7, completing the actual deep hashing process according to the final optimization result.
According to the hash method of the invention, feature correlation optimization is carried out through the graph convolution network. The graph convolution network can update the features of similar nodes according to the adjacency matrix of the nodes in the graph; that is, it iterates the node information along the edge relations and preserves the similarity information among the data. The similarity among data points is learned through the graph convolution network and integrated into the image features, so that hash codes which better preserve the similarity relations of the original images are generated, further improving hash retrieval performance. In the hash method, drawing on the JS divergence and the cross-entropy loss function from probability theory, corresponding similarity and semantic loss functions are designed for large-scale image retrieval to optimize the overall hash model, thereby strengthening the correlation optimization. Meanwhile, in order to address the problem of uneven image distribution, corresponding weights are designed to strengthen the training intensity for under-represented and misclassified images, so as to meet the requirements of actual retrieval.
The effect of the hash method of the present invention is further described below in conjunction with one embodiment:
the data set used in the experiment was CIFAR-10. To develop the comparison experiment, the dataset is further divided into a training dataset, a query dataset, and a retrieval dataset. The detailed settings for the data sets are shown in table 1:
table 1 detailed setup schematic table of dataset
Figure SMS_115
The comparison methods adopted in the experiments are: DSH, HashNet, DCH, IDHN and QSMIH. The average precision results obtained at different hash bit lengths are shown in Table 2.
Table 2 Average precision results at different hash bit lengths
[Table 2: rendered as an image in the original document]
As can be seen from Table 2, the hash method of the present invention achieves better retrieval performance than the other methods under the different experimental settings.
Fig. 2 is a schematic flow chart of the traffic data retrieval method according to the present invention. The traffic data retrieval method comprising the deep hash method based on the graph convolution network provided by the invention comprises the following steps:
A. acquiring the traffic raw data to be retrieved and the traffic raw data in a database;
B. adopting the deep hash method based on the graph convolution network to generate the hash codes of the pictures to be retrieved and the hash codes of the pictures in the database respectively, comparing and sorting them according to Hamming distance, and returning the retrieval result;
C. taking the processing result obtained in step B as the final traffic data retrieval result.
Through the traffic data retrieval result obtained in step C, the information of the vehicle owner can be obtained, thereby realizing the recording and punishment of illegal vehicles; or, combined with a traffic flow model, vehicles can be accurately matched across the cameras of multiple intersections and vehicle tracks can be drawn, effectively relieving traffic congestion.
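A minimal sketch of the Hamming-distance comparison and sorting in step B (illustrative): for codes in {-1, +1}, the Hamming distance equals $(K - \langle q, d\rangle)/2$, so ranking reduces to an inner product.

```python
import numpy as np

def hamming_rank(query, db):
    # query: (K,) and db: (m, K), entries in {-1, +1}.
    # Hamming distance for such codes is (K - <query, db_row>) / 2.
    dist = (db.shape[1] - db @ query) / 2
    return np.argsort(dist, kind="stable")  # database indices, nearest first

# Toy usage with 4-bit codes; entry 0 matches the query exactly.
q = np.array([1, -1, 1, 1])
db = np.array([[1, -1, 1, 1], [-1, -1, 1, 1], [-1, 1, -1, -1]])
print(hamming_rank(q, db))                  # -> [0 1 2]
```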

Claims (10)

1. A deep hash method based on a graph convolution network, comprising the following steps:
S1, acquiring training images;
S2, randomly cropping the images obtained in the step S1 to complete data enhancement of the image data;
S3, constructing a vision transformer module based on block embedding, position embedding and an encoder;
S4, inputting the output data of the vision transformer module constructed in the step S3 into a graph convolution network for correlation optimization;
S5, mapping the output of the graph convolution network obtained in the step S4 through a fully connected layer and an activation function to obtain hash codes;
S6, constructing a comprehensive loss function based on similarity loss and semantic loss, and optimizing the hash processes of the steps S3-S5;
S7, completing the actual deep hashing process according to the final optimization result.
2. The deep hash method based on a graph convolution network according to claim 1, wherein the step S2 specifically comprises the following steps:
unifying the images acquired in the step S1 into 256×256 square images;
and randomly cropping the unified images with a 224×224 cropping frame, thereby completing the data enhancement of the image data.
3. The deep hash method based on a graph convolution network according to claim 2, wherein the step S3 specifically comprises the following steps:
the vision transformer comprises a block embedding module, a position embedding module and an encoder module connected in series in sequence;
the block embedding module is used for dividing an input image into several blocks and adding a class token to be learned, obtaining the image blocks and the embedding vector, which are input together into the position embedding module;
the position embedding module is used for adding sequence information to the input image blocks so as to generate a vector for classification;
the encoder module is used for extracting image features from the vector output by the position embedding module.
4. The deep hash method based on a graph convolution network according to claim 3, wherein the block embedding module segments the input image into $p$ blocks, where
$$p = \frac{H \times W}{P^2}$$
in which $H$ is the length of the input image, $W$ is the width of the input image, and $P$ is the length or width of each segmented block; then a class token to be learned, $x_{cls}$, is added, obtaining the embedding vector $X_{emd}$ as
$$X_{emd} = [x_{cls};\, x_i^1;\, x_i^2;\, \ldots;\, x_i^p]$$
where $x_i^p$ is the $p$-th block of the $i$-th image.
5. The deep hash method based on a graph convolution network according to claim 4, wherein the position embedding module adds sequence information $PE$ to the input image blocks, thereby generating the vector for classification $z_0$ as
$$z_0 = X_{emd} + PE$$
6. The deep hash method based on a graph convolution network according to claim 5, wherein the encoder module comprises $m$ blocks; each block comprises a first layer-normalization sub-module, a multi-head self-attention sub-module, a second layer-normalization sub-module and a multi-layer perceptron sub-module; the calculation process of each block is represented by the following formulas:
$$z'_m = MSA(LN_1(z_{m-1})) + z_{m-1}$$
$$z_m = MLP(LN_2(z'_m)) + z'_m$$
in which $z_m$ is the output feature of the $m$-th block; $MLP(\cdot)$ is the processing function of the multi-layer perceptron sub-module; $LN_2(\cdot)$ is the processing function of the second layer-normalization sub-module; $z'_m$ is an intermediate variable; $MSA(\cdot)$ is the processing function of the multi-head self-attention sub-module; $LN_1(\cdot)$ is the processing function of the first layer-normalization sub-module.
7. The deep hash method based on a graph convolution network according to claim 6, wherein the step S4 specifically comprises the following steps:
based on the class tokens $x_{cls}$, the cosine similarity between image pairs is calculated to obtain the similarity relation matrix $V$ as
$$V_{ij} = \frac{x_{cls}^i \cdot x_{cls}^j}{\|x_{cls}^i\|\,\|x_{cls}^j\|}$$
where $x_{cls}^i$ is the class token of the $i$-th image and $\|x_{cls}^i\|$ is the modulus of its vector;
the image data features output by the vision transformer module are taken as nodes and the similarity relation matrix $V$ is taken as the edge relation, and these are input into the graph convolution network for correlation optimization; the following formula is used as the propagation rule of the graph convolution network:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$
in which $H^{(l+1)}$ is the output feature matrix of the $l$-th layer; $H^{(l)}$ is the input feature matrix of the $l$-th layer; $\sigma(\cdot)$ is an activation function; $\tilde{D}$ is the degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, where $\tilde{D}_{ii}$ is the element in row $i$ and column $i$ of $\tilde{D}$ and $\tilde{A}_{ij}$ is the element in row $i$ and column $j$ of the matrix $\tilde{A}$; $\tilde{A}$ is the adjacency matrix of the undirected graph composed of the images, with $\tilde{A} = A + I_n$, where $I_n$ is the identity matrix; $W^{(l)}$ is the weight parameter of the $l$-th layer.
8. The deep hash method based on a graph convolution network according to claim 7, wherein the step S5 specifically maps the output of the graph convolution network obtained in the step S4 through a fully connected layer and a sign activation function to obtain the K-bit hash code $H$ as
$$H = \mathrm{sign}(FC(Z))$$
where $Z$ is the output feature of the graph convolution network, $FC(\cdot)$ is the fully connected layer mapping, and $H \in \{-1, +1\}^{K}$.
9. The deep hash method based on a graph convolution network according to claim 8, wherein the construction of the comprehensive loss function based on similarity loss and semantic loss in the step S6 specifically comprises the following steps:
the following equation is used as the similarity loss function $L_{sim}$:
$$L_{sim} = -\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\left[ s_{ij}\log q_{ij} + (1 - s_{ij})\log(1 - q_{ij}) \right]$$
in which $w_{ij}$ is the weight of the training pair $x_i$ and $x_j$, with $w_{ij} = s_{ij}\,S/S_1 + (1 - s_{ij})\,S/S_0$; $S_1$ is the number of similar pairs in the dataset, $S_0$ is the number of dissimilar pairs in the dataset, and $S$ is the total number of pairs in the dataset; $s_{ij}$ is the similarity label of the $i$-th image and the $j$-th image; $h_i$ is the $K$-bit hash code obtained by mapping $x_i$; $q_{ij}$ is an intermediate variable calculated from the cosine similarity between the binary hash codes $h_i$ and $h_j$, with
$$q_{ij} = \frac{1}{2}\left(1 + \frac{h_i \cdot h_j}{\|h_i\|\,\|h_j\|}\right);$$
the following formula is adopted as the semantic loss function $L_{sem}$:
$$L_{sem} = -\sum_{i=1}^{n} w_i \sum_{j=1}^{c} y_{ij}\log l_{ij}$$
in which $w_i$ is a weight parameter determined from $c_t$ and $c_{tp}$; $c_t$ is the number of pictures of the category to which the $i$-th picture belongs, and $c_{tp}$ is the number of correctly classified pictures of that category; $y_{ij}$ is the value of the $j$-th bit of the true category label of the $i$-th picture; $l_{ij}$ is the value of the $j$-th bit of the predicted category label of the $i$-th picture;
the similarity loss and the semantic loss are combined to construct the comprehensive loss function $L_{total}$:
$$L_{total} = L_{sim} + \alpha L_{sem} + \eta\|\theta\|_2$$
in which $\alpha$ and $\eta$ are set hyperparameters, and $\|\theta\|_2$ is the L2 norm of the model parameters, used to prevent overfitting during model training.
10. A traffic data retrieval method comprising the deep hash method based on a graph convolution network according to any one of claims 1 to 9, characterized by comprising the following steps:
A. acquiring the traffic raw data to be retrieved and the traffic raw data in a database;
B. adopting the deep hash method based on a graph convolution network according to any one of claims 1 to 9 to generate the hash codes of the pictures to be retrieved and the hash codes of the pictures in the database respectively, comparing and sorting them according to Hamming distance, and returning the retrieval result;
C. taking the processing result obtained in step B as the final traffic data retrieval result.
CN202310195620.4A 2023-03-03 2023-03-03 Deep hash method and traffic data retrieval method based on graph convolution network Active CN115878823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310195620.4A CN115878823B (en) 2023-03-03 2023-03-03 Deep hash method and traffic data retrieval method based on graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310195620.4A CN115878823B (en) 2023-03-03 2023-03-03 Deep hash method and traffic data retrieval method based on graph convolution network

Publications (2)

Publication Number Publication Date
CN115878823A CN115878823A (en) 2023-03-31
CN115878823B (en) 2023-04-28

Family

ID=85761875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310195620.4A Active CN115878823B (en) 2023-03-03 2023-03-03 Deep hash method and traffic data retrieval method based on graph convolution network

Country Status (1)

Country Link
CN (1) CN115878823B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104721A1 (en) * 2018-09-27 2020-04-02 Scopemedia Inc. Neural network image search

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980641A (en) * 2017-02-09 2017-07-25 上海交通大学 The quick picture retrieval system of unsupervised Hash and method based on convolutional neural networks
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109977250A (en) * 2019-03-20 2019-07-05 重庆大学 Merge the depth hashing image search method of semantic information and multistage similitude
CN110555121A (en) * 2019-08-27 2019-12-10 清华大学 Image hash generation method and device based on graph neural network
CN111738058A (en) * 2020-05-07 2020-10-02 华南理工大学 Reconstruction attack method aiming at biological template protection based on generation of countermeasure network
CN111611413A (en) * 2020-05-26 2020-09-01 北京邮电大学 Deep hashing method based on metric learning
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN115187610A (en) * 2022-09-08 2022-10-14 中国科学技术大学 Neuron morphological analysis method and device based on graph neural network and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. Gao, X. Shen, P. Fu, Z. Ji and T. Wang, "Multiview Graph Convolutional Hashing for Multisource Remote Sensing Image Retrieval," IEEE Geoscience and Remote Sensing Letters, vol. 19 (full text). *

Also Published As

Publication number Publication date
CN115878823A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN109711463B (en) Attention-based important object detection method
Chen et al. Saliency detection via the improved hierarchical principal component analysis method
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
Devaraj et al. An efficient framework for secure image archival and retrieval system using multiple secret share creation scheme
CN110928961A (en) Multi-mode entity linking method, equipment and computer readable storage medium
CN115019039B (en) Instance segmentation method and system combining self-supervision and global information enhancement
CN113377981A (en) Large-scale logistics commodity image retrieval method based on multitask deep hash learning
CN115965789A (en) Scene perception attention-based remote sensing image semantic segmentation method
CN108805280B (en) Image retrieval method and device
Sreeja et al. A unified model for egocentric video summarization: an instance-based approach
Peng et al. Swin transformer-based supervised hashing
CN115878823B (en) Deep hash method and traffic data retrieval method based on graph convolution network
Yang et al. IF-MCA: Importance factor-based multiple correspondence analysis for multimedia data analytics
Huynh et al. An efficient model for copy-move image forgery detection
Hao Deep learning review and discussion of its future development
Nguyen et al. Fusion schemes for image-to-video person re-identification
CN116541592A (en) Vector generation method, information recommendation method, device, equipment and medium
CN116204673A (en) Large-scale image retrieval hash method focusing on relationship among image blocks
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment
CN113988052A (en) Event detection method and device based on graph disturbance strategy
Yang et al. RUW-Net: A Dual Codec Network for Road Extraction From Remote Sensing Images
Naik et al. Image segmentation using encoder-decoder architecture and region consistency activation
Wang et al. Research on grey relational clustering model of multiobjective human resources based on time constraint
Yang et al. Generative face inpainting hashing for occluded face retrieval
CN115439688B (en) Weak supervision object detection method based on surrounding area sensing and association

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant