CN114155403A - Image segmentation Hash sorting method based on deep learning - Google Patents
Image segmentation Hash sorting method based on deep learning Download PDFInfo
- Publication number
- CN114155403A (application CN202111217840.XA)
- Authority
- CN
- China
- Prior art keywords
- hash
- image
- model
- network
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention provides an image segmentation hash sorting method based on deep learning. A G network extracts hash codes capable of solving the multi-level similarity ranking problem, and a carefully designed segmented hash distance metric combined with a dense triplet loss function expands the expressible similarity without increasing the length of the hash code, so that the hash codes carry richer semantic information while incurring little extra computation in the ranking problem.
Description
Technical Field
The invention relates to the fields of computer applications and computer vision, and in particular to an image segmentation hash sorting method based on deep learning.
Background
In recent years, with the rapid development of the internet, the network has become a primary channel for entertainment and information, and a large amount of image data has accumulated online. Mature text retrieval technology already helps people find information, but retrieval using images themselves remains underdeveloped. Image retrieval helps people find other images related to a given image. For example, when shopping online, a user may want to find clothing similar to a particular garment that is hard to describe in words; shopping platforms therefore usually provide a search-by-image interface: the user supplies an image of the desired product, the system automatically ranks the database, and similar goods are returned for the user to choose from. Image retrieval technology is thus highly attractive to both academia and industry.
Current mainstream image retrieval is based on category information: only images of the same category as the query are considered similar, and the semantic distances between labels are ignored. For example, since cats and dogs are both animals, for a query image of a cat, dog images should be ranked ahead of automobile images. Deep-learning-based methods extract image features, train under supervision derived from semantic distance, and then rank. Deep learning models are mature in the image domain; commonly used image feature extraction networks include VGG, ResNet, and others.
For some of the above problems, the ResNet network was adopted after investigation. The model comes in many depths, commonly 18, 34, 50, 101 and 152 layers. In general, the deeper the network, the more detailed the image features it can extract, but also the higher the computational overhead and hardware requirements. Weighing these factors, a 50-layer ResNet is adopted for image feature extraction; tests show that ResNet-50 achieves good results. Converting an image's real-valued continuous features into a binary hash code greatly reduces retrieval overhead. However, the conventional distance metric for binary hash codes is the Hamming distance. For past retrieval based on classification labels, only "similar or not" had to be distinguished, so two similarity levels sufficed; when the task expands to ranking, which requires many more similarity levels, the Hamming distance is insufficient. For example, ranking 100 categories requires 100 similarity levels, and an n-bit Hamming distance provides only n + 1 levels, so at least 99 bits would be needed, and longer hash codes bring more computational overhead. To solve this insufficient-similarity problem, a segmented hash distance metric function is designed that increases the expressible similarity without increasing the hash code length.
Disclosure of Invention
The present invention provides an image segmentation hash sorting method based on deep learning to overcome at least one of the above-mentioned deficiencies of the prior art.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an image segmentation hash sorting method based on deep learning comprises the following steps:
s1: establishing a deep learning network model G for extracting the image hash code;
s2: carrying out distance calculation on the hash code by using a new segmented hash measurement mode;
s3: training and testing the model by using the new ranking loss function;
s4: establishing a process that provides a background interface, offers a ranking entry and returns ranking results.
Further, the specific process of step S1 is:
s11: establishing the first module of the G network, which represents each preprocessed image as a low-dimensional real-valued vector: the ResNet-50 model is pre-trained on large-scale labeled images, and a fixed-length real-valued feature vector X is extracted through the ResNet-50 model;
s12: establishing the second module of the G network, in which a fully-connected layer maps the fixed-length real-valued feature vector X to an n-bit hash code, where n is an even number; the hash code is essentially an n-bit binary string, and the output of the fully-connected layer is still real-valued; the sign of each component is taken as that bit of the final hash code, i.e. 1 for a positive value and 0 for a negative value; because the sign operation is non-differentiable, the tanh function is used as the activation function on the fully-connected layer during training, serving as a differentiable approximation to the hash code.
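The sign binarization and its tanh training-time surrogate can be sketched in a few lines of plain Python (illustrative only; the function names are ours, not the patent's):

```python
import math

def binarize(real_outputs):
    """Map the fully-connected layer's real outputs to hash bits:
    1 for a positive component, 0 otherwise."""
    return [1 if v > 0 else 0 for v in real_outputs]

def tanh_relaxation(real_outputs):
    """Differentiable training-time stand-in for the sign operation;
    tanh(v) approaches +/-1 as |v| grows."""
    return [math.tanh(v) for v in real_outputs]

bits = binarize([0.7, -1.2, 2.5, -0.1])   # -> [1, 0, 1, 0]
```

At test time only `binarize` is used; during training the tanh outputs feed the loss so gradients can flow.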
Further, the specific process of step S2 is:
s21: dividing n-bit hash codes output by the G network into a front section and a rear section, wherein the length of each section is n/2;
s22: distance calculation is carried out with the newly designed segmented hash metric: for any two n-bit hash codes, Hamming distances are computed over their high n/2 bits and their low n/2 bits; the Hamming distance over the high bits is denoted d1 and that over the low bits d2, and the final distance is

d = (n/2 + 1) * d1 + d2

so the high segment is weighted more heavily and every (d1, d2) pair yields a distinct distance value.
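A minimal Python sketch of this segmented distance; the (n/2 + 1) weighting on d1 is our reading of claim 8's statement that two 33-valued segment distances with different weights produce 1089 distinct values (the published text omits the explicit formula):

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

def segmented_hash_distance(h1, h2):
    """Split two n-bit codes into high/low halves and combine the two
    Hamming distances; weighting d1 by (n/2 + 1) makes every (d1, d2)
    pair map to a distinct value (33 * 33 = 1089 levels for n = 64)."""
    n = len(h1)
    assert n == len(h2) and n % 2 == 0
    half = n // 2
    d1 = hamming(h1[:half], h2[:half])  # high-order segment
    d2 = hamming(h1[half:], h2[half:])  # low-order segment
    return (half + 1) * d1 + d2

d = segmented_hash_distance([1, 1, 0, 0], [0, 1, 0, 1])  # d1=1, d2=1 -> 3*1+1 = 4
```

For n = 4 the metric takes 3 × 3 = 9 distinct values, the small-scale analogue of the 1089 levels claimed for 64-bit codes.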
further, the specific process of step S3 is:
s31: dividing the data set into training data and testing data;
s32: the overall model is trained; the training steps of the G network are: the G network extracts image hash codes, a distance is computed with the newly designed segmented hash metric, the ranking loss function is then computed, and the loss is minimized to train the G network model and optimize its parameters;
s33: the test steps of the model are as follows: dividing the test data set into a query set and a retrieval set, and ranking the images in the retrieval set by using the images in the query set; inputting the data in the retrieval set into the G network, generating hash codes with the G network, and storing the results in a database DB; then inputting each image in the query set into the G network, computing the distance between the obtained hash code and the data in the DB, and then computing the nDCG, specifically: for the q-th query,

nDCG@K = (1/Z_K) * Σ_{i=1}^{K} (2^{r_i} - 1) / log2(i + 1)

wherein r_i indicates the similarity between the query and the sample ranked at the i-th position, and Z_K denotes the same sum computed with the similarities in ideal descending order, i.e. the normalization factor; finally, the nDCG of each query is averaged to obtain the final result.
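The per-query nDCG computation can be sketched as follows (plain Python; the graded-relevance DCG form 2^{r_i} − 1 is the standard choice and is assumed here):

```python
import math

def ndcg_at_k(ranked_sims, k):
    """nDCG@k for one query: ranked_sims[i] is the similarity r_i of the
    sample the system ranked at position i + 1. Z_k is the DCG of the
    ideal descending ordering (the normalization factor)."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)  # log2(rank + 1)
                   for i, r in enumerate(rels[:k]))
    z_k = dcg(sorted(ranked_sims, reverse=True))
    return dcg(ranked_sims) / z_k if z_k > 0 else 0.0

# Averaging ndcg_at_k over every query gives the final test score.
```

A perfectly ordered ranking scores exactly 1.0; any misordering of graded similarities scores strictly below it.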
Further, the specific process of step S4 is:
s41: storing the trained ResNet-50 model;
s42: establishing a background service process, and reserving an interface for image input;
s43: an image is input by accessing the interface created in S42; the background service process of S42 then preprocesses the image into the input format required by the ResNet-50 model of S41; the ResNet-50 model stored in S41 is then called and the processed image is input into it to obtain an n-bit hash code; the image hash codes stored in the database are then fetched, the segmented hash metric is used to compute distances, and after sorting by distance the first k images, i.e. the k most similar images, are returned as the retrieval result.
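Steps S41–S43 amount to a nearest-neighbour lookup under the segmented metric. A self-contained sketch (function names are illustrative; a real service would wrap this behind the background interface of S42):

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def seg_dist(h1, h2):
    """Segmented hash distance of step S22 (high segment weighted by n/2 + 1)."""
    half = len(h1) // 2
    return (half + 1) * hamming(h1[:half], h2[:half]) + hamming(h1[half:], h2[half:])

def top_k(query_code, db_codes, k):
    """Rank stored hash codes by distance to the query; return the k best indices."""
    order = sorted(range(len(db_codes)), key=lambda i: seg_dist(query_code, db_codes[i]))
    return order[:k]

db = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 1]]
best = top_k([1, 1, 0, 0], db, 2)   # distances 0, 8, 1 -> indices [0, 2]
```

Because `sorted` is stable, ties are broken by insertion order into the database.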
Further, in step S12, the feature extraction process is as follows:
pre-training the ResNet-50 model on the ImageNet image data set and then fine-tuning it on the Cifar-100 data set; after each image passes through the pre-trained ResNet-50 model, a 2048-dimensional continuous feature vector is generated, which a fully-connected layer converts into an n-dimensional continuous feature vector; a hash layer then converts this into the custom n-bit code.
Further, in step S32, a dense triplet loss is used as the loss function in the training process of the G network, where the distance metric function in the triplet loss is the new segmented hash distance metric; the dense triplet loss keeps the traditional triplet form:

Loss = ||output_a - output_p||_H - ||output_a - output_n||_H + Margin

wherein output_a is the anchor point, output_p is a positive sample, and output_n is a negative sample; positive and negative samples are no longer merely samples of the same or a different class as the anchor: any sample semantically closer to the anchor than the negative can form a triplet as the positive, and the semantic distances between classes are represented by calculating the word-embedding distances of the label words.
Further, in step S32, the dense triplet loss is used as the loss function in the training process of the G network, with the new segmented hash distance as the distance metric in the triplet loss: Loss = ||output_a - output_p||_H - ||output_a - output_n||_H + Margin, wherein output_a is the anchor point, output_p is a positive sample, and output_n is a negative sample; semantic information is characterized with current word-embedding techniques, and the semantic distance between classes is represented by the word-embedding distance of the label words. The designed ||·||_H is the segmented hash distance metric; it solves the problem that the traditional Hamming distance can describe too few similarity levels, providing sufficient similarity without increasing the number of bits or parameters.
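A literal reading of this loss, sketched in plain Python over already-binarized codes (note the formula as stated has no max(0, ·) clamp, and in actual training the non-differentiable Hamming distance would need a soft surrogate over the tanh outputs; both points are left out of this sketch):

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def seg_hash_dist(h1, h2):
    """||.||_H: the segmented hash distance of step S22."""
    half = len(h1) // 2
    return (half + 1) * hamming(h1[:half], h2[:half]) + hamming(h1[half:], h2[half:])

def dense_triplet_loss(code_a, code_p, code_n, margin=1.0):
    """Loss = ||a - p||_H - ||a - n||_H + Margin. The triplet is 'dense'
    because any sample semantically closer to the anchor than the negative
    (graded by label word-embedding distance) may serve as the positive."""
    return seg_hash_dist(code_a, code_p) - seg_hash_dist(code_a, code_n) + margin

loss = dense_triplet_loss([1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1])  # 1 - 8 + 1 = -6
```

Minimizing this quantity pushes positives closer to the anchor and negatives further away under the segmented metric.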
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention can extract the Hash code which can be used for solving the ordering problem of the multivariate similarity through the G network, and the expressive similarity degree is expanded on the premise of not increasing the length of the Hash code through the well-designed multi-segment Hash distance measurement and the dense triple loss function, so that the Hash code has richer semantic information, and less calculation loss is kept in the ordering problem.
Drawings
FIG. 1 is a complete diagram of the algorithmic model of the present invention;
FIG. 2 is a diagram illustrating a segment hash distance according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1-2, an image segment hash sorting method based on deep learning includes the following steps:
s1: establishing a deep learning network model G for extracting the image hash code;
s2: carrying out distance calculation on the hash code by using a new segmented hash measurement mode;
s3: training and testing the model by using the new ranking loss function;
s4: establishing a process that provides a background interface, offers a ranking entry and returns ranking results.
Further, the specific process of step S1 is:
s11: establishing the first module of the G network, which represents each preprocessed image as a low-dimensional real-valued vector: the ResNet-50 model is pre-trained on large-scale labeled images, and a fixed-length real-valued feature vector X is extracted through the ResNet-50 model;
s12: establishing the second module of the G network, in which a fully-connected layer maps the fixed-length real-valued feature vector X to an n-bit hash code, where n is an even number; the hash code is essentially an n-bit binary string, and the output of the fully-connected layer is still real-valued; the sign of each component is taken as that bit of the final hash code, i.e. 1 for a positive value and 0 for a negative value; because the sign operation is non-differentiable, the tanh function is used as the activation function on the fully-connected layer during training, serving as a differentiable approximation to the hash code.
The specific process of step S2 is:
s21: dividing n-bit hash codes output by the G network into a front section and a rear section, wherein the length of each section is n/2;
s22: distance calculation is carried out with the newly designed segmented hash metric: for any two n-bit hash codes, Hamming distances are computed over their high n/2 bits and their low n/2 bits; the Hamming distance over the high bits is denoted d1 and that over the low bits d2, and the final distance is

d = (n/2 + 1) * d1 + d2

so the high segment is weighted more heavily and every (d1, d2) pair yields a distinct distance value.
the specific process of step S3 is:
s31: dividing the data set into training data and testing data;
s32: the overall model is trained; the training steps of the G network are: the G network extracts image hash codes, a distance is computed with the newly designed segmented hash metric, the ranking loss function is then computed, and the loss is minimized to train the G network model and optimize its parameters;
s33: the test steps of the model are as follows: dividing the test data set into a query set and a retrieval set, and ranking the images in the retrieval set by using the images in the query set; inputting the data in the retrieval set into the G network, generating hash codes with the G network, and storing the results in a database DB; then inputting each image in the query set into the G network, computing the distance between the obtained hash code and the data in the DB, and then computing the nDCG, specifically: for the q-th query,

nDCG@K = (1/Z_K) * Σ_{i=1}^{K} (2^{r_i} - 1) / log2(i + 1)

wherein r_i indicates the similarity between the query and the sample ranked at the i-th position, and Z_K denotes the same sum computed with the similarities in ideal descending order, i.e. the normalization factor; finally, the nDCG of each query is averaged to obtain the final result.
The specific process of step S4 is:
s41: storing the trained ResNet-50 model;
s42: establishing a background service process, and reserving an interface for image input;
s43: an image is input by accessing the interface created in S42; the background service process of S42 then preprocesses the image into the input format required by the ResNet-50 model of S41; the ResNet-50 model stored in S41 is then called and the processed image is input into it to obtain an n-bit hash code; the image hash codes stored in the database are then fetched, the segmented hash metric is used to compute distances, and after sorting by distance the first k images, i.e. the k most similar images, are returned as the retrieval result.
In step S12, the feature extraction process is as follows:
pre-training the ResNet-50 model on the ImageNet image data set and then fine-tuning it on the Cifar-100 data set; after each image passes through the pre-trained ResNet-50 model, a 2048-dimensional continuous feature vector is generated, which a fully-connected layer converts into an n-dimensional continuous feature vector; a hash layer then converts this into the custom n-bit code.
In step S32, a dense triplet loss is used as the loss function in the training process of the G network, where the distance metric function in the triplet loss is the new segmented hash distance metric; the dense triplet loss keeps the traditional triplet form:

Loss = ||output_a - output_p||_H - ||output_a - output_n||_H + Margin

wherein output_a is the anchor point, output_p is a positive sample, and output_n is a negative sample; positive and negative samples are no longer merely samples of the same or a different class as the anchor: any sample semantically closer to the anchor than the negative can form a triplet as the positive, and the semantic distances between classes are represented by calculating the word-embedding distances of the label words.
In step S32, the dense triplet loss is used as the loss function in the training process of the G network, with the new segmented hash distance as the distance metric in the triplet loss: Loss = ||output_a - output_p||_H - ||output_a - output_n||_H + Margin, wherein output_a is the anchor point, output_p is a positive sample, and output_n is a negative sample; semantic information is characterized with current word-embedding techniques, and the semantic distance between classes is represented by the word-embedding distance of the label words. The designed ||·||_H is the segmented hash distance metric; it solves the problem that the traditional Hamming distance can describe too few similarity levels, providing sufficient similarity without increasing the number of bits or parameters.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (8)
1. An image segmentation hash sorting method based on deep learning is characterized by comprising the following steps:
s1: establishing a deep learning network model G for extracting the image hash code;
s2: carrying out distance calculation on the hash code by using a new segmented hash measurement mode;
s3: training and testing the model by using a new ranking loss function;
s4: establishing a process that provides a background interface, offers a ranking entry and returns ranking results.
2. The deep learning based image segment hash ordering method according to claim 1, wherein the specific process of the step S1 is:
s11: establishing the first module of the G network, which represents each preprocessed image as a low-dimensional real-valued vector: the ResNet-50 model is pre-trained on large-scale labeled images, and a fixed-length real-valued feature vector X is extracted through the ResNet-50 model;
s12: establishing the second module of the G network, in which a fully-connected layer maps the fixed-length real-valued feature vector X to an n-bit hash code, where n is an even number; the hash code is essentially an n-bit binary string, and the output of the fully-connected layer is still real-valued; the sign of each component is taken as that bit of the final hash code, i.e. 1 for a positive value and 0 for a negative value; because the sign operation is non-differentiable, the tanh function is used as the activation function on the fully-connected layer during training, serving as a differentiable approximation to the hash code.
3. The image segmentation hash sorting method based on deep learning according to claim 2, wherein the specific process of the step S2 is:
s21: dividing n-bit hash codes output by the G network into a front section and a rear section, wherein the length of each section is n/2;
s22: distance calculation is carried out with the newly designed segmented hash metric: for any two n-bit hash codes, Hamming distances are computed over their high n/2 bits and their low n/2 bits; the Hamming distance over the high bits is denoted d1 and that over the low bits d2, and the final distance is

d = (n/2 + 1) * d1 + d2

so the high segment is weighted more heavily and every (d1, d2) pair yields a distinct distance value.
4. the deep learning based image segment hash ordering method according to claim 3, wherein the specific process of the step S3 is:
s31: dividing the data set into training data and testing data;
s32: the overall model is trained; the training steps of the G network are: the G network extracts image hash codes, a distance is computed with the newly designed segmented hash metric, the ranking loss function is then computed, and the loss is minimized to train the G network model and optimize its parameters;
s33: the test steps of the model are as follows: dividing the test data set into a query set and a retrieval set, and ranking the images in the retrieval set by using the images in the query set; inputting the data in the retrieval set into the G network, generating hash codes with the G network, and storing the results in a database DB; then inputting each image in the query set into the G network, computing the distance between the obtained hash code and the data in the DB, and then computing the nDCG, specifically: for the q-th query,

nDCG@K = (1/Z_K) * Σ_{i=1}^{K} (2^{r_i} - 1) / log2(i + 1)

wherein r_i indicates the similarity between the query and the sample ranked at the i-th position, and Z_K denotes the same sum computed with the similarities in ideal descending order, i.e. the normalization factor; finally, the nDCG of each query is averaged to obtain the final result.
5. The deep learning based image segment hash ordering method according to claim 4, wherein the specific process of the step S4 is:
s41: storing the trained ResNet-50 model;
s42: establishing a background service process, and reserving an interface for image input;
s43: an image is input by accessing the interface created in S42; the background service process of S42 then preprocesses the image into the input format required by the ResNet-50 model of S41; the ResNet-50 model stored in S41 is then called and the processed image is input into it to obtain an n-bit hash code; the image hash codes stored in the database are then fetched, the segmented hash metric is used to compute distances, and after sorting by distance the first k images, i.e. the k most similar images, are returned as the retrieval result.
6. The deep learning based image segmentation hash sorting method according to claim 5, wherein in step S12, the feature extraction process is as follows:
pre-training the ResNet-50 model on the ImageNet image data set and then fine-tuning it on the Cifar-100 data set; after each image passes through the pre-trained ResNet-50 model, a 2048-dimensional continuous feature vector is generated, which a fully-connected layer converts into an n-dimensional continuous feature vector; a hash layer then converts this into the custom n-bit code.
7. The image segmentation hash ordering method based on deep learning of claim 6, wherein in step S32, in the training process of the G network, a dense triplet loss is used as the loss function, wherein the distance metric function in the triplet loss is the new segmented hash distance metric, and the dense triplet loss function is defined as:

Loss = ||output_a - output_p||_H - ||output_a - output_n||_H + Margin

wherein output_a is the anchor point, output_p is a positive sample, and output_n is a negative sample; positive and negative samples are no longer merely samples of the same or a different class as the anchor, but any sample semantically closer to the anchor than the negative can form a triplet as the positive, and the semantic distances between classes are represented by calculating the word-embedding distances of the label words.
8. The deep learning based image segmentation hash sorting method according to claim 7, wherein the designed ||·||_H is the segmented hash distance metric; after segmentation, the high-order distance has 33 possible values and the low-order distance likewise has 33 possible values, and since the high- and low-order distances are weighted differently, 1089 distinct distances, i.e. 1089 similarity levels, can be produced, thereby solving the insufficient-similarity problem without increasing the number of bits or parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111217840.XA CN114155403A (en) | 2021-10-19 | 2021-10-19 | Image segmentation Hash sorting method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111217840.XA CN114155403A (en) | 2021-10-19 | 2021-10-19 | Image segmentation Hash sorting method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114155403A true CN114155403A (en) | 2022-03-08 |
Family
ID=80462820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111217840.XA Pending CN114155403A (en) | 2021-10-19 | 2021-10-19 | Image segmentation Hash sorting method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114155403A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116383422A (en) * | 2023-04-07 | 2023-07-04 | 四川大学 | Non-supervision cross-modal hash retrieval method based on anchor points |
CN116383422B (en) * | 2023-04-07 | 2023-11-03 | 四川大学 | Non-supervision cross-modal hash retrieval method based on anchor points |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084790B (en) | Relation extraction method and system based on pre-training convolutional neural network | |
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN110795543B (en) | Unstructured data extraction method, device and storage medium based on deep learning | |
CN107239801B (en) | Video attribute representation learning method and video character description automatic generation method | |
CN112541355B (en) | Entity boundary type decoupling few-sample named entity recognition method and system | |
CN112004111B (en) | News video information extraction method for global deep learning | |
CN111538835B (en) | Social media emotion classification method and device based on knowledge graph | |
CN108959522B (en) | Migration retrieval method based on semi-supervised countermeasure generation network | |
CN111324765A (en) | Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN111274804A (en) | Case information extraction method based on named entity recognition | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
CN113065409A (en) | Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint | |
CN114912512A (en) | Method for automatically evaluating image description result | |
CN114048314A (en) | Natural language steganalysis method | |
CN114155403A (en) | Image segmentation Hash sorting method based on deep learning | |
CN110674265B (en) | Unstructured information oriented feature discrimination and information recommendation system | |
CN116363460A (en) | High-resolution remote sensing sample labeling method based on topic model | |
CN116304064A (en) | Text classification method based on extraction | |
CN116341521A (en) | AIGC article identification system based on text features | |
CN114842301A (en) | Semi-supervised training method of image annotation model | |
CN115186670A (en) | Method and system for identifying domain named entities based on active learning | |
CN111401519B (en) | Deep neural network unsupervised learning method based on similarity distance in object and between objects | |
CN111259176B (en) | Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information | |
CN111563184B (en) | Video hash retrieval representation conversion method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||