CN108921850B - Image local feature extraction method based on image segmentation technology - Google Patents


Info

Publication number
CN108921850B
CN108921850B (application CN201810336591.8A)
Authority
CN
China
Prior art keywords
image
network
feature extraction
training
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810336591.8A
Other languages
Chinese (zh)
Other versions
CN108921850A (en
Inventor
张雷
陈杰
Current Assignee
Boyun Vision Beijing Technology Co ltd
Original Assignee
Boyun Vision Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Boyun Vision Beijing Technology Co ltd filed Critical Boyun Vision Beijing Technology Co ltd
Priority to CN201810336591.8A priority Critical patent/CN108921850B/en
Publication of CN108921850A publication Critical patent/CN108921850A/en
Application granted granted Critical
Publication of CN108921850B publication Critical patent/CN108921850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image local feature extraction method based on an image segmentation technique, comprising the following steps: 1. construct an image segmentation model; 2. input the image into a CNN to obtain multi-layer feature maps, obtaining, after several convolution and pooling (down-sampling) layers, each feature map together with the original pixel positions recorded during its down-sampling; 3. up-sample the feature maps through an up-sampling module, restoring pixels to their original positions; 4. compute a softmax loss at every pixel position of the newly generated feature map; 5. iterate this process until the back-propagated loss falls below an acceptable threshold, completing construction and training of the image segmentation model; 6. complete image local feature extraction through a trained feature extraction network. The invention enables accurate retrieval at different granularities and precisely locates targets and target parts, so that features of key parts can be extracted for detailed feature comparison.

Description

Image local feature extraction method based on image segmentation technology
Technical Field
The invention relates to the technical field of image processing, in particular to an image local feature extraction method based on an image segmentation technology.
Background
Image retrieval is the process of using an image to search for similar images or images of the same target. Accurate retrieval requires that the returned images all show the same object, not merely similar objects. The retrieval pipeline consists of extracting image features, storing the features in a database, comparing features, and ranking the output by similarity. Many image retrieval methods exist on the market; their main innovations concentrate on the feature extraction method, the feature comparison method, and the ranking of the final output. The present method uses an image segmentation technique: a deep model performs pixel-level segmentation of the image or target, accurately locating the target (e.g. a pedestrian or vehicle in the image) or a target part (e.g. a person's torso or limbs), and extracts features specifically at those locations. Comparing such part-level features makes it possible to determine finely whether two images show the same target; compared with global features, local features contain more detail information and are therefore better suited to distinguishing targets.
Most prior-art feature extraction schemes are based on global features; fine, pixel-level local features are rarely extracted, and local feature extraction is basically done via image gradients or selected image blocks. For retrieval of similar images such features largely meet the task requirements, but for fine-grained retrieval of the very same target they fall short.
In addition, in a retrieval task the query generally must either be an image containing only the target, or a full image (one containing various scenes and targets) on which a detection pass is first run before retrieval, so that the target image can be obtained.
Disclosure of Invention
In view of the foregoing defects in the prior art, an object of the present invention is to provide a method for extracting local features of an image based on an image segmentation technique, so as to solve the deficiencies in the prior art.
In order to achieve the above object, the present invention provides an image local feature extraction method based on an image segmentation technology, which includes the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
and step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction.
In the above method for extracting local features of an image based on an image segmentation technique, the feature extraction network in step 6 includes a model building part and a feature training part.
In the above method for extracting local features of an image based on an image segmentation technique, the feature extraction branch of the feature extraction network adopts a residual design: the output of the image segmentation network model is fed into the residual network of the feature extraction network, passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
In the above method for extracting local features of an image based on an image segmentation technique, the image input to the feature extraction network is taken from the corresponding position of the original image according to the segmentation network's output: a rectangular region is extracted whose size is set according to the actual retrieval requirement; within the rectangle, pixel positions belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0.
In the above method for extracting local features of an image based on an image segmentation technique, when the feature extraction network is trained, its input is a triplet of images: two images of the same target and one image of a different target. After training, at inference time, a single image is input to the feature extraction network.
In the above method for extracting local features of an image based on an image segmentation technique, the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local (partial) image of the target.
The invention has the beneficial effects that:
according to the invention, an image segmentation technology is utilized, and the image feature part adopts a fusion network method of an image segmentation network and a feature extraction network, so that the image segmentation network is introduced into the feature extraction network, and according to different target marking grades, accurate retrieval of different degrees can be realized, targets and target parts are accurately positioned, and thus key part features are extracted for detailed feature comparison, and the features of the same target can be more effectively distinguished from other features.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features and effects of the invention can be fully understood.
Drawings
FIG. 1 is a framework flow diagram of the present invention.
Detailed Description
As shown in Fig. 1, a method for extracting local features of an image based on an image segmentation technique includes the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
and step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction.
In this embodiment, the feature extraction network of step 6 includes a model building part and a feature training part.
In this embodiment, the feature extraction branch of the feature extraction network adopts a residual design: the output of the image segmentation network model is fed into the residual network of the feature extraction network, passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
In this embodiment, the image input to the feature extraction network is taken from the corresponding position of the original image according to the segmentation network's output: a rectangular region is extracted whose size is set according to the actual retrieval requirement; within the rectangle, pixel positions belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0.
In this embodiment, when the feature extraction network is trained, its input is a triplet of images: two images of the same target and one image of a different target. After training, at inference time, a single image is input.
In this embodiment, the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local (partial) image of the target.
The general image retrieval pipeline consists of three stages: image feature extraction, image storage, and comparison of query-image features. The present method innovates mainly in image feature extraction. The image feature stage adopts a fused-network method combining an image segmentation network with a feature extraction network; the segmentation network is introduced into the feature extraction network, and retrieval at different levels of precision is achieved according to different target annotation granularities. When the annotation distinguishes only target and background, the extracted features are limited to the target area, yielding a target feature of the image for feature retrieval. When the annotation is finer-grained on the target itself (for a person, e.g., head, torso, limbs and carried articles), a retrieval feature is obtained that combines the target's local features. Retrieval performance in both cases is superior to retrieval using a single whole image alone.
Image target segmentation means separating a target from the image background at the pixel level; depending on the segmentation granularity, a part can also be separated from the target so as to extract the target's local features. The segmentation method used in the invention is based on a deep neural network (CNN): according to its training on a training set, the segmentation network generates an output image of the same size as the input image, whose pixels carry the category information of each pixel in the original image. When the training data contains only two classes, target and background, the output image contains only two pixel values distinguishing foreground from background. According to the distinguished foreground and background, the method extracts the target region and feeds it into the feature extraction branch; the generated features are those used for image retrieval.
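As a simplified, concrete illustration of the foreground/background idea above, the following sketch lifts only the target pixels out of a tiny image using a two-valued segmentation map. The array names and sizes are invented for illustration; the patent's actual images and label values are of course larger.

```python
import numpy as np

# Hypothetical 4x4 "original image" and a binary segmentation output:
# 0 = background, 1 = target, as described for two-class training data.
original = np.arange(16, dtype=float).reshape(4, 4)
seg = np.zeros((4, 4), dtype=int)
seg[1:3, 1:3] = 1                 # the segmented target region

# Keep only the target pixels; background positions become 0 before the
# result is handed to the feature extraction branch.
target_only = np.where(seg == 1, original, 0.0)
```

The same masking generalizes directly to multi-class label maps by testing `seg == k` for each part category `k`.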
Next, the overall flow of image segmentation is outlined. First an image segmentation model must be constructed; the model consists of several network layers with different functions, and besides the common CNN structures it adds structures such as batch-norm layers and deconvolution (deconv) layers. The overall model framework is shown in Fig. 1.
The image first passes through the CNN to obtain multi-layer feature maps; after several convolution layers and pooling (down-sampling) layers, each feature map is obtained together with the original pixel positions recorded during its down-sampling.
The feature maps are then up-sampled: pixels are restored to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays as consistent as possible with the original input image.
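One common way to realize this record-and-restore pair is max pooling that remembers which position each kept value came from, with an "unpooling" step that scatters values back to those positions (the SegNet-style pooling-indices idea). It is an assumption that the patent's up-sampling module works exactly this way; the sketch below only illustrates the mechanism on a 4x4 array.

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2 max pooling that also records each winner's original (row, col)."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2, 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i // 2, j // 2] = block[r, c]
            indices[i // 2, j // 2] = (i + r, j + c)   # original position
    return pooled, indices

def unpool_2x2(pooled, indices, out_shape):
    """Scatter pooled values back to their recorded original positions."""
    out = np.zeros(out_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = indices[i, j]
            out[r, c] = pooled[i, j]   # pixel returns to where it came from
    return out

x = np.array([[1., 2., 5., 4.],
              [3., 0., 1., 2.],
              [7., 8., 0., 1.],
              [2., 6., 3., 9.]])
p, idx = max_pool_2x2_with_indices(x)
restored = unpool_2x2(p, idx, x.shape)
```

Positions that were discarded during pooling stay zero after unpooling; subsequent convolution layers are what fill them back in.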
A softmax loss is computed at every pixel position of the newly generated feature map, i.e. each pixel position is classified; the loss at each position is computed by comparing the manually annotated image label with the network output, and back-propagating the loss trains the model parameters. Each pixel of the model's output image corresponds to one category, so according to the category differences, the image at the position of a required category can be extracted.
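The per-pixel softmax loss described here can be sketched as follows: the network emits one score vector per pixel, and the loss is the average cross-entropy between those scores and the annotated label map. Shapes and class count are illustrative.

```python
import numpy as np

def per_pixel_softmax_loss(scores, labels):
    """scores: (H, W, C) raw class scores; labels: (H, W) annotated class ids.
    Returns the cross-entropy averaged over all pixel positions."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    h, w = labels.shape
    # Pick, at every position, the probability of the annotated class.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.mean(np.log(picked))

# Toy 2x2 map with C=3 classes; pixel (0,0) is scored very confidently.
scores = np.zeros((2, 2, 3))
scores[0, 0, 1] = 10.0
labels = np.array([[1, 0], [2, 0]])
loss = per_pixel_softmax_loss(scores, labels)
```

With all-zero scores the loss equals log(C), the value for a uniform prediction, which is a handy sanity check during training.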
This process is iterated until the back-propagated loss falls below the acceptable threshold, completing construction and training of the image segmentation model.
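The iterate-until-the-loss-is-acceptable control flow can be illustrated with a deliberately tiny stand-in: a single-parameter per-pixel classifier trained by gradient descent on a separable toy label map. This is not the patent's CNN, only the convergence loop of this step; all values are invented.

```python
import numpy as np

H, W = 8, 8
y = np.zeros((H, W), dtype=int)
y[2:6, 2:6] = 1                  # toy ground-truth label map
x = 2.0 * y - 1.0                # toy per-pixel feature: +1 on target, -1 off

w, b = 0.0, 0.0                  # classifier parameters
lr, threshold = 0.5, 0.1         # "acceptable" loss threshold of step 5
loss = np.inf

for step in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))     # per-pixel foreground prob.
    # Per-pixel cross-entropy averaged over all positions.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if loss < threshold:                        # stop once loss is acceptable
        break
    grad = p - y                                # d loss / d logit, per pixel
    w -= lr * np.mean(grad * x)                 # back-propagated update
    b -= lr * np.mean(grad)

pred = (1.0 / (1.0 + np.exp(-(w * x + b))) > 0.5).astype(int)
```

Because all parameters are shared across pixels and the toy data is separable, reaching the loss threshold here also implies a perfect per-pixel prediction.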
However, since the invention targets the image retrieval task, directly extracting features from the segmentation network's convolution or pooling layers is not sufficient for the retrieval task, so a feature extraction network is introduced. This network is a branch of the segmentation network; after the segmentation network's training is complete, the branch is trained separately. The feature extraction network mainly extracts features of different parts of the original image according to the output image of the final segmentation network. It is likewise divided into a model construction part and a feature training part. First, the model construction part:
and a branch part of the feature extraction adopts residual error design, the residual error network of the feature extraction is input through the output of the feature graph generated by the network and the image segmentation network, and the training of the feature extraction network is jointly supervised by finally using a triple loss function and a softmax loss function through a series of residual error layers. The image input into the feature extraction network is an image which is obtained from an original image at a corresponding position according to an output result of the segmentation network, a rectangular area is obtained, the size of the rectangular area is set according to an actual retrieval requirement, pixels at the corresponding position of the original image are filled in the pixel position to which a required category belongs in the rectangular area, and all the rest positions are set to be 0. Triple loss and softmax characteristics-classification of a supervision characteristic and a supervision degree of the dispersion of different target characteristics, aiming at enabling the generated characteristics to have stronger resolving power and to be more sensitive to the characteristics of the same target.
For the training part, because the triplet loss is needed, the organization of the input images requires some handling: two different images of the same target are combined with an image of another, different target and input into the network together. In the invention, the input image may be a target image segmented from the whole scene, or a local image of the target. The rest of the training process resembles that of the segmentation network.
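The triplet supervision described above can be sketched with the standard margin-based triplet loss on made-up feature vectors: two features of the same target (anchor and positive) and one of a different target (negative). The vectors and margin value here are illustrative only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (same-target distance) - (different-target distance) + margin."""
    d_pos = np.linalg.norm(anchor - positive)   # same-target distance
    d_neg = np.linalg.norm(anchor - negative)   # different-target distance
    return max(0.0, d_pos - d_neg + margin)

anchor   = np.array([1.0, 0.0])    # feature of target A, image 1
positive = np.array([1.1, 0.0])    # feature of target A, image 2 (close)
negative = np.array([0.0, 3.0])    # feature of target B (far)

loss_good = triplet_loss(anchor, positive, negative)  # well-separated triplet
loss_bad  = triplet_loss(anchor, negative, positive)  # roles swapped
```

A zero loss means the same-target pair is already closer than the different-target pair by at least the margin, which is exactly the separation the training aims for.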
At the output of the feature extraction network, combined with the output of the preceding image segmentation network, features of image parts and of the target can be extracted according to the segmentation result, enabling detailed local comparison of targets.
After the feature map and the segmentation result enter the feature extraction network, the form of feature extraction can be configured according to actual project requirements: different local features can be obtained for different parts of the target according to the segmentation result, or the target's features can be extracted as a whole. When whole-target features are extracted, two or more targets are allowed in the input image; when part-region features are extracted, the input image is allowed to contain only one target. The local feature selection is thus flexible with respect to different user choices and comparison methods.
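One simple way to turn a part-level segmentation into per-part local features is masked pooling: average the feature values over each part's pixels. The part ids and the choice of mean pooling are assumptions for illustration, not a detail stated in the text.

```python
import numpy as np

def part_features(feature_map, seg, part_ids):
    """One local descriptor per part: mean of feature values where seg == p."""
    return {p: feature_map[seg == p].mean() for p in part_ids}

# Toy single-channel feature map and a part segmentation
# (e.g. 1 = head, 2 = torso in a person-retrieval setting).
feature_map = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
seg = np.array([[1, 1],
                [2, 2]])
feats = part_features(feature_map, seg, part_ids=[1, 2])
```

Concatenating the per-part descriptors then gives the combined retrieval feature mentioned earlier, while comparing them individually supports the detailed part-by-part comparison.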
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (3)

1. A method for extracting local features of an image based on an image segmentation technology is characterized by comprising the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction;
wherein the image input to the feature extraction network is taken from the corresponding position of the original image according to the output result of the segmentation network: a rectangular region is extracted, its size set according to the actual retrieval requirement; pixel positions within the rectangle belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0;
when the feature extraction network is trained, its input is a triplet of images, the triplet comprising two images of the same target and one image of a different target; after training, at inference time, a single image is input to the feature extraction network;
and the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local image of the target.
2. The method as claimed in claim 1, wherein the feature extraction network in step 6 includes a model construction part and a feature training part.
3. The method for extracting local features of an image based on an image segmentation technique as claimed in claim 2, characterized in that: the feature extraction branch of the feature extraction network adopts a residual design; the output of the image segmentation network model is input into the residual network of the feature extraction network and passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
CN201810336591.8A 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology Active CN108921850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810336591.8A CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810336591.8A CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Publications (2)

Publication Number Publication Date
CN108921850A CN108921850A (en) 2018-11-30
CN108921850B true CN108921850B (en) 2022-05-17

Family

ID=64402935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810336591.8A Active CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Country Status (1)

Country Link
CN (1) CN108921850B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685803B (en) * 2018-12-14 2020-10-23 深圳先进技术研究院 Left ventricle image segmentation method, device, equipment and storage medium
CN109960742B (en) * 2019-02-18 2021-11-05 苏州科达科技股份有限公司 Local information searching method and device
CN110458849B (en) * 2019-07-26 2023-04-25 山东大学 Image segmentation method based on feature correction
CN113255760A (en) * 2021-05-20 2021-08-13 推想医疗科技股份有限公司 Method for training image processing model, method and device for image processing
CN115661449B (en) * 2022-09-22 2023-11-21 北京百度网讯科技有限公司 Image segmentation and training method and device for image segmentation model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106384100A (en) * 2016-09-28 2017-02-08 武汉大学 Component-based fine vehicle model recognition method
CN106980641A (en) * 2017-02-09 2017-07-25 上海交通大学 The quick picture retrieval system of unsupervised Hash and method based on convolutional neural networks
CN107016681A (en) * 2017-03-29 2017-08-04 浙江师范大学 Brain MRI lesion segmentation approach based on full convolutional network
CN107203999A (en) * 2017-04-28 2017-09-26 北京航空航天大学 A kind of skin lens image automatic division method based on full convolutional neural networks
CN107729818A (en) * 2017-09-21 2018-02-23 北京航空航天大学 A kind of multiple features fusion vehicle recognition methods again based on deep learning
CN107784282A (en) * 2017-10-24 2018-03-09 北京旷视科技有限公司 The recognition methods of object properties, apparatus and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649487B (en) * 2016-10-09 2020-02-18 苏州大学 Image retrieval method based on interest target


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Image Semantic Segmentation Based on Convolutional Neural Networks (基于卷积神经网络的图像语义分割); Chen Hongxiang (陈鸿翔); China Masters' Theses Full-text Database, Information Science and Technology; 2016-07-15; No. 07; I138-1091 *
Small-Area Fingerprint Matching Method Based on Deep Learning (基于深度学习的小面积指纹匹配方法); Zhang Yongliang (张永良); Journal of Computer Applications; 2017-11-10; Vol. 37, No. 11; pp. 3213-3215 *

Also Published As

Publication number Publication date
CN108921850A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108921850B (en) Image local feature extraction method based on image segmentation technology
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
Yang et al. Real-time face detection based on YOLO
CN107341517B (en) Multi-scale small object detection method based on deep learning inter-level feature fusion
CN110084850B (en) Dynamic scene visual positioning method based on image semantic segmentation
CN107833213B (en) Weak supervision object detection method based on false-true value self-adaptive method
CN109741331B (en) Image foreground object segmentation method
US20210264144A1 (en) Human pose analysis system and method
CN109389057B (en) Object detection method based on multi-scale advanced semantic fusion network
CN109784197B (en) Pedestrian re-identification method based on hole convolution and attention mechanics learning mechanism
CN106126581A (en) Cartographical sketching image search method based on degree of depth study
KR102190527B1 (en) Apparatus and method for automatic synthesizing images
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
Yang et al. Real-time pedestrian and vehicle detection for autonomous driving
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
Zang et al. Traffic lane detection using fully convolutional neural network
CN104050460B (en) The pedestrian detection method of multiple features fusion
CN113269224A (en) Scene image classification method, system and storage medium
CN116721301B (en) Training method, classifying method, device and storage medium for target scene classifying model
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN108717436B (en) Commodity target rapid retrieval method based on significance detection
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant