CN108921850B - Image local feature extraction method based on image segmentation technology - Google Patents


Info

Publication number
CN108921850B
CN108921850B (application CN201810336591.8A)
Authority
CN
China
Prior art keywords
image
network
feature extraction
training
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810336591.8A
Other languages
Chinese (zh)
Other versions
CN108921850A (en
Inventor
张雷
陈杰
Current Assignee
Boyun Vision Beijing Technology Co ltd
Original Assignee
Boyun Vision Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Boyun Vision Beijing Technology Co ltd filed Critical Boyun Vision Beijing Technology Co ltd
Priority to CN201810336591.8A priority Critical patent/CN108921850B/en
Publication of CN108921850A publication Critical patent/CN108921850A/en
Application granted granted Critical
Publication of CN108921850B publication Critical patent/CN108921850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image local feature extraction method based on an image segmentation technique, comprising the following steps: 1. construct an image segmentation model; 2. input the image into a CNN to obtain multi-layer feature maps, obtaining, after several convolution and pooling (down-sampling) layers, each feature map together with the original pixel positions recorded during its down-sampling; 3. up-sample the feature maps through an up-sampling module, restoring pixels to their original positions; 4. compute a softmax loss at every pixel position of the newly generated feature map; 5. iterate this process until the back-propagated loss falls below an acceptable threshold, completing construction and training of the image segmentation model; 6. complete image local feature extraction through a trained feature extraction network. The invention enables accurate retrieval at different granularities and precisely locates targets and target parts, so that features of key parts can be extracted for detailed feature comparison.

Description

Image local feature extraction method based on image segmentation technology
Technical Field
The invention relates to the technical field of image processing, in particular to an image local feature extraction method based on an image segmentation technology.
Background
Image retrieval is the process of using an image to search for similar images or images of the same target. Accurate retrieval requires that the returned images all show the same object, not merely similar objects. The retrieval pipeline consists of extracting image features, storing the features in a database, comparing features, and ranking the output by similarity. Many image retrieval methods exist on the market; their main innovations concentrate on the feature extraction method, the feature comparison method, and the ranking of the final output. The present method uses an image segmentation technique: a deep model performs pixel-level segmentation of the image or target, accurately locating the target (e.g. a pedestrian or vehicle in the image) or a target part (e.g. a person's torso or limbs), and extracts features specifically at those locations. Comparing such part-level features makes it possible to determine finely whether two images show the same target; compared with global features, local features contain more detail information and are therefore better suited to distinguishing targets.
Most prior-art feature extraction schemes are based on global features; fine, pixel-level local features are rarely extracted, and local feature extraction is basically done via image gradients or selected image blocks. For retrieval of similar images such features largely meet the task requirements, but for fine-grained retrieval of the very same target they fall short.
In addition, in a retrieval task the query generally must either be an image containing only the target, or a full image (one containing various scenes and targets) on which a detection pass is first run before retrieval, so that the target image can be obtained.
Disclosure of Invention
In view of the foregoing defects in the prior art, an object of the present invention is to provide a method for extracting local features of an image based on an image segmentation technique, so as to solve the deficiencies in the prior art.
In order to achieve the above object, the present invention provides an image local feature extraction method based on an image segmentation technology, which includes the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
and step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction.
In the above method for extracting local features of an image based on an image segmentation technique, the feature extraction network in step 6 includes a model building part and a feature training part.
In the above method for extracting local features of an image based on an image segmentation technique, the feature extraction branch of the feature extraction network adopts a residual design: the output of the image segmentation network model is fed into the residual network of the feature extraction network, passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
In the above method for extracting local features of an image based on an image segmentation technique, the image input to the feature extraction network is taken from the corresponding position of the original image according to the segmentation network's output: a rectangular region is extracted whose size is set according to the actual retrieval requirement; within the rectangle, pixel positions belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0.
In the above method for extracting local features of an image based on an image segmentation technique, when the feature extraction network is trained, its input is a triplet of images: two images of the same target and one image of a different target. After training, at inference time, a single image is input to the feature extraction network.
In the above method for extracting local features of an image based on an image segmentation technique, the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local (partial) image of the target.
The invention has the beneficial effects that:
according to the invention, an image segmentation technology is utilized, and the image feature part adopts a fusion network method of an image segmentation network and a feature extraction network, so that the image segmentation network is introduced into the feature extraction network, and according to different target marking grades, accurate retrieval of different degrees can be realized, targets and target parts are accurately positioned, and thus key part features are extracted for detailed feature comparison, and the features of the same target can be more effectively distinguished from other features.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features and effects of the invention can be fully understood.
Drawings
FIG. 1 is a framework flow diagram of the present invention.
Detailed Description
As shown in Fig. 1, a method for extracting local features of an image based on an image segmentation technique includes the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
and step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction.
In this embodiment, the feature extraction network of step 6 includes a model building part and a feature training part.
In this embodiment, the feature extraction branch of the feature extraction network adopts a residual design: the output of the image segmentation network model is fed into the residual network of the feature extraction network, passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
In this embodiment, the image input to the feature extraction network is taken from the corresponding position of the original image according to the segmentation network's output: a rectangular region is extracted whose size is set according to the actual retrieval requirement; within the rectangle, pixel positions belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0.
In this embodiment, when the feature extraction network is trained, its input is a triplet of images: two images of the same target and one image of a different target. After training, at inference time, a single image is input.
In this embodiment, the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local (partial) image of the target.
The general image retrieval pipeline consists of three stages: image feature extraction, image storage, and comparison of query-image features. The present method innovates mainly in image feature extraction. The image feature stage adopts a fused-network method combining an image segmentation network with a feature extraction network; the segmentation network is introduced into the feature extraction network, and retrieval at different levels of precision is achieved according to different target annotation granularities. When the annotation distinguishes only target and background, the extracted features are limited to the target area, yielding a target feature of the image for feature retrieval. When the annotation is finer-grained on the target itself (for a person, e.g., head, torso, limbs and carried articles), a retrieval feature is obtained that combines the target's local features. Retrieval performance in both cases is superior to retrieval using a single whole image alone.
Image target segmentation means separating a target from the image background at the pixel level; depending on the segmentation granularity, a part can also be separated from the target so as to extract the target's local features. The segmentation method used in the invention is based on a deep neural network (CNN): according to its training on a training set, the segmentation network generates an output image of the same size as the input image, whose pixels carry the category information of each pixel in the original image. When the training data contains only two classes, target and background, the output image contains only two pixel values distinguishing foreground from background. According to the distinguished foreground and background, the method extracts the target region and feeds it into the feature extraction branch; the generated features are those used for image retrieval.
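As a simplified, concrete illustration of the foreground/background idea above, the following sketch lifts only the target pixels out of a tiny image using a two-valued segmentation map. The array names and sizes are invented for illustration; the patent's actual images and label values are of course larger.

```python
import numpy as np

# Hypothetical 4x4 "original image" and a binary segmentation output:
# 0 = background, 1 = target, as described for two-class training data.
original = np.arange(16, dtype=float).reshape(4, 4)
seg = np.zeros((4, 4), dtype=int)
seg[1:3, 1:3] = 1                 # the segmented target region

# Keep only the target pixels; background positions become 0 before the
# result is handed to the feature extraction branch.
target_only = np.where(seg == 1, original, 0.0)
```

The same masking generalizes directly to multi-class label maps by testing `seg == k` for each part category `k`.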
Next, the overall flow of image segmentation is outlined. First an image segmentation model must be constructed; the model consists of several network layers with different functions, and besides the common CNN structures it adds structures such as batch-norm layers and deconvolution (deconv) layers. The overall model framework is shown in Fig. 1.
The image first passes through the CNN to obtain multi-layer feature maps; after several convolution layers and pooling (down-sampling) layers, each feature map is obtained together with the original pixel positions recorded during its down-sampling.
The feature maps are then up-sampled: pixels are restored to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays as consistent as possible with the original input image.
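One common way to realize this record-and-restore pair is max pooling that remembers which position each kept value came from, with an "unpooling" step that scatters values back to those positions (the SegNet-style pooling-indices idea). It is an assumption that the patent's up-sampling module works exactly this way; the sketch below only illustrates the mechanism on a 4x4 array.

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2 max pooling that also records each winner's original (row, col)."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2, 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i // 2, j // 2] = block[r, c]
            indices[i // 2, j // 2] = (i + r, j + c)   # original position
    return pooled, indices

def unpool_2x2(pooled, indices, out_shape):
    """Scatter pooled values back to their recorded original positions."""
    out = np.zeros(out_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = indices[i, j]
            out[r, c] = pooled[i, j]   # pixel returns to where it came from
    return out

x = np.array([[1., 2., 5., 4.],
              [3., 0., 1., 2.],
              [7., 8., 0., 1.],
              [2., 6., 3., 9.]])
p, idx = max_pool_2x2_with_indices(x)
restored = unpool_2x2(p, idx, x.shape)
```

Positions that were discarded during pooling stay zero after unpooling; subsequent convolution layers are what fill them back in.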
A softmax loss is computed at every pixel position of the newly generated feature map, i.e. each pixel position is classified; the loss at each position is computed by comparing the manually annotated image label with the network output, and back-propagating the loss trains the model parameters. Each pixel of the model's output image corresponds to one category, so according to the category differences, the image at the position of a required category can be extracted.
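The per-pixel softmax loss described here can be sketched as follows: the network emits one score vector per pixel, and the loss is the average cross-entropy between those scores and the annotated label map. Shapes and class count are illustrative.

```python
import numpy as np

def per_pixel_softmax_loss(scores, labels):
    """scores: (H, W, C) raw class scores; labels: (H, W) annotated class ids.
    Returns the cross-entropy averaged over all pixel positions."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    h, w = labels.shape
    # Pick, at every position, the probability of the annotated class.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.mean(np.log(picked))

# Toy 2x2 map with C=3 classes; pixel (0,0) is scored very confidently.
scores = np.zeros((2, 2, 3))
scores[0, 0, 1] = 10.0
labels = np.array([[1, 0], [2, 0]])
loss = per_pixel_softmax_loss(scores, labels)
```

With all-zero scores the loss equals log(C), the value for a uniform prediction, which is a handy sanity check during training.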
This process is iterated until the back-propagated loss falls below the acceptable threshold, completing construction and training of the image segmentation model.
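The iterate-until-the-loss-is-acceptable control flow can be illustrated with a deliberately tiny stand-in: a single-parameter per-pixel classifier trained by gradient descent on a separable toy label map. This is not the patent's CNN, only the convergence loop of this step; all values are invented.

```python
import numpy as np

H, W = 8, 8
y = np.zeros((H, W), dtype=int)
y[2:6, 2:6] = 1                  # toy ground-truth label map
x = 2.0 * y - 1.0                # toy per-pixel feature: +1 on target, -1 off

w, b = 0.0, 0.0                  # classifier parameters
lr, threshold = 0.5, 0.1         # "acceptable" loss threshold of step 5
loss = np.inf

for step in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))     # per-pixel foreground prob.
    # Per-pixel cross-entropy averaged over all positions.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if loss < threshold:                        # stop once loss is acceptable
        break
    grad = p - y                                # d loss / d logit, per pixel
    w -= lr * np.mean(grad * x)                 # back-propagated update
    b -= lr * np.mean(grad)

pred = (1.0 / (1.0 + np.exp(-(w * x + b))) > 0.5).astype(int)
```

Because all parameters are shared across pixels and the toy data is separable, reaching the loss threshold here also implies a perfect per-pixel prediction.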
However, since the invention targets the image retrieval task, directly extracting features from the segmentation network's convolution or pooling layers is not sufficient for the retrieval task, so a feature extraction network is introduced. This network is a branch of the segmentation network; after the segmentation network's training is complete, the branch is trained separately. The feature extraction network mainly extracts features of different parts of the original image according to the output image of the final segmentation network. It is likewise divided into a model construction part and a feature training part. First, the model construction part:
and a branch part of the feature extraction adopts residual error design, the residual error network of the feature extraction is input through the output of the feature graph generated by the network and the image segmentation network, and the training of the feature extraction network is jointly supervised by finally using a triple loss function and a softmax loss function through a series of residual error layers. The image input into the feature extraction network is an image which is obtained from an original image at a corresponding position according to an output result of the segmentation network, a rectangular area is obtained, the size of the rectangular area is set according to an actual retrieval requirement, pixels at the corresponding position of the original image are filled in the pixel position to which a required category belongs in the rectangular area, and all the rest positions are set to be 0. Triple loss and softmax characteristics-classification of a supervision characteristic and a supervision degree of the dispersion of different target characteristics, aiming at enabling the generated characteristics to have stronger resolving power and to be more sensitive to the characteristics of the same target.
For the training part, because the triplet loss is needed, the organization of the input images requires some handling: two different images of the same target are combined with an image of another, different target and input into the network together. In the invention, the input image may be a target image segmented from the whole scene, or a local image of the target. The rest of the training process resembles that of the segmentation network.
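The triplet supervision described above can be sketched with the standard margin-based triplet loss on made-up feature vectors: two features of the same target (anchor and positive) and one of a different target (negative). The vectors and margin value here are illustrative only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (same-target distance) - (different-target distance) + margin."""
    d_pos = np.linalg.norm(anchor - positive)   # same-target distance
    d_neg = np.linalg.norm(anchor - negative)   # different-target distance
    return max(0.0, d_pos - d_neg + margin)

anchor   = np.array([1.0, 0.0])    # feature of target A, image 1
positive = np.array([1.1, 0.0])    # feature of target A, image 2 (close)
negative = np.array([0.0, 3.0])    # feature of target B (far)

loss_good = triplet_loss(anchor, positive, negative)  # well-separated triplet
loss_bad  = triplet_loss(anchor, negative, positive)  # roles swapped
```

A zero loss means the same-target pair is already closer than the different-target pair by at least the margin, which is exactly the separation the training aims for.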
At the output of the feature extraction network, combined with the output of the preceding image segmentation network, features of image parts and of the target can be extracted according to the segmentation result, enabling detailed local comparison of targets.
After the feature map and the segmentation result enter the feature extraction network, the form of feature extraction can be configured according to actual project requirements: different local features can be obtained for different parts of the target according to the segmentation result, or the target's features can be extracted as a whole. When whole-target features are extracted, two or more targets are allowed in the input image; when part-region features are extracted, the input image is allowed to contain only one target. The local feature selection is thus flexible with respect to different user choices and comparison methods.
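One simple way to turn a part-level segmentation into per-part local features is masked pooling: average the feature values over each part's pixels. The part ids and the choice of mean pooling are assumptions for illustration, not a detail stated in the text.

```python
import numpy as np

def part_features(feature_map, seg, part_ids):
    """One local descriptor per part: mean of feature values where seg == p."""
    return {p: feature_map[seg == p].mean() for p in part_ids}

# Toy single-channel feature map and a part segmentation
# (e.g. 1 = head, 2 = torso in a person-retrieval setting).
feature_map = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
seg = np.array([[1, 1],
                [2, 2]])
feats = part_features(feature_map, seg, part_ids=[1, 2])
```

Concatenating the per-part descriptors then gives the combined retrieval feature mentioned earlier, while comparing them individually supports the detailed part-by-part comparison.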
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (3)

1. A method for extracting local features of an image based on an image segmentation technology is characterized by comprising the following steps:
step 1, constructing an image segmentation model, wherein the segmentation model consists of several network layers with different functions, including a CNN model structure, a batch-norm layer structure and a deconvolution (deconv) layer structure;
step 2, inputting the image into the CNN to obtain multi-layer feature maps, and, after several convolution layers and pooling (down-sampling) layers, obtaining each feature map together with the original pixel positions recorded during its down-sampling;
step 3, up-sampling the feature maps through an up-sampling module and restoring pixels to their original positions according to the recorded position information of each feature-map layer, so that the generated feature map stays consistent with the original input image;
step 4, computing a softmax loss at every pixel position of the newly generated feature map, i.e. classifying each pixel position, wherein the loss at each position is computed by comparing the manually annotated image label with the network output, and the model parameters are trained through the back-propagated loss;
step 5, iterating the above process until the back-propagated loss falls below an acceptable threshold, which completes construction and training of the image segmentation model;
step 6, after the image segmentation model is constructed and trained, training a feature extraction network that extracts features of different parts of the original image according to the output image of the final segmentation network, thereby completing image local feature extraction;
wherein the image input to the feature extraction network is taken from the corresponding position of the original image according to the output result of the segmentation network: a rectangular region is extracted, its size set according to the actual retrieval requirement; pixel positions within the rectangle belonging to the required category are filled with the corresponding pixels of the original image, and all remaining positions are set to 0;
when the feature extraction network is trained, its input is a triplet of images, the triplet comprising two images of the same target and one image of a different target; after training, at inference time, a single image is input to the feature extraction network;
and the input image of the feature extraction network may be a target image segmented out of the whole scene, or a local image of the target.
2. The method as claimed in claim 1, wherein the feature extraction network in step 6 includes a model construction part and a feature training part.
3. The method for extracting local features of an image based on an image segmentation technique as claimed in claim 2, characterized in that: the feature extraction branch of the feature extraction network adopts a residual design; the output of the image segmentation network model is input into the residual network of the feature extraction network and passes through a series of residual layers, and training of the feature extraction network is jointly supervised by a triplet loss function and a softmax loss function.
CN201810336591.8A 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology Active CN108921850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810336591.8A CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810336591.8A CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Publications (2)

Publication Number Publication Date
CN108921850A CN108921850A (en) 2018-11-30
CN108921850B true CN108921850B (en) 2022-05-17

Family

ID=64402935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810336591.8A Active CN108921850B (en) 2018-04-16 2018-04-16 Image local feature extraction method based on image segmentation technology

Country Status (1)

Country Link
CN (1) CN108921850B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685803B (en) * 2018-12-14 2020-10-23 深圳先进技术研究院 Left ventricle image segmentation method, device, equipment and storage medium
CN109960742B (en) * 2019-02-18 2021-11-05 苏州科达科技股份有限公司 Local information searching method and device
CN110458849B (en) * 2019-07-26 2023-04-25 山东大学 Image segmentation method based on feature correction
CN113255760A (en) * 2021-05-20 2021-08-13 推想医疗科技股份有限公司 Method for training image processing model, method and device for image processing
CN115661449B (en) * 2022-09-22 2023-11-21 北京百度网讯科技有限公司 Image segmentation and training method and device for image segmentation model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106384100A (en) * 2016-09-28 2017-02-08 武汉大学 Component-based fine vehicle model recognition method
CN106980641A (en) * 2017-02-09 2017-07-25 上海交通大学 The quick picture retrieval system of unsupervised Hash and method based on convolutional neural networks
CN107016681A (en) * 2017-03-29 2017-08-04 浙江师范大学 Brain MRI lesion segmentation approach based on full convolutional network
CN107203999A (en) * 2017-04-28 2017-09-26 北京航空航天大学 A kind of skin lens image automatic division method based on full convolutional neural networks
CN107729818A (en) * 2017-09-21 2018-02-23 北京航空航天大学 A kind of multiple features fusion vehicle recognition methods again based on deep learning
CN107784282A (en) * 2017-10-24 2018-03-09 北京旷视科技有限公司 The recognition methods of object properties, apparatus and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649487B (en) * 2016-10-09 2020-02-18 苏州大学 Image retrieval method based on interest target


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Image Semantic Segmentation Based on Convolutional Neural Networks (基于卷积神经网络的图像语义分割); Chen Hongxiang (陈鸿翔); China Masters' Theses Full-text Database, Information Science and Technology; 2016-07-15; No. 07; I138-1091 *
Small-Area Fingerprint Matching Method Based on Deep Learning (基于深度学习的小面积指纹匹配方法); Zhang Yongliang (张永良); Journal of Computer Applications; 2017-11-10; Vol. 37, No. 11; pp. 3213-3215 *

Also Published As

Publication number Publication date
CN108921850A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108921850B (en) Image local feature extraction method based on image segmentation technology
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
Yang et al. Real-time face detection based on YOLO
CN107341517B (en) Multi-scale small object detection method based on deep learning inter-level feature fusion
CN110084850B (en) Dynamic scene visual positioning method based on image semantic segmentation
CN107833213B (en) Weak supervision object detection method based on false-true value self-adaptive method
CN109741331B (en) Image foreground object segmentation method
US20210264144A1 (en) Human pose analysis system and method
CN109389057B (en) Object detection method based on multi-scale advanced semantic fusion network
CN109784197B (en) Pedestrian re-identification method based on hole convolution and attention mechanics learning mechanism
CN106126581A (en) Cartographical sketching image search method based on degree of depth study
KR102190527B1 (en) Apparatus and method for automatic synthesizing images
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
Yang et al. Real-time pedestrian and vehicle detection for autonomous driving
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
Zang et al. Traffic lane detection using fully convolutional neural network
CN104050460B (en) The pedestrian detection method of multiple features fusion
CN113269224A (en) Scene image classification method, system and storage medium
CN116721301B (en) Training method, classifying method, device and storage medium for target scene classifying model
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN108717436B (en) Commodity target rapid retrieval method based on significance detection
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant