CN111627033B - Method, equipment and computer readable storage medium for dividing difficult sample instance

Method, equipment and computer readable storage medium for dividing difficult sample instance

Info

Publication number
CN111627033B
CN111627033B
Authority
CN
China
Prior art keywords
image
iou
segmentation
layer
difficult sample
Prior art date
Legal status
Active
Application number
CN202010480111.2A
Other languages
Chinese (zh)
Other versions
CN111627033A (en)
Inventor
薛均晓
程君进
徐明亮
吕培
Current Assignee
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202010480111.2A priority Critical patent/CN111627033B/en
Publication of CN111627033A publication Critical patent/CN111627033A/en
Application granted granted Critical
Publication of CN111627033B publication Critical patent/CN111627033B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/70
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device and a computer readable storage medium for difficult sample instance segmentation, comprising the following steps: preprocessing an image, so that the foreground and the background of the image are easy to distinguish; difficult sample segmentation of the image, used to distinguish positive samples from negative samples in the image; and convolution training of the image, used for instance segmentation of the difficult samples in the image. The invention preprocesses the original image so that the foreground and the background are easier to distinguish and the boundary is clearer. Difficult sample segmentation is carried out on the preprocessed image, which improves the recognition accuracy during convolution training. The convolution training can be carried out on large-scale training samples with autonomous learning; a large number of training samples helps to activate deep network neurons, and the states of the target object under different colors, forms and environments are memorized and analyzed.

Description

Method, equipment and computer readable storage medium for dividing difficult sample instance
Technical Field
The present invention relates to the field of computer vision and graphic image processing, and more particularly, to a method, apparatus, and computer readable storage medium for partitioning a difficult sample instance.
Background
Accurate instance segmentation of image datasets plays a vital role in artificial intelligence fields such as autonomous driving, robotics and virtual reality.
Current instance segmentation methods can be divided into two classes: instance segmentation algorithms based on traditional methods and instance segmentation algorithms based on deep learning.
Instance segmentation based on traditional methods: traditional methods include region-based segmentation, such as methods that divide an image into small image blocks with the same properties and methods that segment an image using multiple color spaces. There is also the Markov Random Field (MRF) model, which segments an image using an indirectly estimated random process generated by the MRF, and the Conditional Random Field (CRF) model is often used as a post-processing module of deep-learning-based semantic segmentation algorithms to refine the segmentation. However, traditional instance segmentation methods have low segmentation accuracy.
Mask R-CNN replaces the RoI Pooling layer of Faster R-CNN with a RoI Align layer and adds a mask branch, thereby evolving from target detection and target classification to instance segmentation. The RoI Align technique uses no quantization operation; more accurate feature map information can be obtained by bilinear interpolation, which reduces the error introduced by the quantization performed when the feature map is extracted. However, plain RoI Align computes each pixel only from the surrounding pixels within a matrix of the convolution-kernel size, so the receptive field is always rectangular no matter how deep the network is; since the shapes of many objects in reality vary, the adaptability of plain RoI Align is low.
Disclosure of Invention
The invention mainly solves the technical problems of low segmentation precision and low adaptability, and provides a method, equipment and a computer readable storage medium for segmenting a difficult sample instance, which have high segmentation precision and wide application range.
In order to solve the technical problems, the invention adopts a technical scheme that: a method of hard sample instance segmentation, comprising:
preprocessing an image, used to make the foreground and the background of the image easy to distinguish;
difficult sample segmentation of the image, used to distinguish positive samples from negative samples in the image;
convolution training of the image, used for instance segmentation of difficult samples in the image.
In another embodiment of the difficult sample instance segmentation method of the present invention, the image data is preprocessed using sharpening and clustering methods.
In another embodiment of the difficult sample instance segmentation method of the present invention, the sharpening is Laplacian sharpening and the clustering is K-means clustering.
In another embodiment of the difficult sample instance segmentation method of the present invention, the difficult sample segmentation of the image includes:
classifying the images, namely classifying the preprocessed images by using a classifier;
calculating the IOU value of the image, calculating the IOU value of the classified image, and setting an IOU threshold;
comparing the IOU value with a set IOU threshold value, and outputting an image with the IOU value larger than the IOU threshold value.
In another embodiment of the difficult sample instance segmentation method of the present invention, the IOU threshold is 0.5.
In another embodiment of the difficult sample instance segmentation method of the present invention, the IOU value is compared with the set IOU threshold; an image whose IOU value is smaller than the IOU threshold is input into the classifier again for classification, its IOU value is recalculated and compared with the IOU threshold, images whose IOU value is larger than the IOU threshold are output, and images whose IOU value is smaller than the IOU threshold are repeatedly classified, their IOU values calculated and compared with the IOU threshold.
In another embodiment of the difficult sample instance segmentation method of the present invention, the image is convolutionally trained by a convolutional neural network.
In another embodiment of the difficult sample instance segmentation method of the present invention, the convolutional neural network includes a Deformable ROI Align layer.
A difficult sample instance segmentation apparatus comprising:
the image preprocessing module is used for enabling the foreground and the background of the image to be easily distinguished;
the difficult sample segmentation module of the image is used for distinguishing positive samples from negative samples in the image;
and the convolution training module of the image is used for dividing the instance of the difficult sample in the image.
A computer readable storage medium for hard sample instance segmentation, the computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a hard sample instance segmentation method.
The beneficial effects of the invention are as follows: the original image is preprocessed so that the foreground and the background are easier to distinguish and the boundary is clearer; difficult sample segmentation is carried out on the preprocessed image, which improves the recognition accuracy during convolution training; the convolution training can be carried out on large-scale training samples with autonomous learning, a large number of training samples helps to activate deep network neurons, and the states of the target object under different colors, forms and environments are memorized and analyzed.
Drawings
FIG. 1 is a flow chart of one embodiment of a method of hard sample instance segmentation according to the present invention;
FIG. 2 is a sharpening flow diagram of one embodiment of a difficult sample instance segmentation method according to the present invention;
FIG. 3 is a clustering flow chart of one embodiment of a difficult sample instance segmentation method according to the present invention;
FIG. 4 is a flow chart of a difficult sample segmentation process according to one embodiment of the difficult sample instance segmentation method of the present invention;
FIG. 5 is a convolutional neural network composition diagram of one embodiment of a difficult sample instance segmentation method in accordance with the present invention;
FIG. 6 is a diagram of the mask head and mask iou head compositions according to one embodiment of the difficult sample instance segmentation method of the invention;
FIG. 7 is a schematic diagram of a structure of an embodiment of a difficult sample instance segmentation apparatus according to the present invention;
fig. 8 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention.
Detailed Description
In order that the invention may be readily understood, a more particular description thereof will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items.
In order to solve the problems of low segmentation precision and low adaptability, the invention provides a difficult sample instance segmentation method with high segmentation precision and wide application range.
Referring to fig. 1, a method for partitioning a difficult sample instance includes:
s10, preprocessing an image, wherein the preprocessing is used for enabling the foreground and the background of the image to be easily distinguished;
s20, dividing a difficult sample of the image, and distinguishing a positive sample from a negative sample in the image;
s30, performing convolution training on the image, and dividing the instance of the difficult sample in the image.
As in fig. 2, the image data is preprocessed using sharpening and clustering methods. Preferably, sharpening is Laplacian sharpening, and clustering is K-means clustering.
When the foreground and background of an image are unclear or the lines of objects in the picture are not obvious, sharpening can enhance the line definition of the foreground, the background and the instance boundaries, so that the boundaries can be marked clearly during instance segmentation and a clearer sharpened image is obtained.
The Laplacian sharpening comprises the following steps:
S101, applying the Laplacian (Laplace operator) to the original image to obtain a transformed image, highlighting small details in the image.
S102, carrying out gradient transformation on the original image to obtain a gradient image, and highlighting the edge of the original image.
S103, performing smoothing operation on the gradient image by using a 5x5 mean filter to obtain a smoothed image, thereby achieving the effect of noise reduction.
S104, masking the transformed image by using the smooth image to obtain a masked image.
S105, expanding the gray-scale range of the image by performing a power-law (gamma) transformation on the masked image to obtain the sharpened image, as sketched below.
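A minimal Python/OpenCV sketch of steps S101-S105 is given below. The kernel sizes, the sign convention when adding the Laplacian, the normalization of the gradient mask and the gamma value are assumptions made for illustration and are not values fixed by this description.

```python
import cv2
import numpy as np

def sharpen(image_gray: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    img = image_gray.astype(np.float64)

    # S101: Laplacian-enhanced image to highlight small details.
    lap = cv2.Laplacian(img, cv2.CV_64F, ksize=3)
    transformed = img - lap                          # sign convention is an assumption

    # S102: gradient image (Sobel magnitude) to highlight edges.
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    gradient = np.abs(gx) + np.abs(gy)

    # S103: smooth the gradient image with a 5x5 mean filter (noise reduction).
    smoothed = cv2.blur(gradient, (5, 5))

    # S104: mask the transformed image with the smoothed gradient image.
    masked = transformed * (smoothed / (smoothed.max() + 1e-12))

    # S105: power-law (gamma) transformation to expand the gray-scale range.
    masked = np.clip(masked, 0.0, None)
    out = 255.0 * np.power(masked / (masked.max() + 1e-12), gamma)
    return out.astype(np.uint8)
```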
As shown in FIG. 3, the clustering operation can effectively reduce classification errors. The clustering operation assigns a category label to pictures that share common feature information, so the pictures carry more accurate labels; when the convolutional neural network enters convolution training it can learn these category labels, which reduces wrongly classified samples and reduces the workload of difficult sample segmentation.
The sharpened image is segmented by using K-means clustering, wherein the K-means clustering comprises the following steps:
s106, determining K initial cluster centers required by K-means clustering by using kernel density estimation.
Kernel density estimation is a non-parametric modeling method; it can estimate a probability density function directly from the continuously varying image pixel values, without presupposing a pixel-value distribution, and yields a smooth estimated curve. A Gaussian kernel is chosen, and the pixel gray values at m lattice points selected with equal probability are taken as the observation values.
Gaussian kernel density estimation function relation (written out in the standard form implied by the variable definitions below):

$$\hat{f}_h(y_j)=\frac{1}{nh\sqrt{2\pi}}\sum_{x,y}\exp\!\left(-\frac{(y_j-I_{xy})^2}{2h^2}\right),\qquad y_j=\min(I_{xy})+\frac{(j-1)\left(\max(I_{xy})-\min(I_{xy})\right)}{m-1},\quad j=1,2,3\ldots m,\ m=256$$

where $y_j$ is the gray value of the $j$-th of the $m$ lattice points selected in the image with equal probability, $h$ is the window width, $I_{xy}$ is the gray value of the pixel in the $x$-th row and $y$-th column of the image, $\max(I_{xy})$ is the maximum gray value of the image pixels, $\min(I_{xy})$ is the minimum gray value of the image pixels, STD is the standard deviation of the image pixels, IQR is the interquartile range of the image pixels, and $n$ is the number of pixels of the scanned image; a window width consistent with these quantities is Silverman's rule of thumb, $h=0.9\,\min(\mathrm{STD},\,\mathrm{IQR}/1.34)\,n^{-1/5}$.
S107, assigning each point to the nearest centroid, forming K clusters;
S108, recalculating the centroid of each cluster;
S109, judging whether the centroids have changed;
S1010, if the centroids have changed, returning to S107 and continuing until the centroids no longer change, then outputting the preprocessed image (steps S106-S1010 are sketched below).
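A compact Python sketch of steps S106-S1010 follows. The Silverman-style bandwidth, the equally spaced lattice points and the choice of the highest-density lattice points as the K initial centers are assumptions about how S106 is realised; the K-means loop itself follows S107-S1010.

```python
import numpy as np

def kde_initial_centers(image_gray: np.ndarray, k: int, m: int = 256) -> np.ndarray:
    """S106: pick K initial centers from the peaks of a Gaussian KDE of the gray values."""
    pixels = image_gray.astype(np.float64).ravel()
    n = pixels.size
    std = pixels.std()
    iqr = np.subtract(*np.percentile(pixels, [75, 25]))
    h = 0.9 * min(std, iqr / 1.34) * n ** (-1 / 5)        # assumed (Silverman-style) bandwidth
    y = np.linspace(pixels.min(), pixels.max(), m)         # m lattice points over the gray range
    density = np.array([np.exp(-0.5 * ((yj - pixels) / h) ** 2).sum() for yj in y])
    density /= n * h * np.sqrt(2.0 * np.pi)
    return np.sort(y[np.argsort(density)[-k:]])            # K highest-density lattice points

def kmeans_gray(image_gray: np.ndarray, k: int, max_iter: int = 100) -> np.ndarray:
    pixels = image_gray.astype(np.float64).ravel()
    centers = kde_initial_centers(image_gray, k)
    for _ in range(max_iter):
        # S107: assign every pixel to the nearest centroid, forming K clusters.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # S108: recompute the centroid of every cluster.
        new_centers = np.array([pixels[labels == i].mean() if np.any(labels == i) else centers[i]
                                for i in range(k)])
        # S109 / S1010: stop when the centroids no longer change, otherwise repeat.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(image_gray.shape)
```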
As shown in fig. 4, adopting difficult sample segmentation reduces the error of judging positive samples as negative samples during classification and increases the number of positive samples, so the convolutional neural network obtains more effective information during training and learning; this alleviates the problems of low target-detection recall and sample imbalance and improves the recognition accuracy of the classification network for the target.
The difficult sample segmentation of the preprocessed image includes:
s201, classifying images, namely classifying the preprocessed images by using a classifier;
the classifier is part of a convolutional neural network and classifies the examples in the preprocessed image according to information learned from the preprocessed image by the convolutional neural network.
S202, in the classifier, the preprocessed image is processed, the IOU value of the preprocessed image is calculated, and the IOU threshold value is set to be 0.5.
IOU value: the IOU value is often used as the criterion in target detection for judging the detection accuracy. A basic operation in target detection is to frame the target to be marked with a bounding box; the detected target box is then evaluated with the following formula:

$$\mathrm{IOU}=\frac{\mathrm{area}(\mathrm{DetectionResult}\cap\mathrm{GroundTruth})}{\mathrm{area}(\mathrm{DetectionResult}\cup\mathrm{GroundTruth})}$$

where DetectionResult represents the result obtained through the neural network and GroundTruth represents the annotated result.
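A minimal Python sketch of this IOU computation for two axis-aligned boxes is given below; the (x1, y1, x2, y2) box format is an assumption made for illustration.

```python
def iou(det_box, gt_box):
    # Intersection rectangle of the two boxes.
    x1 = max(det_box[0], gt_box[0])
    y1 = max(det_box[1], gt_box[1])
    x2 = min(det_box[2], gt_box[2])
    y2 = min(det_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_det = (det_box[2] - det_box[0]) * (det_box[3] - det_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_det + area_gt - inter
    return inter / union if union > 0 else 0.0

# Example: iou((0, 0, 10, 10), (5, 5, 15, 15)) == 25 / 175 ≈ 0.143
```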
S203, comparing the IOU value of the preprocessed image with a set IOU threshold, and outputting the preprocessed image with the IOU value larger than the IOU threshold as a difficult sample segmentation image.
An image whose IOU value is smaller than the IOU threshold is input into the classifier again for classification; its IOU value is recalculated and compared with the IOU threshold, images whose IOU value is larger than the threshold are output, and the remaining images are repeatedly classified, their IOU values calculated and compared with the threshold, iterating until the IOU value of the preprocessed image is larger than the set IOU threshold.
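The re-classification loop of S201-S204 can be sketched as follows, reusing the iou helper above. The names classifier and ground_truth_box are illustrative placeholders for the classifier network and the annotations, and the max_rounds cap is an added safeguard rather than part of this description.

```python
IOU_THRESHOLD = 0.5

def hard_sample_split(images, classifier, ground_truth_box, max_rounds=10):
    accepted, pending = [], list(images)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for img in pending:
            det_box = classifier(img)                          # predicted box for the image
            if iou(det_box, ground_truth_box(img)) > IOU_THRESHOLD:
                accepted.append(img)                           # output as segmented image
            else:
                still_pending.append(img)                      # re-classify in the next round
        pending = still_pending
    return accepted, pending
```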
S204, outputting the preprocessed image with the IOU value larger than the IOU threshold value as a segmented image, and inputting the segmented image into a convolutional neural network.
Preferably, the image is convolutionally trained by a convolutional neural network.
As shown in fig. 5, the convolutional neural network includes a Deformable ROI Align layer. The convolutional neural network comprises conv1, conv2, conv3, conv4, an RPN, Deformable ROI Align, conv5, conv6, conv7, a mask head and a mask iou head. conv1, conv2, conv3 and conv4 each comprise a convolutional layer, a pooling layer and an activation layer; conv5 comprises a fully connected layer and an activation layer; conv6 and conv7 are fully connected layers.
The preprocessing for difficult sample segmentation can be performed inside the convolutional neural network: the difficult sample segmentation steps are completed by conv1, conv2 and conv3, yielding the segmented image output; the segmented image is input into conv4 for feature extraction again, and the generated first feature image is input to the RPN layer and the Deformable ROI Align layer. The first feature image is input to the RPN layer, which outputs the region proposals; the first feature image and the region proposals are input to the Deformable ROI Align layer, which outputs the second feature image; the second feature image is input to conv5, which outputs a convolution kernel; the convolution kernel outputs the classification probability through conv6 and the segmentation box through conv7, and is also input to the mask head and then to the mask iou head, which outputs the mask image; the mask image is combined with the classification and the segmentation box to form the segmented image with the final segmentation result.
In the RPN layer, the feature image passes through a convolution layer and an activation layer to obtain a number of anchor boxes. Two fully convolutional branches are adopted: one branch performs a pixel-by-pixel two-class classification of the 9 anchor boxes, which are cropped and filtered through the convolution layer, and judges whether each anchor box belongs to the foreground or the background, i.e., whether or not it is an object; the region proposals are then obtained through a reshape operation.
Meanwhile, the other convolutional branch obtains four coordinate offsets for the 9 anchor boxes pixel by pixel and corrects the anchor boxes to generate the region proposals. Overlapping boxes can be further removed, the anchor boxes are coarsely screened, and the top n anchor boxes are selected, so that only n region proposals enter Deformable ROI Align; n is preferably 500, which reduces the computation workload and improves the computation speed.
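One possible realisation of this screening step, using non-maximum suppression from torchvision, is sketched below; the NMS IoU threshold of 0.7 is an assumption, and only n = 500 comes from the description above.

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes: torch.Tensor, scores: torch.Tensor, n: int = 500):
    # boxes: (N, 4) in (x1, y1, x2, y2) format; scores: (N,) objectness scores.
    keep = nms(boxes, scores, iou_threshold=0.7)   # remove overlapping boxes
    keep = keep[:n]                                # nms returns indices sorted by score
    return boxes[keep], scores[keep]
```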
In the Deformable ROI Align layer, the region proposals generated by the RPN layer and the first feature image are taken as input, and the region proposals are mapped onto the first feature image by Deformable ROI Align to obtain the second feature image. During the mapping, bilinear interpolation is used to obtain the image value at positions whose coordinates are floating-point numbers, so that the whole feature aggregation process becomes a continuous operation, and the learned offsets are included in the computation at the same time.
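The bilinear interpolation used to read a feature value at a floating-point coordinate can be sketched as follows; the single-channel (H, W) feature layout and the border clamping are simplifications made for illustration.

```python
import numpy as np

def bilinear_sample(feature: np.ndarray, x: float, y: float) -> float:
    """Read a value from a single-channel feature map at a floating-point (x, y)."""
    h, w = feature.shape
    x0 = min(max(int(np.floor(x)), 0), w - 1)
    y0 = min(max(int(np.floor(y)), 0), h - 1)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return float((1 - dy) * top + dy * bottom)
```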
The second feature image is input into conv5, and conv5 outputs the convolution kernel;
the convolution kernel outputs the classification through conv6, which calculates the specific category (e.g., person, horse, car, etc.) each region proposal belongs to;
the convolution kernel outputs the segmentation (bounding) box through conv7, obtaining the position offset of each region proposal, which is used to obtain a more accurate target detection box through regression.
as shown in FIG. 6, the convolution kernel is input to the mask head 40, and the mask head 40 includes 5 convolution layers (C401, C402, C403, C404, C405) and is implemented by the mask head to generate a mask.
As shown in fig. 5 and 6, the mask generated by the mask head and the second feature image generated by Deformable ROI Align are input to the mask iou head, which includes 4 convolution layers (C501, C502, C503, C504) and three fully connected layers (FC505, FC506, FC507); the mask iou head outputs the mask image, which is combined with the classification and the segmentation box to form the segmented image with the segmentation result.
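A hedged PyTorch sketch of the two heads is given below. The description fixes only the number of layers (5 convolution layers in the mask head; 4 convolution layers and 3 fully connected layers in the mask iou head), so the channel widths, the 14x14 ROI feature size and the concatenation of the predicted mask with the ROI feature are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MaskHead(nn.Module):                        # C401-C405
    def __init__(self, in_channels: int = 256):
        super().__init__()
        layers = []
        for _ in range(4):
            layers += [nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True)]
            in_channels = 256
        layers += [nn.Conv2d(256, 1, 1)]          # C405: per-pixel mask logits
        self.net = nn.Sequential(*layers)

    def forward(self, roi_feat):                  # (N, 256, 14, 14) assumed
        return self.net(roi_feat)

class MaskIoUHead(nn.Module):                     # C501-C504 + FC505-FC507
    def __init__(self, in_channels: int = 256 + 1):
        super().__init__()
        convs = []
        for _ in range(4):
            convs += [nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True)]
            in_channels = 256
        self.convs = nn.Sequential(*convs)
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 14 * 14, 1024), nn.ReLU(inplace=True),   # FC505
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),            # FC506
            nn.Linear(1024, 1),                                      # FC507: predicted mask IoU
        )

    def forward(self, roi_feat, mask_logits):
        # Concatenate the ROI feature with the predicted mask along the channel axis.
        x = torch.cat([roi_feat, mask_logits], dim=1)
        return self.fcs(self.convs(x))

# Usage sketch:
# roi_feat = torch.randn(2, 256, 14, 14)
# mask = MaskHead()(roi_feat)                     # (2, 1, 14, 14)
# iou_score = MaskIoUHead()(roi_feat, mask)       # (2, 1)
```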
Referring to fig. 7, a difficult sample instance segmentation apparatus 60 includes:
an image preprocessing module 601, configured to make the foreground and the background of the image easily distinguishable;
a difficult sample segmentation module 602 of the image for distinguishing positive and negative samples in the image;
the convolution training module 603 of the image is used for example segmentation of difficult samples in the image.
Referring to fig. 8, a computer readable storage medium 70 for hard sample instance segmentation has stored thereon a computer program 701, which when executed by a processor, implements the steps of a hard sample instance segmentation method.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the present invention and the accompanying drawings, or direct or indirect application in other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method for partitioning a difficult sample instance, comprising:
preprocessing an image, used to make the foreground and the background of the image easy to distinguish;
difficult sample segmentation of the image, used to distinguish positive samples from negative samples in the image;
convolution training of the image, used for instance segmentation of difficult samples in the image;
performing convolution training on the image through a convolutional neural network, wherein the convolutional neural network comprises conv1, conv2, conv3, conv4, an RPN, Deformable ROI Align, conv5, conv6, conv7, a mask head and a mask iou head; conv1, conv2, conv3 and conv4 each comprise a convolutional layer, a pooling layer and an activation layer, conv5 comprises a fully connected layer and an activation layer, and conv6 and conv7 are fully connected layers;
completing the step of difficult sample segmentation through conv1, conv2 and conv3 in a convolutional neural network, obtaining segmented image output, inputting the segmented image into conv4 for feature extraction again, generating a first feature image, and inputting the first feature image into an RPN layer and a Deformable ROI Align layer; the first characteristic image is input to an RPN layer, the RPN layer outputs a region proposal, the first characteristic image and the region proposal are input to a Deformable ROI Align layer, the Deformable ROI Align layer outputs a second characteristic image, the second characteristic image is input to conv5, the conv5 outputs a convolution kernel, the convolution kernel outputs a classification result through conv6, the convolution kernel outputs a segmentation box through conv7, the convolution kernel is input to a mask head and then input to a mask iou head, the mask iou head outputs a mask image, and the mask image is combined with the classification result and the segmentation box to form a segmentation image with a segmentation result.
2. The difficult sample instance segmentation method according to claim 1, wherein: the image data is preprocessed by sharpening and clustering methods.
3. The difficult sample instance segmentation method according to claim 2, wherein: the sharpening is Laplacian sharpening, and the clustering is K-means clustering.
4. The difficult sample instance segmentation method according to claim 1, wherein: the difficult sample segmentation of the image comprises:
classifying the images, namely classifying the preprocessed images by using a classifier;
calculating the IOU value of the image, calculating the IOU value of the classified image, and setting an IOU threshold;
comparing the IOU value with a set IOU threshold value, and outputting an image with the IOU value larger than the IOU threshold value.
5. The difficult sample instance segmentation method according to claim 4, wherein: the IOU threshold is 0.5.
6. The difficult sample instance segmentation method according to claim 4, wherein: the IOU value is compared with the set IOU threshold; an image whose IOU value is smaller than the IOU threshold is input into the classifier again for classification, its IOU value is recalculated and compared with the IOU threshold, images whose IOU value is larger than the IOU threshold are output, and images whose IOU value is smaller than the IOU threshold are repeatedly classified, their IOU values calculated and compared with the IOU threshold.
7. A difficult sample instance segmentation apparatus, comprising:
the image preprocessing module is used for enabling the foreground and the background of the image to be easily distinguished;
the difficult sample segmentation module of the image is used for distinguishing positive samples from negative samples in the image;
the convolution training module of the image is used for dividing the instance of the difficult sample in the image;
performing convolution training on the image through a convolutional neural network, wherein the convolutional neural network comprises conv1, conv2, conv3, conv4, an RPN, Deformable ROI Align, conv5, conv6, conv7, a mask head and a mask iou head; conv1, conv2, conv3 and conv4 each comprise a convolutional layer, a pooling layer and an activation layer, conv5 comprises a fully connected layer and an activation layer, and conv6 and conv7 are fully connected layers;
completing the step of difficult sample segmentation through conv1, conv2 and conv3 in a convolutional neural network, obtaining segmented image output, inputting the segmented image into conv4 for feature extraction again, generating a first feature image, and inputting the first feature image into an RPN layer and a Deformable ROI Align layer; the first characteristic image is input to an RPN layer, the RPN layer outputs a region proposal, the first characteristic image and the region proposal are input to a Deformable ROI Align layer, the Deformable ROI Align layer outputs a second characteristic image, the second characteristic image is input to conv5, the conv5 outputs a convolution kernel, the convolution kernel outputs a classification result through conv6, the convolution kernel outputs a segmentation box through conv7, the convolution kernel is input to a mask head and then input to a mask iou head, the mask iou head outputs a mask image, and the mask image is combined with the classification result and the segmentation box to form a segmentation image with a segmentation result.
8. A computer-readable storage medium for hard sample instance segmentation, characterized in that: the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the difficult sample instance segmentation method according to any one of claims 1 to 6.
CN202010480111.2A 2020-05-30 2020-05-30 Method, equipment and computer readable storage medium for dividing difficult sample instance Active CN111627033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010480111.2A CN111627033B (en) 2020-05-30 2020-05-30 Method, equipment and computer readable storage medium for dividing difficult sample instance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010480111.2A CN111627033B (en) 2020-05-30 2020-05-30 Method, equipment and computer readable storage medium for dividing difficult sample instance

Publications (2)

Publication Number Publication Date
CN111627033A CN111627033A (en) 2020-09-04
CN111627033B true CN111627033B (en) 2023-10-20

Family

ID=72271378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010480111.2A Active CN111627033B (en) 2020-05-30 2020-05-30 Method, equipment and computer readable storage medium for dividing difficult sample instance

Country Status (1)

Country Link
CN (1) CN111627033B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784835B (en) * 2021-01-21 2024-04-12 恒安嘉新(北京)科技股份公司 Method and device for identifying authenticity of circular seal, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516302A (en) * 2017-08-31 2017-12-26 北京无线电计量测试研究所 A kind of method of the mixed image enhancing based on OpenCV
WO2019200753A1 (en) * 2018-04-17 2019-10-24 平安科技(深圳)有限公司 Lesion detection method, device, computer apparatus and storage medium
CN110288082A (en) * 2019-06-05 2019-09-27 北京字节跳动网络技术有限公司 Convolutional neural networks model training method, device and computer readable storage medium
CN111046880A (en) * 2019-11-28 2020-04-21 中国船舶重工集团公司第七一七研究所 Infrared target image segmentation method and system, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Instance segmentation method for group-housed pig images based on deep learning; 高云; 郭继亮; 黎煊; 雷明刚; 卢军; 童宇; Transactions of the Chinese Society for Agricultural Machinery (04); full text *

Also Published As

Publication number Publication date
CN111627033A (en) 2020-09-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant