CN110287777B - Golden monkey body segmentation algorithm in natural scene - Google Patents

Publication number
CN110287777B
CN110287777B (application number CN201910405596.6A)
Authority
CN
China
Prior art keywords
layer
convolution
output
confidence map
feature fusion
Prior art date
Legal status
Expired - Fee Related
Application number
CN201910405596.6A
Other languages
Chinese (zh)
Other versions
CN110287777A (en)
Inventor
许鹏飞
王妍
郭松涛
李朋喜
常晓军
郭凌
何刚
陈�峰
郭军
Current Assignee
Northwestern University
Original Assignee
Northwestern University
Priority date
Filing date
Publication date
Application filed by Northwestern University
Priority to CN201910405596.6A
Publication of CN110287777A
Application granted
Publication of CN110287777B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/24 Classification techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/20 Image preprocessing
              • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
                • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
          • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The invention discloses a golden monkey body segmentation algorithm for natural scenes. The algorithm constructs a semantic segmentation network to realize end-to-end image segmentation, trains the network, and stores the trained model for the segmentation detection of images to be segmented. The semantic segmentation network comprises a classification network, a fusion part and an output part: the classification network performs feature extraction on the input image and pixel-level classification to obtain a confidence map; the two feature fusion layers fuse the segmentation results of the bottom two and bottom three layers respectively, improving recognition accuracy and realizing cross-layer connection; and the output part finally outputs a high-resolution class heat map consistent with the original image size. The invention effectively improves detection accuracy and better solves the problem of a golden monkey body being segmented into multiple parts.

Description

Golden monkey body segmentation algorithm in natural scene
Technical Field
The invention relates to the technical field of image segmentation and image object identification and positioning, in particular to a golden monkey body segmentation algorithm in a natural scene.
Background
In the research and protection of wild golden monkeys, accurately distinguishing golden monkeys from the environmental background in natural images and videos is the basis of the subsequent individual re-identification task, providing data support for studies of golden monkey population size, behavior and so on. At present, many researchers at home and abroad have proposed animal detection and recognition algorithms based on facial features.
Du et al. proposed an automatic localization method based on rhesus monkey facial features, using image segmentation and mathematical morphology. Traditional image processing techniques realize automatic localization of organs such as the eyes and mouth of the macaque, and edge detection operators extract the detailed contours of some organs.
However, these methods do not yield very good results under image variations. Pengfei Xu proposed a golden monkey face detection algorithm based on regional color quantization and incremental adaptive curriculum learning, which can effectively detect monkey faces, but its segmentation threshold is limited by the species or habitat of the golden monkeys.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a golden monkey body segmentation algorithm that meets the natural-scene data processing requirements of the golden monkey individual re-identification task.
In order to realize the task, the invention adopts the following technical scheme:
a golden monkey body segmentation algorithm under a natural scene comprises the following steps:
constructing a semantic segmentation network to realize end-to-end image segmentation; training the semantic segmentation network, and storing the trained network model for segmentation detection of the image to be segmented;
the semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
the classification network sequentially comprises, from front to back, a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a sixth convolution layer and a seventh convolution layer; each of the first through fifth convolution layers comprises two consecutive convolution calculations, and the sixth convolution layer comprises one convolution calculation, their function being to perform feature extraction on the input image to obtain a feature map; the seventh convolution layer comprises one convolution calculation and a classification activation function, performing feature extraction and pixel-level classification to obtain a confidence map; the role of the first through fifth maximum pooling layers is to reduce the data dimension without losing features;
the fusion part comprises a first characteristic fusion layer and a second characteristic fusion layer, and the first characteristic fusion layer performs characteristic fusion with the output of the fifth maximum pooling layer after up-sampling the output of the seventh convolution layer; the second feature fusion layer performs feature fusion on the output result of the first feature fusion layer and the features extracted by the fourth maximum pooling layer to obtain a fused confidence map;
the output part comprises an output layer consisting of an up-sampling layer and a classification layer; the input of the up-sampling layer is the output of the second feature fusion layer, and its function is to enlarge the fused confidence map to the size of the original input image; the classification layer performs classification prediction for each pixel, finally yielding a high-resolution class heat map consistent with the size of the original input image.
Further, the convolution kernel size of each of the first through fifth convolution layers is 2 × 2 with a stride of 2; the convolution kernel size of the sixth convolution layer is 7 × 7 with a stride of 1; and the convolution kernel size of the seventh convolution layer is 1 × 1 with a stride of 1.
Further, the pooling kernel size of each of the first through fifth maximum pooling layers is 2 × 2 with a stride of 2.
Further, the first feature fusion layer upsamples the output of the seventh convolution layer and then performs feature fusion with the output of the fifth maximum pooling layer, as follows:
the first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of the seventh convolution layer, and the up-sampling factor is 2, enlarging the pixels of the confidence map to facilitate feature fusion and yielding a dimension-expanded confidence map A2; the input of the convolution layer is the output of the fifth maximum pooling layer, with a 1 × 1 convolution kernel, a stride of 1 and a ReLU activation function, yielding a confidence map B from the output of the fifth maximum pooling layer; the final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C.
Further, the second feature fusion layer performs feature fusion on the output of the first feature fusion layer and the features extracted by the fourth maximum pooling layer to obtain the fused confidence map, as follows:
the second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the segmentation result map C, and the up-sampling factor is 2, enlarging the pixels of the confidence map to facilitate feature fusion and yielding a dimension-expanded confidence map C2; the input of the convolution layer is the output of the fourth maximum pooling layer, with a 1 × 1 convolution kernel, a stride of 1 and a ReLU activation function, yielding a confidence map D from the output of the fourth maximum pooling layer; the final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. the fused confidence map.
Further, when the semantic segmentation network is trained, the loss function adopted is:

DWL = -\frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} dw_{(i,j)} \left[ y_{(i,j)} \log \hat{y}_{(i,j)} + \left(1 - y_{(i,j)}\right) \log\left(1 - \hat{y}_{(i,j)}\right) \right]

wherein y(i,j) represents the label value at pixel (i, j) of the ground-truth classified image corresponding to the input image, ŷ(i,j) represents the predicted value at pixel (i, j) of the output image after the input image is processed by the semantic segmentation network, height and width represent the height and width of the image, and dw(i,j) is the distance-constraint weight function:

dw_{(i,j)} = \begin{cases} \alpha - \dfrac{distance\left(I_{(i,j)}, center_{(i,j)}\right)}{\beta}, & y_{(i,j)} = 1 \\ 1, & y_{(i,j)} = 0 \end{cases}

wherein distance(I(i,j), center(i,j)) represents the distance from pixel I(i,j) to the center center(i,j) of the connected domain containing it, and α and β are two constants.
The invention has the following technical characteristics:
1. The algorithm achieves golden monkey target detection by semantically segmenting the original image, mainly separating the golden monkey from the natural environment with a fully convolutional network (FCN) in deep learning; a distance-weight-based loss function, i.e. an improved cross entropy loss function, makes the FCN model focus on the integrity of the golden monkey individual, improving the final detection accuracy.
2. Experimental comparison shows that the improved loss function better solves the problem of a golden monkey body being segmented into multiple parts, and the improved natural-scene golden monkey detection network clearly improves the image segmentation results for the problems above.
Drawings
FIG. 1 is a diagram of a semantic segmentation network architecture;
FIG. 2 is a feature fusion graph;
FIG. 3 illustrates the IoU calculation method;
FIG. 4 is an example of correct detection results generated by the semantic segmentation network, wherein (a) shows the segmentation result of the original image, and (b) shows the rectangular golden monkey individual detection result generated from the result image;
FIG. 5 is an example of erroneous detection results generated by the semantic segmentation network, wherein (a) and (c) show cases where the lower edge of the golden monkey is not completely covered by the rectangular detection result, and (b) shows a single golden monkey divided into two detection boxes due to a segmentation error;
FIG. 6 is a comparison of the detection results of the semantic segmentation network after training with the improved DWL function, wherein (a), (b) and (c) are the detection results of three golden monkey individuals.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The golden monkey body segmentation in the scheme refers to detecting and segmenting an image part where a golden monkey body is located from an image containing the golden monkeys.
Step 1, constructing a semantic segmentation network to realize end-to-end image segmentation
The semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
first part, the classification network
The classification network is composed of 7 convolution layers and 5 pooling layers, arranged from front to back as follows: the first convolution layer conv1, first maximum pooling layer pool1, second convolution layer conv2, second maximum pooling layer pool2, third convolution layer conv3, third maximum pooling layer pool3, fourth convolution layer conv4, fourth maximum pooling layer pool4, fifth convolution layer conv5, fifth maximum pooling layer pool5, sixth convolution layer conv6 and seventh convolution layer conv7. Wherein:
The first through fifth convolution layers conv1 to conv5 each comprise two consecutive convolution calculations, performing feature extraction on the input image to obtain a feature map. The sixth convolution layer conv6 comprises one convolution calculation, performing feature extraction on the input image to obtain a feature map that is passed to the next layer. The seventh convolution layer conv7 comprises one convolution calculation and a classification activation function, performing feature extraction on the input image and pixel-level classification to obtain a low-resolution class heat map, i.e. a confidence map. The convolution kernel sizes of conv1 through conv5 are all 2 × 2 with a stride of 2; the convolution kernel size of conv6 is 7 × 7 with a stride of 1; the convolution kernel size of conv7 is 1 × 1 with a stride of 1.
The first through fifth maximum pooling layers pool1 to pool5 reduce the data dimension without losing features, yielding feature maps of reduced dimension; the pooling kernel sizes of pool1 to pool5 are all 2 × 2 with a stride of 2.
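As an illustrative sketch (not part of the patent text), the following traces how the spatial size of a feature map shrinks through the five 2 × 2, stride-2 pooling stages described above, so that pool4 outputs 1/16 and pool5 outputs 1/32 of the input resolution, the two scales the fusion part later taps. The input size 320 × 480 is an assumed example.

```python
def pooled_sizes(height, width, num_pools=5):
    """Return the (h, w) feature-map size after each 2x2, stride-2 max-pooling layer."""
    sizes = []
    h, w = height, width
    for _ in range(num_pools):
        h, w = h // 2, w // 2   # each pooling halves both dimensions
        sizes.append((h, w))
    return sizes

sizes = pooled_sizes(320, 480)
# pool4 output is 1/16 of the input, pool5 output is 1/32
assert sizes[3] == (20, 30)   # 320/16, 480/16
assert sizes[4] == (10, 15)   # 320/32, 480/32
```

This halving is why the fusion part must upsample by 2 at each step to bring deeper maps back into register with shallower ones.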
Second part, the fusion part
The fusion part comprises a first characteristic fusion layer and a second characteristic fusion layer, and the first characteristic fusion layer performs characteristic fusion with the output of the fifth maximum pooling layer after up-sampling the output of the seventh convolution layer; and the second feature fusion layer performs feature fusion on the output result of the first feature fusion layer and the features extracted by the fourth maximum pooling layer.
The first fusion layer fuses the segmentation results of the characteristics of the bottom two layers and is used for fusing the characteristics of multi-layer output, so that the identification accuracy is improved; the second feature fusion layer fuses the segmentation results of the bottom three layers, and cross-layer connection is achieved.
The feature fusion part is shown in FIG. 2:
The first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of conv7, and the up-sampling factor is 2, enlarging the pixels of the confidence map to facilitate feature fusion and yielding a dimension-expanded confidence map A2; the input of the convolution layer is the output of pool5, with a 1 × 1 convolution kernel, a stride of 1 and a ReLU activation function, yielding a confidence map B from the output of the pool5 layer. The final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C, i.e. a segmentation result map fusing the features of the bottom two layers; fusing multi-layer outputs improves recognition accuracy.
The second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the segmentation result map C, and the up-sampling factor is 2, enlarging the pixels of the confidence map to facilitate feature fusion and yielding a dimension-expanded confidence map C2; the input of the convolution layer is the output of pool4, with a 1 × 1 convolution kernel, a stride of 1 and a ReLU activation function, yielding a confidence map D from the output of the pool4 layer. The final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. a segmentation result map fusing the bottom three layers; this realizes cross-layer connection, fuses multi-layer outputs, and improves recognition accuracy.
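The fusion rule above can be sketched in a few lines of pure Python. This is an illustration of the described operation only: nearest-neighbour 2× upsampling stands in for the network's up-sampling layer, and the 1 × 1-convolved pooling output is represented by a ready-made grid; the names `upsample2` and `fuse` are ours, not the patent's.

```python
def upsample2(grid):
    """Nearest-neighbour 2x upsampling of a 2-D list of floats."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in (0, 1)]   # duplicate each column
        out.append(wide)
        out.append(list(wide))                    # duplicate each row
    return out

def fuse(small, large):
    """Confidence-map fusion: upsample `small` by 2, then add element-wise to `large`."""
    up = upsample2(small)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, large)]

a  = [[1.0, 2.0], [3.0, 4.0]]        # stand-in for the conv7 confidence map
b  = [[0.5] * 4 for _ in range(4)]   # stand-in for the 1x1-convolved pool5 output
c  = fuse(a, b)                      # "confidence map C" analogue
```

The same `fuse` step applied again to C and the pool4 branch would give the confidence map E of the second fusion layer.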
Third, output section
The output part comprises an output layer consisting of an up-sampling layer and a classification layer. The input of the up-sampling layer is the output of the fusion part, i.e. the second feature fusion layer; the up-sampling factor is 8, enlarging the fused confidence map E to the size of the original input image (the image input to the classification network). The activation function of the classification layer is the SoftMax function, which performs classification prediction for each pixel, finally yielding a high-resolution class heat map consistent with the size of the original input image.
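The per-pixel SoftMax classification in the output layer can be illustrated as follows. This is a minimal sketch under the assumption of two classes (golden monkey vs. background); the function name and the example scores are ours.

```python
import math

def pixel_softmax(scores):
    """SoftMax over the per-class scores of a single pixel."""
    m = max(scores)                              # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = pixel_softmax([2.0, 0.0])   # [monkey score, background score]
label = probs.index(max(probs))     # index 0: pixel classed as golden monkey
```

Applying this to every pixel of the upsampled confidence map E produces the high-resolution class heat map described above.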
Step 2, training the semantic segmentation network constructed in the step 1
Creating the data set: the golden monkey pixels are uniformly labeled as target regions. A golden monkey target detection set under natural scenes, Natural Golden monkey Segmentation (NGS), is then created following the standard of the PASCAL VOC data set. The NGS data set contains 600 golden monkey images in natural scenes, covering 31 golden monkey individuals in total, with a balanced gender and age distribution and rich action types.
Training is carried out on the NGS data set; training data and test data are randomly split in a 4:1 ratio, the maximum number of iterations is set to 5000, and the network model is saved after training. The specific training process is as follows:
The final output of the semantic segmentation network is taken as ŷ, and the ground-truth classified image (i.e. the actual classification result of the original image) is taken as y; both are fed into the loss function, whose value reflects the gap between the network's prediction and the actual result. The loss function is differentiated with respect to the network parameters, and the parameters are updated from the derivatives; the learning rate is set to 0.00005 and kept constant throughout training.
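The 4:1 random train/test split described above can be sketched as follows. This is illustrative only; the file-name pattern is hypothetical, not from the patent.

```python
import random

def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle and split a list into (train, test) at the given ratio."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

images = [f"ngs_{i:03d}.jpg" for i in range(600)]   # 600 NGS images (names assumed)
train, test = split_dataset(images)
assert len(train) == 480 and len(test) == 120       # 4:1 ratio
```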
In this scheme, a distance-weight-based loss function (distance-weight loss, DWL) is adopted. It is obtained by introducing a weight coefficient for each pixel into the basic cross entropy function and using it as a distance-constraint weight function. The DWL loss function better measures the gap between the predicted value and the true value of a sample, so that the model pays more attention to the central region of the golden monkey body and learns its structural information. The DWL loss function is:

DWL = -\frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} dw_{(i,j)} \left[ y_{(i,j)} \log \hat{y}_{(i,j)} + \left(1 - y_{(i,j)}\right) \log\left(1 - \hat{y}_{(i,j)}\right) \right]
a specific derivation of the DWL loss function is given below.
For the segmentation problem of the invention, the network model outputs, for each pixel, the predicted probability that it belongs to the golden monkey region. From the basic classification cross entropy, the cross entropy function L of the network model is:

L = -\frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} \left[ y_{(i,j)} \log \hat{y}_{(i,j)} + \left(1 - y_{(i,j)}\right) \log\left(1 - \hat{y}_{(i,j)}\right) \right]
wherein y(i,j) represents the label value at pixel (i, j) of the ground-truth classified image corresponding to the input image, ŷ(i,j) represents the predicted value at pixel (i, j) of the output image after the input image is processed by the semantic segmentation network, and height and width represent the height and width of the image.
The invention mainly makes two improvements to the cross entropy loss function: assigning different weights to the ROI region and the environmental background region, and introducing distance information from the center of the golden monkey to the edge within the ROI region.
First, a weight coefficient W(i,j) is introduced for each pixel in the basic cross entropy function; the new cross entropy function WL is:

WL = -\frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} W_{(i,j)} \left[ y_{(i,j)} \log \hat{y}_{(i,j)} + \left(1 - y_{(i,j)}\right) \log\left(1 - \hat{y}_{(i,j)}\right) \right]

wherein W(i,j) represents the loss weight at pixel (i, j); the original cross entropy loss function L can be understood as the case where W(i,j) is identically 1, so the coefficient can there be omitted.
In calculating W(i,j), values are assigned according to the label value y(i,j) and the position information of pixel (i, j). First, the weight of the environmental background region is kept unchanged while the weight coefficient of the golden monkey ROI region is increased, so that the model pays more attention to the golden monkey region. Second, within the golden monkey ROI target region, linearly decreasing weights are set from the central part of the golden monkey body, i.e. the center of the ROI region, to the edge, in order to exploit the golden monkey's body structure information; this strengthens the model's learning of the central body region while preserving, as far as possible, the important distance-constraint information between the body center and the hair edge. The distance-constraint weight function dw(i,j) is as follows:

dw_{(i,j)} = \begin{cases} \alpha - \dfrac{distance\left(I_{(i,j)}, center_{(i,j)}\right)}{\beta}, & y_{(i,j)} = 1 \\ 1, & y_{(i,j)} = 0 \end{cases}

wherein I(i,j) denotes the pixel at (i, j), center(i,j) denotes the center of the connected domain containing pixel (i, j), distance(I(i,j), center(i,j)) denotes the distance from pixel I(i,j) to that center, and α and β are two constants controlling the weight value of pixel I(i,j), so that the weights of different pixels in the golden monkey ROI region fall in the range [α − 1, α]. Let α = 2 and β = max(distance(I(i,j), center(i,j))); then when I(i,j) is the center of the connected domain, dw(i,j) = 2, and when I(i,j) is farthest from the center, dw(i,j) = 1, achieving weights that decrease from the center to the edge of the connected domain within the golden monkey ROI region.
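The weighting rule just described can be sketched as a small function. This is an illustration under the example's assumption that β equals the maximum center distance of the connected domain, so ROI weights fall linearly from α at the center to α − 1 at the farthest pixel; names are ours.

```python
def distance_weight(dist, max_dist, alpha=2.0):
    """dw = alpha - dist/beta with beta = max_dist, for a pixel inside the ROI."""
    beta = max_dist
    return alpha - dist / beta

assert distance_weight(0.0, 10.0) == 2.0    # center of the connected domain
assert distance_weight(10.0, 10.0) == 1.0   # farthest ROI pixel from the center
assert distance_weight(5.0, 10.0) == 1.5    # halfway: weight decreases linearly
```

Background pixels simply keep weight 1, matching the second case of the dw(i,j) definition.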
Setting the weight coefficient W(i,j) of the new cross entropy function WL to dw(i,j) yields the improved distance-weight-based loss function DWL:

DWL = -\frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} dw_{(i,j)} \left[ y_{(i,j)} \log \hat{y}_{(i,j)} + \left(1 - y_{(i,j)}\right) \log\left(1 - \hat{y}_{(i,j)}\right) \right]
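A minimal pure-Python sketch of the DWL formula above: a pixel-wise binary cross entropy in which each term is scaled by the distance weight dw(i,j). The tiny 2 × 2 grids are made-up example data, not from the patent.

```python
import math

def dwl_loss(y, y_hat, dw):
    """Distance-weighted cross entropy over an H x W label/prediction grid."""
    height, width = len(y), len(y[0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            p = min(max(y_hat[i][j], 1e-12), 1.0 - 1e-12)   # clamp for log safety
            ce = y[i][j] * math.log(p) + (1 - y[i][j]) * math.log(1 - p)
            total += dw[i][j] * ce
    return -total / (height * width)

y     = [[1, 0], [1, 0]]                 # ground-truth labels
y_hat = [[0.9, 0.1], [0.8, 0.2]]         # predicted monkey probabilities
dw    = [[2.0, 1.0], [1.5, 1.0]]         # heavier weights near the body center
loss = dwl_loss(y, y_hat, dw)
```

With dw identically 1, the function reduces to the plain cross entropy L, matching the derivation above.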
and 3, storing the trained network model for the segmentation detection of the image to be segmented.
After the semantic segmentation network is trained in step 2, the trained network model is saved. In practical application, an image to be segmented is input into the network model, and the output of the model is the segmented high-resolution class heat map.
Experimental comparison shows that the improved loss function better solves the problem of a golden monkey body being segmented into multiple parts, and the improved natural-scene golden monkey detection network clearly improves the image segmentation results for the problems above.
The experimental comparative analysis procedure is as follows:
the invention performs experiments on the NGS dataset and randomly distributes training data and test data in a 4:1 ratio. In the natural scene golden monkey detection algorithm, the learning rate is set to be 0.00005, the learning rate is stable and unchanged in the learning process, and the maximum iteration number of the experiment is set to be 5000 times. After the segmentation result of the original image in the natural scene is obtained, generating a rectangular golden monkey individual detection result according to the edge of the segmentation image, in the generation process, eliminating a target pixel region which is too small to be normally utilized, and finally obtaining golden monkey individual data through the image region in the rectangular frame.
The IoU criterion quantitatively measures the agreement between the true value and the predicted value, as shown in FIG. 3. The invention compares the segmentation results of the FCN network before and after applying the DWL function: IoU is computed between each segmentation result and the ground truth, and the mean IoU over all 100 images of the test set is used as the evaluation index of network performance. The results are shown in the following table:

IoU comparison before and after the DWL improvement

Loss function            IoU
Cross entropy function   85.28%
DWL function             86.14%
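The IoU measure used in the table can be illustrated on flat binary masks (1 = golden monkey pixel). A sketch with made-up masks; the real evaluation operates on full segmentation images.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two equal-length binary masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 or b == 1)
    return inter / union if union else 1.0   # empty masks agree perfectly

pred  = [1, 1, 1, 0, 0, 0]   # predicted monkey pixels
truth = [0, 1, 1, 1, 0, 0]   # ground-truth monkey pixels
assert abs(iou(pred, truth) - 0.5) < 1e-9   # 2 shared pixels / 4 in the union
```

Averaging this value over the 100 test images gives the mean IoU reported above.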
As the table shows, the segmentation effect of the original semantic segmentation network on the basic data is already good, proving the method suitable for the segmentation task of the invention. After the loss function improvement, the result with the DWL function is 0.86 percentage points higher than the original network, proving that the improved loss function raises the overall segmentation performance of the network model to a certain extent.
As shown in FIG. 4, (a) shows the segmentation result of the original image, and the rectangle in (b) is the rectangular golden monkey individual detection result generated from the result image.
FIG. 5 details the influence of segmentation errors on the generated rectangular detection results: (a) and (c) show two cases where the lower edge of the golden monkey is not completely covered by the rectangular detection result, the segmentation error causing golden monkey edge information to be missed; (b) shows a single golden monkey divided into two detection boxes, the segmentation error splitting one golden monkey across multiple rectangles. Clearly, the detection data shown in FIG. 5 cannot be used for the golden monkey individual re-identification experiment.
By the above method, the loss function of the original FCN network is improved: the distance-based per-pixel loss weighting introduces the constraint information of the golden monkey body structure. The segmentation results are shown in FIG. 6 and exhibit a clear improvement. Compared with the two typical segmentation errors that previously made the rectangular detection results unusable, the segmentation results in the two images are obviously better and the area of erroneous results is markedly reduced. For the generated rectangular detection results, in (a) the lower edge of the golden monkey is completely included in the detection result, and in (b) the golden monkey originally divided into two rectangular detection boxes is correctly detected as one complete golden monkey individual.
The distance-weight-based loss function DWL designed by the invention greatly improves golden monkey segmentation results in natural scenes, thereby effectively raising the accuracy of the final rectangular detection results; the rectangular region results yield golden monkey individual image data meeting the requirements of the golden monkey individual re-identification experiment.

Claims (6)

1. A golden monkey body segmentation algorithm in a natural scene, characterized by comprising the following steps:
constructing a semantic segmentation network to realize end-to-end image segmentation; training the semantic segmentation network, and storing the trained network model for segmentation detection of the image to be segmented;
the semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
the classification network comprises, from front to back, a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a sixth convolution layer and a seventh convolution layer; each of the first through fifth convolution layers comprises two consecutive convolution calculations and the sixth convolution layer comprises one convolution calculation, their function being to perform feature extraction on the input image to obtain a feature map; the seventh convolution layer comprises one convolution calculation and a classification activation function, and performs feature extraction and pixel-level classification to obtain a confidence map; the function of the first through fifth maximum pooling layers is to reduce the data dimension without losing features;
the fusion part comprises a first feature fusion layer and a second feature fusion layer; the first feature fusion layer up-samples the output of the seventh convolution layer and then fuses it with the output of the fifth maximum pooling layer; the second feature fusion layer fuses the output of the first feature fusion layer with the features extracted by the fourth maximum pooling layer to obtain a fused confidence map;
the output part comprises an output layer consisting of an up-sampling layer and a classification layer; the input of the up-sampling layer is the output of the second feature fusion layer, and its function is to expand the fused confidence map to the size of the original input image; the classification layer performs classification prediction on each pixel, finally obtaining a high-resolution class heat map of the same size as the original input image.
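As a small bookkeeping sketch (the helper `fcn_scales` and the assumption that the convolution layers preserve spatial size are illustrative, not part of the claims), the feature-map resolution at each pooling stage can be computed as:

```python
def fcn_scales(h, w):
    """Spatial size of the feature map after each of the five 2x2,
    stride-2 maximum pooling layers.  The convolution layers are assumed
    to preserve spatial size (a common FCN convention; the claims do not
    state the padding explicitly)."""
    sizes = {}
    for k in range(1, 6):
        h, w = h // 2, w // 2  # each pooling layer halves both dimensions
        sizes["pool%d" % k] = (h, w)
    return sizes

# Example: for a 224 x 224 input, pool4 yields a 14 x 14 feature map and
# pool5 a 7 x 7 one; the 2x up-sampling steps in the fusion layers and the
# final up-sampling in the output layer restore the confidence map to the
# original 224 x 224 input size.
```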
2. The golden monkey body segmentation algorithm in a natural scene as claimed in claim 1, wherein the convolution kernel size of each of the first through fifth convolution layers is 2 × 2 with a step size of 2; the convolution kernel size of the sixth convolution layer is 7 × 7 with a step size of 1; and the convolution kernel size of the seventh convolution layer is 1 × 1 with a step size of 1.
3. The golden monkey body segmentation algorithm in a natural scene as claimed in claim 1, wherein the pooling kernel size of each of the first through fifth maximum pooling layers is 2 × 2 with a step size of 2.
4. The golden monkey body segmentation algorithm in a natural scene as claimed in claim 1, wherein the first feature fusion layer up-sampling the output of the seventh convolution layer and then performing feature fusion with the output of the fifth maximum pooling layer comprises:
the first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of the seventh convolution layer, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, yielding a dimension-expanded confidence map A2; the input of the convolution layer is the output of the fifth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map B from the output of the fifth maximum pooling layer; the final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted as confidence map C.
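The up-sample-and-add operation of this claim can be illustrated with a minimal sketch (nearest-neighbour interpolation and the helper names `upsample2x` and `fuse` are assumptions; the 1 × 1 convolution with ReLU that produces confidence map B is omitted here):

```python
def upsample2x(fmap):
    """2x nearest-neighbour up-sampling of a 2-D confidence map:
    each value is duplicated along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate columns
        out.extend([list(wide), list(wide)])     # duplicate rows
    return out

def fuse(a2, b):
    """Element-wise sum of two same-sized confidence maps
    (e.g. confidence map A2 + confidence map B)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a2, b)]
```

The second feature fusion layer of claim 5 repeats the same up-sample-and-add pattern with confidence maps C2 and D.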
5. The golden monkey body segmentation algorithm in a natural scene as claimed in claim 1, wherein the second feature fusion layer fusing the output of the first feature fusion layer with the features extracted by the fourth maximum pooling layer to obtain a fused confidence map comprises:
the second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is confidence map C, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, yielding a dimension-expanded confidence map C2; the input of the convolution layer is the output of the fourth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map D from the output of the fourth maximum pooling layer; the final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted as confidence map E, i.e. the fused confidence map.
6. The golden monkey body segmentation algorithm in a natural scene as claimed in claim 1, wherein the loss function adopted when training the semantic segmentation network is:
[Formula image FDA0003024407380000021: the distance-weighted loss, summed over all image pixels (i, j)]
where y(i,j) denotes the label value at pixel (i, j) of the ground-truth classified image corresponding to the input image, ŷ(i,j) denotes the predicted value at pixel (i, j) of the output image obtained by passing the input image through the semantic segmentation network, height and width denote the height and width of the image respectively, and dw(i,j) is the distance-constraint weight function, expressed as:
[Formula image FDA0003024407380000023: dw(i,j) expressed in terms of distance(I(i,j), center(i,j)) and the constants α and β]
where distance(I(i,j), center(i,j)) denotes the distance from pixel I(i,j) to the center center(i,j) of the connected domain in which it lies, and α and β are two constants.
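The distance term inside dw(i,j) can be sketched as follows (an illustration under assumptions: 4-connectivity, Euclidean distance, and centroid-as-center are illustrative choices, and the exact combination with the constants α and β is the patented formula, not reproduced here):

```python
import math

def connected_centers(mask):
    """Map each foreground pixel of a binary mask to the centroid of its
    4-connected component (used here as the 'center of the connected
    domain' that the pixel lies in)."""
    h, w = len(mask), len(mask[0])
    seen, centers = set(), {}
    for si in range(h):
        for sj in range(w):
            if mask[si][sj] and (si, sj) not in seen:
                # flood-fill one connected component
                stack, comp = [(si, sj)], []
                seen.add((si, sj))
                while stack:
                    i, j = stack.pop()
                    comp.append((i, j))
                    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                        if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] \
                                and (ni, nj) not in seen:
                            seen.add((ni, nj))
                            stack.append((ni, nj))
                ci = sum(p[0] for p in comp) / len(comp)
                cj = sum(p[1] for p in comp) / len(comp)
                for p in comp:
                    centers[p] = (ci, cj)
    return centers

def distance_to_center(pixel, centers):
    """Euclidean distance(I(i,j), center(i,j)) for one pixel."""
    (i, j), (ci, cj) = pixel, centers[pixel]
    return math.hypot(i - ci, j - cj)
```

Pixels far from their component's center (i.e. near the monkey's edges) get a distinct distance value, which the weight function can then use to emphasize edge pixels during training.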
CN201910405596.6A 2019-05-16 2019-05-16 Golden monkey body segmentation algorithm in natural scene Expired - Fee Related CN110287777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910405596.6A CN110287777B (en) 2019-05-16 2019-05-16 Golden monkey body segmentation algorithm in natural scene


Publications (2)

Publication Number Publication Date
CN110287777A CN110287777A (en) 2019-09-27
CN110287777B true CN110287777B (en) 2021-06-08

Family ID: 68002084

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930383A (en) * 2019-11-20 2020-03-27 佛山市南海区广工大数控装备协同创新研究院 Injector defect detection method based on deep learning semantic segmentation and image classification
CN111179262A (en) * 2020-01-02 2020-05-19 国家电网有限公司 Electric power inspection image hardware fitting detection method combined with shape attribute
CN111242929A (en) * 2020-01-13 2020-06-05 中国科学技术大学 Fetal skull shape parameter measuring method, system, equipment and medium
CN113744276A (en) * 2020-05-13 2021-12-03 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and readable storage medium
CN111626196B (en) * 2020-05-27 2023-05-16 西南石油大学 Knowledge-graph-based intelligent analysis method for body structure of typical bovine animal
CN112163449B (en) * 2020-08-21 2022-12-16 同济大学 Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN115984309B (en) * 2021-12-10 2024-03-15 北京百度网讯科技有限公司 Method and apparatus for training image segmentation model and image segmentation
CN117351537B (en) * 2023-09-11 2024-05-17 中国科学院昆明动物研究所 Kiwi face intelligent recognition method and system based on deep learning

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109447990A (en) * 2018-10-22 2019-03-08 北京旷视科技有限公司 Image, semantic dividing method, device, electronic equipment and computer-readable medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8494285B2 (en) * 2010-12-09 2013-07-23 The Hong Kong University Of Science And Technology Joint semantic segmentation of images and scan data
CN105825168B (en) * 2016-02-02 2019-07-02 西北大学 A kind of Rhinopithecus roxellana face detection and method for tracing based on S-TLD
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN106709568B (en) * 2016-12-16 2019-03-22 北京工业大学 The object detection and semantic segmentation method of RGB-D image based on deep layer convolutional network
CN109145939B (en) * 2018-07-02 2021-11-02 南京师范大学 Semantic segmentation method for small-target sensitive dual-channel convolutional neural network
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210608