CN113537004A - Double-pyramid multivariate feature extraction network of image, image segmentation method, system and medium - Google Patents


Info

Publication number
CN113537004A
Authority
CN
China
Prior art keywords
feature
features
layer
semantic
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110747532.1A
Other languages
Chinese (zh)
Other versions
CN113537004B (en)
Inventor
杨大伟
任凤至
毛琳
张汝波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University
Priority to CN202110747532.1A
Publication of CN113537004A
Application granted
Publication of CN113537004B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Abstract

A double-pyramid multivariate feature extraction network for images, together with an image segmentation method, system and medium, belongs to the field of deep-learning image processing. The network comprises four input features, an instance feature pyramid, a semantic feature pyramid and two output features, the two output features being the instance feature pyramid output feature and the semantic feature pyramid output feature. The invention solves the problem that traditional feature extraction methods cannot meet the feature requirements of multi-task models: it provides detailed instance target feature information for target recognition tasks and rich semantic logic feature information for semantic analysis tasks, greatly improving the accuracy of multi-task models.

Description

Double-pyramid multivariate feature extraction network of image, image segmentation method, system and medium
Technical Field
The invention belongs to the field of deep-learning image processing, and in particular relates to a double-pyramid multivariate feature extraction method that provides two types of features for a multi-task model.
Background
Digital image analysis plays an important role in modern society, and machine vision has become a key research topic across many industries. The development of machine vision has gradually abandoned traditional hand-designed digital image processing algorithms in favour of deep learning, with the convolutional neural network as its representative, in order to achieve highly accurate analysis results. The patent "Feature extraction model and feature extraction method capable of sufficiently retaining image features" (publication number CN110659653A) proposes a lossless feature extraction operation on input images of arbitrary resolution, to address the problem that the backbone network continually discards feature information, leaving insufficient information for later analysis. The patent "A method for extracting image features by using a low-complexity scale pyramid" (publication number CN108537235A) proposes dividing the five groups of image blocks that form a scale pyramid, generated by filtering an image, into two parts for separate processing, and then merging the two processing results into a final feature point list.
In existing convolutional neural network models, the backbone feature extraction network originates from early image classification networks, and such traditional feature extraction networks are only suitable for frameworks with a single task requirement, such as target detection or semantic segmentation.
However, as the deep-learning computer vision field develops, deep neural networks are increasingly required to integrate multiple tasks. Each task in a multi-task model typically has its own goal, and tasks with different goals place very different demands on the features, so traditional feature extraction methods cannot satisfy them all. In deep-learning multi-task networks, the inability of traditional feature extraction to meet the feature requirements of multiple tasks has therefore become an urgent problem.
Disclosure of Invention
In order to solve the problem that traditional feature extraction methods cannot meet the feature requirements of multi-task models, the invention provides the following technical scheme: a double-pyramid multivariate feature extraction network for an image, consisting of four input features, an instance feature pyramid, a semantic feature pyramid and two output features, wherein the two output features consist of the instance feature pyramid output feature and the semantic feature pyramid output feature.
Further,
the instance feature pyramid consists of four layers of instance features, three upsampling modules, three additive fusion modules, four identical standard convolution layers and one merging module;
the instance feature pyramid constructs its four layers of instance features along a top-down path:
input feature 1 forms the fourth layer of instance features of the instance feature pyramid; one path of the fourth-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the enlarged fourth-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of instance features of the instance feature pyramid; one path of the third-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the upsampled third-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of instance features of the instance feature pyramid; one path of the second-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 4 and the upsampled second-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the first layer of instance features of the instance feature pyramid, which then passes through a standard convolution layer into the merging module to await merging;
the merging module merges the four sets of instance feature information awaiting merging and outputs the result as the instance feature pyramid output feature, one of the two output features of the double-pyramid multivariate feature extraction network;
the semantic feature pyramid consists of four layers of semantic features, three dilated (atrous) convolution layers, three additive fusion modules, four standard convolution layers and one merging module;
the semantic feature pyramid constructs its four layers of semantic features along a bottom-up path:
input feature 4 forms the first layer of semantic features of the semantic feature pyramid; one path of the first-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the reduced first-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of semantic features of the semantic feature pyramid; one path of the second-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the reduced second-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of semantic features of the semantic feature pyramid; one path of the third-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 1 and the reduced third-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the fourth layer of semantic features of the semantic feature pyramid, which then passes through a standard convolution layer into the merging module to await merging.
The merging module merges the four sets of semantic feature information awaiting merging and outputs the result as the semantic feature pyramid output feature, the other of the two output features of the double-pyramid multivariate feature extraction network.
Further,
the four input features of the double-pyramid multivariate feature extraction network are four results of coarse feature extraction on the same input image.
Among the four input features, input feature 1 is a three-dimensional matrix of size [256 × 25 × 38]; input feature 2 is a three-dimensional matrix of size [256 × 50 × 76]; input feature 3 is a three-dimensional matrix of size [256 × 100 × 152]; and input feature 4 is a three-dimensional matrix of size [256 × 200 × 304].
The upsampling module in the instance feature pyramid enlarges the size of the feature input to it by a factor of two.
The instance feature pyramid output feature is a three-dimensional matrix of size [256 × 25 × 38].
The dilated convolution layer in the semantic feature pyramid reduces the size of the feature input to it by a factor of two.
The semantic feature pyramid output feature is a three-dimensional matrix of size [256 × 200 × 304].
An image segmentation method comprises the following steps:
Step 1: read a dataset image and perform coarse feature extraction to obtain a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, a three-dimensional matrix of size [256 × 50 × 76] as input feature 2, a three-dimensional matrix of size [256 × 100 × 152] as input feature 3, and a three-dimensional matrix of size [256 × 200 × 304] as input feature 4;
Step 2: pass input features 1, 2, 3 and 4 obtained in step 1 to the instance feature pyramid to obtain an instance target feature matrix of size [256 × 25 × 38];
Step 3: input the instance target feature matrix from step 2 into a region proposal network, then pass it through fully connected and mask generation structures to obtain the segmentation result for the instance targets in the panorama;
Step 4: pass input features 1, 2, 3 and 4 obtained in step 1 to the semantic feature pyramid to obtain a semantic feature matrix of size [256 × 200 × 304];
Step 5: input the semantic feature matrix from step 4 into a fully convolutional structure to obtain the semantic segmentation result for the panorama;
Step 6: merge the instance target segmentation result from step 3 and the semantic segmentation result from step 5 through a panoptic fusion structure to generate the panoptic segmentation result.
A computer system comprising a processor and a memory, wherein the processor executes code stored in the memory to implement the method.
A computer storage medium storing a computer program which, when executed by hardware, implements the method.
Advantageous effects: the invention provides a double-pyramid multivariate feature extraction network that supplies two types of features: it provides detailed instance target feature information for tasks centred on target recognition and rich semantic logic feature information for tasks centred on semantic analysis, and can greatly improve the accuracy of multi-task models. The method is suitable for multi-task integrated models for visual environment perception, such as unmanned driving and mobile robots.
Drawings
FIG. 1 is a schematic diagram of the overall framework of the method
FIG. 2 is a schematic diagram of the instance feature pyramid
FIG. 3 is a schematic diagram of the semantic feature pyramid
FIG. 4 is the panoptic segmentation result of an outdoor scene in Example 1
FIG. 5 is the panoptic segmentation result of an indoor scene in Example 2
FIG. 6 is the panoptic segmentation result of a traffic scene in Example 3
Detailed Description
The invention is described in further detail below with reference to the detailed description and the accompanying drawings:
1. Technical scheme
Deep-learning network tasks fall into two broad categories: first, recognition of targets in an image, including target detection, target tracking and the like; second, semantic analysis of the whole image, including semantic segmentation and the like. To meet the feature requirements of multi-task models, the invention provides a double-pyramid multivariate feature extraction network that supplies two different types of features. The double-pyramid multivariate feature extraction network comprises an instance feature pyramid and a semantic feature pyramid. The instance feature pyramid acquires detailed feature information of instance targets in the image and can be used in fields such as target detection; the semantic feature pyramid acquires coarse feature information such as semantic position in the image, serves semantic analysis, and is suitable for fields such as semantic segmentation.
2. Double-pyramid multivariate feature extraction network
Definition of the double-pyramid multivariate feature extraction network: the network consists of four input features, an instance feature pyramid, a semantic feature pyramid and two output features.
The four input features consist of input feature 1, input feature 2, input feature 3 and input feature 4; the two output features consist of the instance feature pyramid output feature and the semantic feature pyramid output feature.
(1) Instance feature pyramid
Definition 1: the instance feature pyramid consists of four layers of instance features, three upsampling modules, three additive fusion modules, four identical standard convolution layers and one merging module.
In terms of its geometric construction, the instance feature pyramid builds its four layers of instance features along a top-down path, as sketched in the code below.
Input feature 1 forms the fourth layer of instance features of the instance feature pyramid; one path of the fourth-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the enlarged fourth-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of instance features of the instance feature pyramid; one path of the third-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the upsampled third-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of instance features of the instance feature pyramid; one path of the second-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 4 and the upsampled second-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the first layer of instance features of the instance feature pyramid, which then passes through a standard convolution layer into the merging module to await merging.
The merging module merges the four sets of instance feature information awaiting merging and outputs the result as the instance feature pyramid output feature, one of the two output features of the double-pyramid multivariate feature extraction network.
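As an illustration only, the following minimal PyTorch sketch mirrors the structure just described (the patent names no framework). The 3 × 3 kernels, nearest-neighbour upsampling, and the resize-then-sum behaviour of the merging module are assumptions; the patent fixes only the module counts, the ×2 size changes, and the input/output sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceFeaturePyramid(nn.Module):
    """Top-down pyramid: upsample x2, additive fusion, per-level conv, merge."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Four identical standard convolution layers, one per pyramid level.
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4)])

    def forward(self, f1, f2, f3, f4):
        p4 = f1                                      # fourth (top) layer, smallest size
        p3 = f2 + F.interpolate(p4, scale_factor=2)  # upsample x2, additive fusion
        p2 = f3 + F.interpolate(p3, scale_factor=2)
        p1 = f4 + F.interpolate(p2, scale_factor=2)
        levels = [conv(p) for conv, p in zip(self.convs, (p4, p3, p2, p1))]
        # Merging module (assumed): resize every level to the smallest
        # resolution and sum, yielding the [256, 25, 38] output feature.
        size = levels[0].shape[-2:]
        return sum(F.interpolate(l, size=size) for l in levels)
```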
(2) Semantic feature pyramid
Definition 2: the semantic feature pyramid consists of four layers of semantic features, three dilated (atrous) convolution layers, three additive fusion modules, four standard convolution layers and one merging module.
In terms of its geometric construction, the semantic feature pyramid builds its four layers of semantic features along a bottom-up path, as sketched in the code below.
Input feature 4 forms the first layer of semantic features of the semantic feature pyramid; one path of the first-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the reduced first-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of semantic features of the semantic feature pyramid; one path of the second-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the reduced second-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of semantic features of the semantic feature pyramid; one path of the third-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 1 and the reduced third-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the fourth layer of semantic features of the semantic feature pyramid, which then passes through a standard convolution layer into the merging module to await merging.
The merging module merges the four sets of semantic feature information awaiting merging and outputs the result as the semantic feature pyramid output feature, the other of the two output features of the double-pyramid multivariate feature extraction network.
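Continuing the sketch above (same imports, same caveats), a companion module for the semantic feature pyramid follows. The stride-2, dilation-2, 3 × 3 dilated convolution, which halves each spatial dimension for the sizes given in the constraints below, and the upsample-then-sum merge are assumptions; the patent states only that each dilated convolution layer halves the feature size.

```python
class SemanticFeaturePyramid(nn.Module):
    """Bottom-up pyramid: dilated conv halves size, additive fusion, merge."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Three dilated convolutions; stride 2 halves the feature size and
        # dilation 2 enlarges the receptive field (parameter choices assumed).
        self.dilated = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=2, dilation=2)
             for _ in range(3)])
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4)])

    def forward(self, f1, f2, f3, f4):
        s1 = f4                        # first (bottom) layer, largest size
        s2 = f3 + self.dilated[0](s1)  # halve size, additive fusion
        s3 = f2 + self.dilated[1](s2)
        s4 = f1 + self.dilated[2](s3)
        levels = [conv(s) for conv, s in zip(self.convs, (s1, s2, s3, s4))]
        # Merging module (assumed): upsample every level to the largest
        # resolution and sum, yielding the [256, 200, 304] output feature.
        size = levels[0].shape[-2:]
        return sum(F.interpolate(l, size=size) for l in levels)
```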
3. Constraints
(1) The four input features of the double-pyramid multivariate feature extraction network are four results of coarse feature extraction on the same input image.
(2) Among the four input features, input feature 1 is a three-dimensional matrix of size [256 × 25 × 38]; input feature 2 is a three-dimensional matrix of size [256 × 50 × 76]; input feature 3 is a three-dimensional matrix of size [256 × 100 × 152]; and input feature 4 is a three-dimensional matrix of size [256 × 200 × 304].
(3) The upsampling module in the instance feature pyramid enlarges the size of the feature input to it by a factor of two.
(4) The instance feature pyramid output feature is a three-dimensional matrix of size [256 × 25 × 38].
(5) The dilated convolution layer in the semantic feature pyramid reduces the size of the feature input to it by a factor of two.
(6) The semantic feature pyramid output feature is a three-dimensional matrix of size [256 × 200 × 304].
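A quick sanity check of constraints (1)-(6) against the sketches above, feeding random tensors of the stated sizes in place of the coarsely extracted backbone features (batch dimension added; illustrative only):

```python
f1 = torch.randn(1, 256, 25, 38)    # input feature 1, constraint (2)
f2 = torch.randn(1, 256, 50, 76)    # input feature 2
f3 = torch.randn(1, 256, 100, 152)  # input feature 3
f4 = torch.randn(1, 256, 200, 304)  # input feature 4

inst = InstanceFeaturePyramid()(f1, f2, f3, f4)
sem = SemanticFeaturePyramid()(f1, f2, f3, f4)
assert tuple(inst.shape[1:]) == (256, 25, 38)    # constraint (4)
assert tuple(sem.shape[1:]) == (256, 200, 304)   # constraint (6)
```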
4. Principle analysis
The four input features of the double-pyramid multivariate feature extraction network are different feature forms obtained by coarse extraction from the same image. Input feature 1 is the smallest in size and contains the richest instance target feature information, the richness decreasing in turn through input features 2, 3 and 4; input feature 4 is the largest in size and contains the richest semantic logic feature information, the richness decreasing in turn through input features 3, 2 and 1.
(1) The instance feature pyramid has rich instance target features
The instance feature pyramid takes input feature 1 as its topmost (fourth) layer of instance features, obtaining the most detailed instance target feature information; it then enlarges the features and passes them on layer by layer, so that the instance target features are continuously strengthened and become more salient, before being stored and output through the merging module.
(2) The semantic feature pyramid has rich semantic features
The semantic feature pyramid takes input feature 4 as its first layer of semantic features, obtaining the richest semantic logic feature information, then feeds the features into dilated convolution layers for scale transformation and passes them on layer by layer. By enlarging the receptive field of the convolution kernel, the dilated convolution layers strengthen characteristics such as positional logic in the image, so the first-layer semantic features, together with the second- and third-layer semantic features produced as the pyramid propagates, are further enhanced; the semantic features are thus continuously strengthened and become more salient, before being stored and output through the merging module.
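The receptive-field effect of dilation can be made concrete: a k × k kernel with dilation d spans k + (k - 1)(d - 1) positions per axis, so the 3 × 3, dilation-2 kernel assumed in the sketch above covers a 5 × 5 window while keeping only 3 × 3 parameters.

```python
def effective_kernel(k: int, d: int) -> int:
    """Span, per axis, of a k-by-k convolution kernel with dilation d."""
    return k + (k - 1) * (d - 1)

print(effective_kernel(3, 1))  # 3: standard convolution
print(effective_kernel(3, 2))  # 5: the dilation assumed in the sketch above
print(effective_kernel(3, 4))  # 9: wider context at the same parameter count
```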
5. Advantageous effects
(1) Provides two types of features
The invention can provide two different types of features for a multi-task model: rich instance feature information for tasks that centre on recognizing instance targets, such as target detection, and specific semantic feature information for tasks that centre on global semantic analysis, such as semantic segmentation.
(2) Suits multi-task models
Each task in a multi-task network model typically has a specific goal, and tasks with different goals place very different demands on the features. The two types of features provided by the invention can satisfy the differing feature requirements of multi-task models.
(3) Suits panoptic segmentation models
As a multi-task integrated network model, panoptic segmentation must achieve two different task targets: semantic segmentation of the panorama and instance segmentation of the instance targets within it. The two types of features provided by the invention meet the panoptic segmentation model's demands for feature information well. The semantic features needed by the semantic segmentation task, which supply positional logic information for segmentation, can be provided by the semantic feature pyramid of the invention; the instance target features needed by the instance segmentation task, which supply instance target detail for segmentation, can be provided by the instance feature pyramid of the invention. The double-pyramid multivariate feature extraction network supplies rich and comprehensive image features to the panoptic segmentation model and can greatly improve panoptic segmentation accuracy.
(4) Suits unmanned driving technology
The invention is a computer-vision environment perception technology suitable for the unmanned driving field. It can extract instance target information for pedestrians, vehicles, buildings and the like in the driving environment, together with semantic position information for the whole driving environment, supplying comprehensive feature information to the network model and providing an important safety guarantee for normal driving.
(5) Suits public traffic monitoring systems
The method effectively recognizes pedestrians, vehicles and the road environment, meets the demands of road traffic scenes, and offers drivers an aid to safe driving. With the accuracy and speed of the invention, feature information can be effectively extracted for vehicles violating regulations, pedestrians ignoring traffic rules, and accidents in the traffic environment, providing favourable conditions for subsequent recognition work and improving the efficiency of public monitoring systems.
The logic of the method is shown schematically in FIG. 1. The specific implementation steps of the algorithm are as follows (see also the pipeline sketch after this list):
Step 1: read a dataset image and perform coarse feature extraction through an arbitrary feature network to obtain a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, a three-dimensional matrix of size [256 × 50 × 76] as input feature 2, a three-dimensional matrix of size [256 × 100 × 152] as input feature 3, and a three-dimensional matrix of size [256 × 200 × 304] as input feature 4;
Step 2: pass input features 1, 2, 3 and 4 obtained in step 1 to the instance feature pyramid to obtain an instance target feature matrix of size [256 × 25 × 38];
Step 3: input the instance target feature matrix from step 2 into a region proposal network, then pass it through fully connected and mask generation structures to obtain the segmentation result for the instance targets in the panorama;
Step 4: pass input features 1, 2, 3 and 4 obtained in step 1 to the semantic feature pyramid to obtain a semantic feature matrix of size [256 × 200 × 304];
Step 5: input the semantic feature matrix from step 4 into a fully convolutional structure to obtain the semantic segmentation result for the panorama;
Step 6: merge the instance target segmentation result from step 3 and the semantic segmentation result from step 5 through a panoptic fusion structure to generate the panoptic segmentation result.
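Steps 1-6 can be condensed into the high-level sketch below. The backbone, the region-proposal/mask head, the fully convolutional semantic head and the panoptic fusion structure are hypothetical stand-ins, since the patent does not specify their internals; a Mask R-CNN-style instance branch and an FCN-style semantic branch are assumed.

```python
def panoptic_segment(image, backbone, inst_pyr, sem_pyr,
                     rpn_mask_head, fcn_head, panoptic_fusion):
    f1, f2, f3, f4 = backbone(image)           # step 1: coarse feature extraction
    inst_feat = inst_pyr(f1, f2, f3, f4)       # step 2: [256, 25, 38] instance features
    inst_seg = rpn_mask_head(inst_feat)        # step 3: RPN + FC + mask generation
    sem_feat = sem_pyr(f1, f2, f3, f4)         # step 4: [256, 200, 304] semantic features
    sem_seg = fcn_head(sem_feat)               # step 5: fully convolutional semantic result
    return panoptic_fusion(inst_seg, sem_seg)  # step 6: fused panoptic result
```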
Example 1:
In this example, an outdoor activity scene is input into the network model and all objects in the outdoor scene undergo panoptic segmentation. The outdoor scene panoptic segmentation result is shown in FIG. 4.
Example 2:
In this example, an indoor living scene is input into the network model and all objects in the indoor scene undergo panoptic segmentation. The indoor scene panoptic segmentation result is shown in FIG. 5.
Example 3:
In this example, a road traffic scene is input into the network model, and instance targets such as pedestrians and vehicles, together with non-instance targets such as roads and sky, undergo panoptic segmentation. The traffic scene panoptic segmentation result is shown in FIG. 6.

Claims (6)

1. A double-pyramid multivariate feature extraction network of an image, characterized by comprising four input features, an instance feature pyramid, a semantic feature pyramid and two output features, wherein the two output features comprise the instance feature pyramid output feature and the semantic feature pyramid output feature.
2. The double-pyramid multivariate feature extraction network of an image according to claim 1, wherein:
the instance feature pyramid consists of four layers of instance features, three upsampling modules, three additive fusion modules, four identical standard convolution layers and one merging module;
the instance feature pyramid constructs its four layers of instance features along a top-down path:
input feature 1 forms the fourth layer of instance features of the instance feature pyramid; one path of the fourth-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the enlarged fourth-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of instance features of the instance feature pyramid; one path of the third-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the upsampled third-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of instance features of the instance feature pyramid; one path of the second-layer instance features then enters an upsampling module for size enlargement, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 4 and the upsampled second-layer instance features enter an additive fusion module together for feature fusion; the fusion result forms the first layer of instance features of the instance feature pyramid, which then passes through a standard convolution layer into the merging module to await merging;
the merging module merges the four sets of instance feature information awaiting merging and outputs the result as the instance feature pyramid output feature, one of the two output features of the double-pyramid multivariate feature extraction network;
the semantic feature pyramid consists of four layers of semantic features, three dilated (atrous) convolution layers, three additive fusion modules, four standard convolution layers and one merging module;
the semantic feature pyramid constructs its four layers of semantic features along a bottom-up path:
input feature 4 forms the first layer of semantic features of the semantic feature pyramid; one path of the first-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 3 and the reduced first-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the second layer of semantic features of the semantic feature pyramid; one path of the second-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 2 and the reduced second-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the third layer of semantic features of the semantic feature pyramid; one path of the third-layer semantic features then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merging;
input feature 1 and the reduced third-layer semantic features enter an additive fusion module together for feature fusion; the fusion result forms the fourth layer of semantic features of the semantic feature pyramid, which then passes through a standard convolution layer into the merging module to await merging;
the merging module merges the four sets of semantic feature information awaiting merging and outputs the result as the semantic feature pyramid output feature, the other of the two output features of the double-pyramid multivariate feature extraction network.
3. The double-pyramid multivariate feature extraction network of an image according to claim 1, wherein:
the four input features of the double-pyramid multivariate feature extraction network are four results of coarse feature extraction on the same input image;
among the four input features, input feature 1 is a three-dimensional matrix of size [256 × 25 × 38]; input feature 2 is a three-dimensional matrix of size [256 × 50 × 76]; input feature 3 is a three-dimensional matrix of size [256 × 100 × 152]; and input feature 4 is a three-dimensional matrix of size [256 × 200 × 304];
the upsampling module in the instance feature pyramid enlarges the size of the feature input to it by a factor of two;
the instance feature pyramid output feature is a three-dimensional matrix of size [256 × 25 × 38];
the dilated convolution layer in the semantic feature pyramid reduces the size of the feature input to it by a factor of two;
the semantic feature pyramid output feature is a three-dimensional matrix of size [256 × 200 × 304].
4. An image segmentation method, characterized by comprising the following steps:
Step 1: read a dataset image and perform coarse feature extraction to obtain a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, a three-dimensional matrix of size [256 × 50 × 76] as input feature 2, a three-dimensional matrix of size [256 × 100 × 152] as input feature 3, and a three-dimensional matrix of size [256 × 200 × 304] as input feature 4;
Step 2: pass input features 1, 2, 3 and 4 obtained in step 1 to the instance feature pyramid to obtain an instance target feature matrix of size [256 × 25 × 38];
Step 3: input the instance target feature matrix from step 2 into a region proposal network, then pass it through fully connected and mask generation structures to obtain the segmentation result for the instance targets in the panorama;
Step 4: pass input features 1, 2, 3 and 4 obtained in step 1 to the semantic feature pyramid to obtain a semantic feature matrix of size [256 × 200 × 304];
Step 5: input the semantic feature matrix from step 4 into a fully convolutional structure to obtain the semantic segmentation result for the panorama;
Step 6: merge the instance target segmentation result from step 3 and the semantic segmentation result from step 5 through a panoptic fusion structure to generate the panoptic segmentation result.
5. A computer system comprising a processor and a memory, wherein the processor executes code stored in the memory to implement the method of claim 4.
6. A computer storage medium storing a computer program which, when executed by hardware, implements the method of claim 4.
CN202110747532.1A 2021-07-01 2021-07-01 Image double pyramid multi-element feature extraction network, image segmentation method, system and medium Active CN113537004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110747532.1A CN113537004B (en) 2021-07-01 2021-07-01 Image double pyramid multi-element feature extraction network, image segmentation method, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110747532.1A CN113537004B (en) 2021-07-01 2021-07-01 Image double pyramid multi-element feature extraction network, image segmentation method, system and medium

Publications (2)

Publication Number Publication Date
CN113537004A 2021-10-22
CN113537004B CN113537004B (en) 2023-09-01

Family

ID=78097593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110747532.1A Active CN113537004B (en) 2021-07-01 2021-07-01 Image double pyramid multi-element feature extraction network, image segmentation method, system and medium

Country Status (1)

Country Link
CN (1) CN113537004B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
US20200334819A1 (en) * 2018-09-30 2020-10-22 Boe Technology Group Co., Ltd. Image segmentation apparatus, method and relevant computing device
CN110084274A (en) * 2019-03-29 2019-08-02 南京邮电大学 Realtime graphic semantic segmentation method and system, readable storage medium storing program for executing and terminal
CN111524150A (en) * 2020-07-03 2020-08-11 支付宝(杭州)信息技术有限公司 Image processing method and device
CN112232232A (en) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 Target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姜世浩; 齐苏敏; 王来花; 贾惠: "Instance segmentation based on Mask R-CNN and multi-feature fusion" (基于Mask R-CNN和多特征融合的实例分割), 计算机技术与发展 (Computer Technology and Development), no. 09

Also Published As

Publication number Publication date
CN113537004B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
Min et al. Traffic sign recognition based on semantic scene understanding and structural traffic sign location
He et al. Rail transit obstacle detection based on improved CNN
WO2023030182A1 (en) Image generation method and apparatus
CN115439483B (en) High-quality welding seam and welding seam defect identification system, method and storage medium
CN112101153A (en) Remote sensing target detection method based on receptive field module and multiple characteristic pyramid
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN115588126A (en) GAM, CARAFE and SnIoU fused vehicle target detection method
Song et al. Msfanet: A light weight object detector based on context aggregation and attention mechanism for autonomous mining truck
Pham Semantic road segmentation using deep learning
CN115019274A (en) Pavement disease identification method integrating tracking and retrieval algorithm
Yuan et al. Multi-level object detection by multi-sensor perception of traffic scenes
CN113537004A (en) Double-pyramid multivariate feature extraction network of image, image segmentation method, system and medium
Xiang et al. A real-time vehicle traffic light detection algorithm based on modified YOLOv3
Feng et al. Embedded YOLO: A real-time object detector for small intelligent trajectory cars
CN116229410A (en) Lightweight neural network road scene detection method integrating multidimensional information pooling
Wei et al. An Efficient Point Cloud-based 3D Single Stage Object Detector
Valiente et al. Robust perception and visual understanding of traffic signs in the wild
Lai et al. Aircraft Target Detection Based on Attention Mechanism and Faster R-CNN
CN117152646B (en) Unmanned electric power inspection AI light-weight large model method and system
Cheng Global-feature enhanced network for fast semantic segmentation
Wang et al. YOLOv5-Based Dense Small Target Detection Algorithm for Aerial Images Using DIOU-NMS
Vaidya et al. Detecting Buildings from Remote Sensing Imagery: Unleashing the Power of YOLOv5 and YOLOv8
CN116503838A (en) Traffic sign detection algorithm based on feature multi-scale fusion
CN117292335A (en) Dangerous chemical vehicle detection method based on improved YOLOv5 algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant