CN113537004A - Double-pyramid multivariate feature extraction network of image, image segmentation method, system and medium - Google Patents
- Publication number
- CN113537004A (application CN202110747532.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- features
- layer
- semantic
- pyramid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
The double-pyramid multivariate feature extraction network for images, image segmentation method, system and medium belong to the field of deep-learning image processing. The network comprises four input features, an instance feature pyramid, a semantic feature pyramid and two output features, the two output features being the instance feature pyramid output feature and the semantic feature pyramid output feature. The invention solves the problem that traditional feature extraction cannot meet the feature requirements of a multitask network: it provides detailed instance target feature information for target recognition tasks and rich semantic-logic feature information for semantic analysis tasks, greatly improving the accuracy of multitask networks.
Description
Technical Field
The invention belongs to the field of deep-learning image processing, and in particular relates to a double-pyramid multivariate feature extraction method capable of providing two types of features to a multitask model.
Background
Digital image analysis plays an important role in modern society, and machine vision has become an important research topic across many industries. Current machine vision has largely abandoned traditional hand-designed digital image processing algorithms in favour of deep learning, represented by convolutional neural networks, in order to achieve highly accurate analysis results. The patent "Feature extraction model and feature extraction method capable of sufficiently retaining image features" (publication number CN110659653A) proposes a lossless feature extraction operation on input images of arbitrary resolution, addressing the problem that a backbone network continuously discards feature information, leaving insufficient information for later analysis. The patent "A method for extracting image features using a low-complexity scale pyramid" (publication number CN108537235A) proposes filtering an image into five groups of image blocks that form a scale pyramid, processing the groups in two separate parts, and then merging the two partial results into a final feature point list.
In existing convolutional neural network models, the backbone feature extraction network originated from early image classification networks, and such traditional feature extraction networks suit only frameworks with a single task requirement, such as target detection or semantic segmentation.
However, as deep-learning computer vision has developed, integrating multiple tasks into one deep neural network is increasingly required. Each task in a multitask network has its own purpose, and different purposes place very different demands on the features, so traditional feature extraction cannot satisfy them all. In deep-learning multitask networks, the inability of traditional feature extraction to meet the feature requirements of multiple tasks has therefore become an urgent problem.
Disclosure of Invention
To solve the problem that traditional feature extraction cannot meet the feature requirements of a multitask network, the invention provides the following technical scheme: a double-pyramid multivariate feature extraction network for images, composed of four input features, an instance feature pyramid, a semantic feature pyramid and two output features, the two output features being the instance feature pyramid output feature and the semantic feature pyramid output feature.
Further,
the instance feature pyramid consists of four layers of instance features, three up-sampling modules, three additive fusion modules, four identical standard convolution layers and one merge module;
the instance feature pyramid builds its four layers of instance features along a top-down path:
input feature 1 forms the fourth-layer instance feature of the instance feature pyramid; one branch of the fourth-layer instance feature then enters an up-sampling module for size enlargement, while the other branch passes through a standard convolution layer into the merge module to await merging;
input feature 2 and the enlarged fourth-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the third-layer instance feature of the pyramid, one branch of which enters an up-sampling module for size enlargement while the other passes through a standard convolution layer into the merge module to await merging;
input feature 3 and the up-sampled third-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the second-layer instance feature of the pyramid, one branch of which enters an up-sampling module for size enlargement while the other passes through a standard convolution layer into the merge module to await merging;
input feature 4 and the up-sampled second-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the first-layer instance feature of the pyramid, which then passes through a standard convolution layer into the merge module to await merging;
the merge module merges the four awaiting instance features and outputs the result as the instance feature pyramid output feature, one of the two outputs of the double-pyramid multivariate feature extraction network;
the semantic feature pyramid consists of four layers of semantic features, three dilated (atrous) convolution layers, three additive fusion modules, four standard convolution layers and one merge module;
the semantic feature pyramid builds its four layers of semantic features along a bottom-up path:
input feature 4 forms the first-layer semantic feature of the semantic feature pyramid; one branch of the first-layer semantic feature then enters a dilated convolution layer for size reduction, while the other branch passes through a standard convolution layer into the merge module to await merging;
input feature 3 and the reduced first-layer semantic feature enter an additive fusion module together for feature fusion; the fusion result forms the second-layer semantic feature of the pyramid, one branch of which enters a dilated convolution layer for size reduction while the other passes through a standard convolution layer into the merge module to await merging;
input feature 2 and the reduced second-layer semantic feature enter an additive fusion module together for feature fusion; the fusion result forms the third-layer semantic feature of the pyramid, one branch of which enters a dilated convolution layer for size reduction while the other passes through a standard convolution layer into the merge module to await merging;
input feature 1 and the reduced third-layer semantic feature enter an additive fusion module together for feature fusion; the fusion result forms the fourth-layer semantic feature of the pyramid, which then passes through a standard convolution layer into the merge module to await merging.
The merge module merges the four awaiting semantic features and outputs the result as the semantic feature pyramid output feature, the other of the two outputs of the double-pyramid multivariate feature extraction network.
Further,
the 4 input features of the double pyramid multi-feature extraction network are four results of feature rough extraction on the same input image.
The input feature 1 in the 4 input features of the double pyramid multi-element feature extraction network is a three-dimensional matrix with the size [256 × 25 × 38 ]; input feature 2 is a three-dimensional matrix with size [256 x 50 x 76 ]; input feature 3 is a three-dimensional matrix with dimensions [256 x 100 x 152 ]; the input features 4 are three-dimensional matrices with dimensions [256 x 200 x 304 ].
The upsampling module in the example feature pyramid expands the feature size input to the module by a factor of two.
Example feature pyramid output features are three-dimensional matrices with dimensions of [256 × 25 × 38 ].
The void convolutional layer in the semantic feature pyramid reduces the feature size input into the convolutional layer by a factor of two.
The semantic feature pyramid output features are three-dimensional matrices with dimensions [256 × 200 × 304 ].
An image segmentation method comprises the following steps:
Step 1: read a dataset image and coarsely extract features, obtaining a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, one of size [256 × 50 × 76] as input feature 2, one of size [256 × 100 × 152] as input feature 3, and one of size [256 × 200 × 304] as input feature 4;
Step 2: pass input features 1, 2, 3 and 4 from step 1 into the instance feature pyramid, obtaining an instance target feature matrix of size [256 × 25 × 38];
Step 3: feed the instance target feature matrix from step 2 into a region proposal network, then through fully connected and mask-generating structures, obtaining the segmentation result for the instance targets in the full image;
Step 4: pass input features 1, 2, 3 and 4 from step 1 into the semantic feature pyramid, obtaining a semantic feature matrix of size [256 × 200 × 304];
Step 5: feed the semantic feature matrix from step 4 into a fully convolutional structure, obtaining the semantic segmentation result for the full image;
Step 6: merge the instance segmentation result from step 3 and the semantic segmentation result from step 5 through a panoptic fusion structure, generating the panoptic segmentation result.
A computer system comprising a processor and a memory, wherein the processor executes code stored in the memory to implement the method.
A computer storage medium storing a computer program which, when executed by hardware, implements the method.
Advantageous effects: the invention provides a double-pyramid multivariate feature extraction network supplying two types of features: detailed instance target feature information for tasks focused on target recognition, and rich semantic-logic feature information for tasks focused on semantic analysis, greatly improving the accuracy of multitask networks. The method suits multitask-integrated models for visual environment perception, such as autonomous driving and mobile robots.
Drawings
FIG. 1 is a schematic of the overall framework of the method
FIG. 2 is a schematic diagram of the instance feature pyramid
FIG. 3 is a schematic diagram of the semantic feature pyramid
FIG. 4 is the panoptic segmentation result for the outdoor scene in Example 1
FIG. 5 is the panoptic segmentation result for the indoor scene in Example 2
FIG. 6 is the panoptic segmentation result for the traffic scene in Example 3
Detailed Description
The invention is described in further detail below with reference to specific embodiments and the accompanying drawings:
1. technical scheme
Deep-learning network tasks fall into two broad categories: first, recognition of targets in an image, including target detection, target tracking and the like; second, semantic analysis of the whole image, including semantic segmentation and the like. To meet the feature requirements of multitask networks, the invention proposes a double-pyramid multivariate feature extraction network that provides two different types of features. The network comprises an instance feature pyramid and a semantic feature pyramid. The instance feature pyramid acquires detailed feature information of the instance targets in the image and can serve fields such as target detection; the semantic feature pyramid acquires coarse feature information such as semantic position in the image, serves semantic analysis, and suits fields such as semantic segmentation.
2. Double pyramid multivariate feature extraction network
Definition of the double-pyramid multivariate feature extraction network: the network is composed of four input features, an instance feature pyramid, a semantic feature pyramid and two output features.
The four input features consist of input features 1, 2, 3 and 4; the two output features consist of the instance feature pyramid output feature and the semantic feature pyramid output feature.
(1) Instance feature pyramid
Definition 1: the example feature pyramid is composed of four layers of example features, three up-sampling modules, three additive fusion modules, 4 identical standard convolution layers and a merging module.
From a construction geometry perspective, the example feature pyramid constructs four layers of example features along a top-down path.
Inputting a fourth layer of example features of which the features 1 form an example feature pyramid, then, one path of the fourth layer of example features enters an up-sampling module for size amplification, and the other path of the fourth layer of example features enters a merging and fusing module through a standard convolution layer to wait for merging and fusing;
the input features 2 and the amplified and transformed fourth-layer example features jointly enter an addition fusion module for feature fusion, feature fusion results form third-layer example features of an example feature pyramid, and then one path of the third-layer example features enters an up-sampling module for size amplification; the other path enters a merging and fusing module through the standard convolution layer to wait for merging and fusing;
the input features 3 and the third layer of example features subjected to the up-sampling amplification transformation jointly enter an addition fusion module for feature fusion, feature fusion results form example feature pyramid second layer of example features, and then one path of the second layer of example features enters an up-sampling module for size amplification; the other path enters a merging and fusing module through the standard convolution layer to wait for merging and fusing;
the input features 4 and the second layer of example features subjected to the up-sampling amplification conversion enter an addition fusion module together for feature fusion, feature fusion results form first layer of example features of an example feature pyramid, and then the first layer of example features enter a merging fusion module through a standard convolution layer to wait for merging and fusion processing;
and the merging module merges the four example feature information waiting for merging and merging processing, outputs a merging and merging result as the output feature of the example feature pyramid, and forms one of the two output features of the double-pyramid multi-feature extraction network.
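The top-down data flow above can be sketched in a few lines of NumPy. This is a minimal sketch, not the patented implementation: the standard convolution layers are omitted (only shapes and the additive path are illustrated), nearest-neighbour repetition stands in for the up-sampling module, and the merge module's internal resizing is not specified in the patent, so the sketch stops at the four pyramid layers.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour stand-in for the up-sampling module:
    # doubles both spatial dimensions of a [C, H, W] feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Four coarse input features with the sizes stated in the constraints.
f1 = np.zeros((256, 25, 38))
f2 = np.zeros((256, 50, 76))
f3 = np.zeros((256, 100, 152))
f4 = np.zeros((256, 200, 304))

# Top-down construction of the four instance-feature layers.
p4 = f1                      # fourth-layer instance feature
p3 = f2 + upsample2x(p4)     # additive fusion with the enlarged layer above
p2 = f3 + upsample2x(p3)
p1 = f4 + upsample2x(p2)

# Each layer would then pass through a standard convolution before the
# merge module; the merge itself is left unspecified here.
print([p.shape for p in (p4, p3, p2, p1)])
```

Because each input is exactly twice the spatial size of the previous one, every additive fusion lines up without cropping or padding.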
(2) Semantic feature pyramid
Definition 2: the semantic feature pyramid consists of four layers of semantic features, three cavity convolution layers, three additive fusion modules, 4 standard convolution layers and one merging module.
From the aspect of forming a geometric form, the semantic feature pyramid constructs four layers of semantic features along a path from bottom to top.
Inputting a first layer of semantic features of a semantic feature pyramid formed by the features 4, then, enabling one path of the first layer of semantic features to enter a cavity convolution layer for size reduction, enabling the other path of the first layer of semantic features to enter a merging module through a standard convolution layer, and waiting for merging and merging;
the input features 3 and the reduced and transformed first-layer semantic features jointly enter an addition fusion module for feature fusion, feature fusion results form a second-layer semantic feature of a semantic feature pyramid, one path of the second-layer semantic feature enters a cavity convolution layer for size reduction, and the other path of the second-layer semantic feature enters a merging module through a standard convolution layer to wait for merging and fusion processing;
the input features 2 and the reduced and transformed second-layer semantic features jointly enter an addition fusion module for feature fusion, feature fusion results form a third-layer semantic feature of a semantic feature pyramid, one path of the third-layer semantic feature enters a cavity convolution layer for size reduction, and the other path of the third-layer semantic feature enters a merging module through a standard convolution layer to wait for merging and fusion processing;
the input features 1 and the reduced and transformed third-layer semantic features jointly enter an addition fusion module for feature fusion, feature fusion results form a fourth-layer semantic feature of a semantic feature pyramid, and then the fourth-layer semantic feature enters a merging module through a standard convolution layer to wait for merging and fusion processing.
And the merging module merges the four semantic feature information to be merged and fused, outputs a merged and fused result as the output feature of the semantic feature pyramid, and forms one of the two output features of the double-pyramid multi-feature extraction network.
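The bottom-up path mirrors the instance pyramid with the size transform reversed. The sketch below uses 2×2 average pooling as a stand-in for the size-halving dilated convolution layer (the real layer would also widen the receptive field via dilation), omits the standard convolutions, and again stops before the unspecified merge module.

```python
import numpy as np

def reduce2x(x):
    # Stand-in for the size-halving dilated convolution layer: 2x2
    # average pooling over a [C, H, W] feature map.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

# Four coarse input features with the sizes stated in the constraints.
f1 = np.zeros((256, 25, 38))
f2 = np.zeros((256, 50, 76))
f3 = np.zeros((256, 100, 152))
f4 = np.zeros((256, 200, 304))

# Bottom-up construction of the four semantic-feature layers.
s1 = f4                    # first-layer semantic feature (largest)
s2 = f3 + reduce2x(s1)     # additive fusion with the reduced layer below
s3 = f2 + reduce2x(s2)
s4 = f1 + reduce2x(s3)
print([s.shape for s in (s1, s2, s3, s4)])
```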
3. Constraint conditions
(1) The four input features of the double-pyramid multivariate feature extraction network are four results of coarse feature extraction on the same input image.
(2) Among the four input features, input feature 1 is a three-dimensional matrix of size [256 × 25 × 38]; input feature 2 is of size [256 × 50 × 76]; input feature 3 is of size [256 × 100 × 152]; and input feature 4 is of size [256 × 200 × 304].
(3) The up-sampling modules in the instance feature pyramid double the spatial size of the features entering them.
(4) The instance feature pyramid output feature is a three-dimensional matrix of size [256 × 25 × 38].
(5) The dilated convolution layers in the semantic feature pyramid halve the spatial size of the features entering them.
(6) The semantic feature pyramid output feature is a three-dimensional matrix of size [256 × 200 × 304].
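A quick check confirms that the four stated sizes satisfy constraints (3) and (5): consecutive levels differ by exactly a factor of two in each spatial dimension, so three doublings map [25, 38] to [200, 304] and three halvings map back.

```python
# Spatial sizes of input features 1..4 as given in constraint (2).
sizes = [(25, 38), (50, 76), (100, 152), (200, 304)]

# Each level must be exactly twice the previous in height and width.
for (h0, w0), (h1, w1) in zip(sizes, sizes[1:]):
    assert (h1, w1) == (2 * h0, 2 * w0)
print("size chain consistent:", sizes)
```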
4. Principle analysis
The four input features of the double-pyramid multivariate feature extraction network are different feature forms obtained by coarse extraction of the same image. Input feature 1 is the smallest and contains the richest instance target feature information, with richness decreasing through input features 2, 3 and 4; input feature 4 is the largest and contains the richest semantic-logic feature information, with richness decreasing through input features 3, 2 and 1.
(1) The instance feature pyramid has rich instance target features
The instance feature pyramid takes input feature 1 as its starting layer, obtaining the most detailed instance target feature information; it then enlarges the features and passes them downward, so that the instance target features are continuously reinforced and become more salient, before being collected and output through the merge module.
(2) The semantic feature pyramid has rich semantic features
The semantic feature pyramid takes input feature 4 as its first-layer semantic feature, obtaining the richest semantic-logic feature information; it then sends the features into the dilated convolution layers for scale reduction and continued downward transmission. By enlarging the receptive field of the convolution kernel, the dilated convolution layers strengthen positional-logic and similar features in the image, so that the first-layer semantic feature and the second- and third-layer semantic features produced along the way are further enhanced; the semantic features thus grow continuously stronger and more salient before being collected and output through the merge module.
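The receptive-field claim can be made concrete with the standard formula for dilated convolution: a k × k kernel with dilation rate d covers an effective extent of k + (k − 1)(d − 1) pixels, so the field grows without adding parameters. (The dilation rates in this sketch are illustrative; the patent does not state specific rates.)

```python
def effective_kernel(k, d):
    # Effective extent of a k x k convolution kernel with dilation d:
    #   k_eff = k + (k - 1) * (d - 1)
    return k + (k - 1) * (d - 1)

# A 3x3 kernel widens from 3 to 5 to 9 pixels of extent as the
# dilation rate rises, enlarging the receptive field at no extra cost.
for d in (1, 2, 4):
    print(f"dilation {d}: effective extent {effective_kernel(3, d)}")
```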
5. Advantageous effects
(1) Providing two types of features
The invention provides two different types of features for a multitask model: rich instance feature information for tasks that focus on recognizing instance targets, such as target detection; and specific semantic feature information for tasks that focus on global semantic analysis, such as semantic segmentation.
(2) Suited to multitask models
Each task in a multitask network model has a specific goal, and different goals place very different demands on the features. The two feature types provided by the invention can satisfy these differing requirements.
(3) Suited to panoptic segmentation models
As a multitask-integrated network model, panoptic segmentation must achieve two different goals: semantic segmentation of the full image and instance segmentation of the instance targets within it. The two feature types provided by the invention meet the feature requirements of the panoptic segmentation model well. The semantic feature pyramid supplies the semantic features that give the semantic segmentation task positional-logic information; the instance feature pyramid supplies the instance target features that give the instance segmentation task instance detail information. The double-pyramid multivariate feature extraction network thus provides rich and comprehensive image features for the panoptic segmentation model and can greatly improve panoptic segmentation accuracy.
(4) Suited to autonomous driving
As a computer-vision environment perception technique, the invention suits the field of autonomous driving: it can extract instance target information about pedestrians, vehicles, buildings and the like in the driving environment, together with semantic position information for the whole environment, providing the network model with comprehensive feature information and an important safety guarantee for normal driving.
(5) Suited to public traffic monitoring systems
The method effectively recognizes pedestrians, vehicles and the road environment, meets the demands of road traffic scenes, and offers drivers an aid to safe driving. With its accuracy and speed, feature information can be extracted effectively for vehicles violating regulations, pedestrians ignoring traffic rules, and accidents in the traffic environment, providing favourable conditions for subsequent recognition work and improving the efficiency of the public monitoring system.
The logic schematic of the method is shown in FIG. 1; the algorithm proceeds as follows:
Step 1: read a dataset image and coarsely extract features through any feature network, obtaining a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, one of size [256 × 50 × 76] as input feature 2, one of size [256 × 100 × 152] as input feature 3, and one of size [256 × 200 × 304] as input feature 4;
Step 2: pass input features 1, 2, 3 and 4 from step 1 into the instance feature pyramid, obtaining an instance target feature matrix of size [256 × 25 × 38];
Step 3: feed the instance target feature matrix from step 2 into a region proposal network, then through fully connected and mask-generating structures, obtaining the segmentation result for the instance targets in the full image.
Step 4: pass input features 1, 2, 3 and 4 from step 1 into the semantic feature pyramid, obtaining a semantic feature matrix of size [256 × 200 × 304];
Step 5: feed the semantic feature matrix from step 4 into a fully convolutional structure, obtaining the semantic segmentation result for the full image.
Step 6: merge the instance segmentation result from step 3 and the semantic segmentation result from step 5 through a panoptic fusion structure, generating the panoptic segmentation result.
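The six steps above can be sketched as a skeleton whose stubs only trace the data flow. Every function name below is a hypothetical stand-in for a module the patent references (backbone, pyramids, region proposal network, mask branch, FCN head, panoptic fusion), not the patented implementation; each stub returns a tag so the two-branch flow is visible.

```python
def coarse_extract(image):
    # Step 1: any backbone producing the four coarse input features.
    return ("f1", "f2", "f3", "f4")

def instance_pyramid(feats):
    # Step 2: instance feature pyramid -> instance target feature matrix.
    return "instance_feature_matrix"

def instance_head(feat):
    # Step 3: region proposal network + fully connected + mask generation.
    return "instance_segmentation"

def semantic_pyramid(feats):
    # Step 4: semantic feature pyramid -> semantic feature matrix.
    return "semantic_feature_matrix"

def semantic_head(feat):
    # Step 5: fully convolutional structure -> semantic segmentation.
    return "semantic_segmentation"

def panoptic_fuse(inst, sem):
    # Step 6: panoptic fusion of the two partial results.
    return (inst, sem)

def segment(image):
    # Both branches share the same four coarse features from step 1.
    feats = coarse_extract(image)
    inst = instance_head(instance_pyramid(feats))
    sem = semantic_head(semantic_pyramid(feats))
    return panoptic_fuse(inst, sem)

print(segment("frame.png"))
```

The point of the structure is that steps 2-3 and 4-5 are independent branches over the same shared features, which is what lets one coarse extraction serve both task types.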
Example 1:
In this example, an outdoor activity scene is input into the network model and all objects in the outdoor scene are panoptically segmented. The result is shown in FIG. 4.
Example 2:
In this example, an indoor living scene is input into the network model and all objects in the indoor scene are panoptically segmented. The result is shown in FIG. 5.
Example 3:
In this example, a road traffic scene is input into the network model, and both instance targets such as pedestrians and vehicles and non-instance regions such as road and sky are panoptically segmented. The result is shown in FIG. 6.
Claims (6)
1. A double-pyramid multivariate feature extraction network for images, characterized by comprising four input features, an instance feature pyramid, a semantic feature pyramid and two output features, wherein the two output features comprise an instance feature pyramid output feature and a semantic feature pyramid output feature.
2. The double-pyramid multivariate feature extraction network of claim 1, wherein
the instance feature pyramid consists of four layers of instance features, three up-sampling modules, three additive fusion modules, four identical standard convolution layers and one merge module;
the instance feature pyramid builds its four layers of instance features along a top-down path:
input feature 1 forms the fourth-layer instance feature of the instance feature pyramid; one branch of the fourth-layer instance feature then enters an up-sampling module for size enlargement, while the other branch passes through a standard convolution layer into the merge module to await merging;
input feature 2 and the enlarged fourth-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the third-layer instance feature of the pyramid, one branch of which enters an up-sampling module for size enlargement while the other passes through a standard convolution layer into the merge module to await merging;
input feature 3 and the up-sampled third-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the second-layer instance feature of the pyramid, one branch of which enters an up-sampling module for size enlargement while the other passes through a standard convolution layer into the merge module to await merging;
input feature 4 and the up-sampled second-layer instance feature enter an additive fusion module together for feature fusion; the fusion result forms the first-layer instance feature of the pyramid, which then passes through a standard convolution layer into the merge module to await merging;
the merge module merges the four awaiting instance features and outputs the result as the instance feature pyramid output feature, one of the two outputs of the double-pyramid multivariate feature extraction network;
the semantic feature pyramid consists of four layers of semantic features, three dilated (atrous) convolution layers, three additive fusion modules, four standard convolution layers and one merging module;
the semantic feature pyramid constructs its four layers of semantic features along a bottom-up path:
the input feature 4 forms the first-layer semantic feature of the semantic feature pyramid; one path of the first-layer semantic feature then enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merge-fusion processing;
the input feature 3 and the reduction-transformed first-layer semantic feature jointly enter an additive fusion module for feature fusion, the fusion result forming the second-layer semantic feature of the semantic feature pyramid; one path of the second-layer semantic feature enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merge-fusion processing;
the input feature 2 and the reduction-transformed second-layer semantic feature jointly enter an additive fusion module for feature fusion, the fusion result forming the third-layer semantic feature of the semantic feature pyramid; one path of the third-layer semantic feature enters a dilated convolution layer for size reduction, while the other path passes through a standard convolution layer into the merging module to await merge-fusion processing;
the input feature 1 and the reduction-transformed third-layer semantic feature jointly enter an additive fusion module for feature fusion, the fusion result forming the fourth-layer semantic feature of the semantic feature pyramid; the fourth-layer semantic feature then passes through a standard convolution layer into the merging module to await merge-fusion processing.
The merging module merges the four semantic features awaiting merge-fusion processing and outputs the merged result as the output feature of the semantic feature pyramid, which constitutes the other of the two output features of the double-pyramid multivariate feature extraction network.
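The bottom-up semantic path mirrors the instance path with the resizing direction reversed. The sketch below is a hedged NumPy illustration only: it uses 2×2 average pooling as a stand-in for the size-halving dilated convolution layer (whose kernel, dilation rate and stride the claim does not specify), random stand-in inputs, and again omits the standard convolution layers and merging module.

```python
import numpy as np

def downsample2x(x):
    # 2x2 average pooling as a stand-in for the size-halving dilated convolution layer
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
f1 = rng.random((256, 25, 38))    # input feature 1 (coarsest)
f2 = rng.random((256, 50, 76))    # input feature 2
f3 = rng.random((256, 100, 152))  # input feature 3
f4 = rng.random((256, 200, 304))  # input feature 4 (finest)

s1 = f4                        # first-layer semantic feature
s2 = f3 + downsample2x(s1)     # additive fusion with the reduced layer below
s3 = f2 + downsample2x(s2)
s4 = f1 + downsample2x(s3)     # fourth-layer semantic feature

print(s4.shape)
```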
3. The double-pyramid multivariate feature extraction network of images according to claim 1, wherein
the 4 input features of the double-pyramid multivariate feature extraction network are four results of coarse feature extraction performed on the same input image.
Among the 4 input features of the double-pyramid multivariate feature extraction network, input feature 1 is a three-dimensional matrix of size [256 × 25 × 38]; input feature 2 is a three-dimensional matrix of size [256 × 50 × 76]; input feature 3 is a three-dimensional matrix of size [256 × 100 × 152]; input feature 4 is a three-dimensional matrix of size [256 × 200 × 304].
The upsampling module in the instance feature pyramid enlarges the spatial size of the features input to it by a factor of two.
The instance feature pyramid output feature is a three-dimensional matrix of size [256 × 25 × 38].
The dilated convolution layer in the semantic feature pyramid reduces the spatial size of the features input to it by a factor of two.
The semantic feature pyramid output feature is a three-dimensional matrix of size [256 × 200 × 304].
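The four input sizes in this claim form a strict ×2 scale chain, which is the precondition for the fixed ×2 factor of both the upsampling module and the dilated convolution layer. A quick sanity check of that relationship:

```python
# spatial sizes (H, W) of input features 1..4 as listed in claim 3
sizes = [(25, 38), (50, 76), (100, 152), (200, 304)]

# each level must be exactly twice the previous one in both dimensions
for (h, w), (h2, w2) in zip(sizes, sizes[1:]):
    assert (h2, w2) == (2 * h, 2 * w)

print("each level doubles the previous one")
```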
4. An image segmentation method, comprising the following steps:
step 1: reading a data set image and performing coarse feature extraction to obtain a three-dimensional matrix of size [256 × 25 × 38] as input feature 1, a three-dimensional matrix of size [256 × 50 × 76] as input feature 2, a three-dimensional matrix of size [256 × 100 × 152] as input feature 3, and a three-dimensional matrix of size [256 × 200 × 304] as input feature 4;
step 2: feeding the input features 1, 2, 3 and 4 obtained in step 1 into the instance feature pyramid to obtain an instance target feature matrix of size [256 × 25 × 38];
step 3: feeding the instance target feature matrix of step 2 into a region proposal network and then through fully connected and mask generation structures to obtain the segmentation result of the instance targets in the panorama;
step 4: feeding the input features 1, 2, 3 and 4 obtained in step 1 into the semantic feature pyramid to obtain a semantic feature matrix of size [256 × 200 × 304];
step 5: feeding the semantic feature matrix of step 4 into a fully convolutional structure to obtain a panoptic semantic segmentation result;
step 6: merging and fusing the instance target segmentation result of step 3 and the semantic segmentation result of step 5 through a panoptic fusion structure to generate a panoptic segmentation result.
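The claims do not detail the panoptic fusion structure of step 6. As a hedged illustration only, not the patent's method, one common minimal fusion strategy overlays the predicted instance masks onto the per-pixel semantic label map, giving instances their own id range:

```python
import numpy as np

H, W = 200, 304
# hypothetical step-5 output: a per-pixel semantic class-id map
semantic = np.zeros((H, W), dtype=np.int64)   # class 0 (background) everywhere
semantic[100:, :] = 1                         # pretend the lower half is class 1

# hypothetical step-3 output: one boolean instance mask
instance_mask = np.zeros((H, W), dtype=bool)
instance_mask[50:120, 60:140] = True

INSTANCE_ID_OFFSET = 1000                     # keeps instance ids disjoint from class ids
panoptic = semantic.copy()
panoptic[instance_mask] = INSTANCE_ID_OFFSET  # paste instance 0 over the semantic labels

print(np.unique(panoptic))
```

Real panoptic fusion must additionally resolve overlaps between instances and decide instance-vs-semantic conflicts; the offset convention here is only one assumed way to keep the two label spaces disjoint.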
5. A computer system comprising a processor and a memory, the processor executing code stored in the memory to implement the method of claim 4.
6. A computer storage medium storing a computer program which, when executed by hardware, implements the method of claim 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110747532.1A CN113537004B (en) | 2021-07-01 | 2021-07-01 | Image double pyramid multi-element feature extraction network, image segmentation method, system and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113537004A true CN113537004A (en) | 2021-10-22 |
CN113537004B CN113537004B (en) | 2023-09-01 |
Family
ID=78097593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110747532.1A Active CN113537004B (en) | 2021-07-01 | 2021-07-01 | Image double pyramid multi-element feature extraction network, image segmentation method, system and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113537004B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A kind of semantic segmentation method based on two-way multi-Scale Pyramid |
US20190057507A1 (en) * | 2017-08-18 | 2019-02-21 | Samsung Electronics Co., Ltd. | System and method for semantic segmentation of images |
CN110084274A (en) * | 2019-03-29 | 2019-08-02 | 南京邮电大学 | Realtime graphic semantic segmentation method and system, readable storage medium storing program for executing and terminal |
CN111524150A (en) * | 2020-07-03 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Image processing method and device |
US20200334819A1 (en) * | 2018-09-30 | 2020-10-22 | Boe Technology Group Co., Ltd. | Image segmentation apparatus, method and relevant computing device |
CN112232232A (en) * | 2020-10-20 | 2021-01-15 | 城云科技(中国)有限公司 | Target detection method |
Non-Patent Citations (1)
Title |
---|
JIANG Shihao; QI Sumin; WANG Laihua; JIA Hui: "Instance Segmentation Based on Mask R-CNN and Multi-Feature Fusion", Computer Technology and Development, no. 09 *
Also Published As
Publication number | Publication date |
---|---|
CN113537004B (en) | 2023-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Min et al. | Traffic sign recognition based on semantic scene understanding and structural traffic sign location | |
He et al. | Rail transit obstacle detection based on improved CNN | |
WO2023030182A1 (en) | Image generation method and apparatus | |
CN115439483B (en) | High-quality welding seam and welding seam defect identification system, method and storage medium | |
CN112101153A (en) | Remote sensing target detection method based on receptive field module and multiple characteristic pyramid | |
CN115861619A (en) | Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network | |
CN113128476A (en) | Low-power consumption real-time helmet detection method based on computer vision target detection | |
CN115588126A (en) | GAM, CARAFE and SnIoU fused vehicle target detection method | |
Song et al. | Msfanet: A light weight object detector based on context aggregation and attention mechanism for autonomous mining truck | |
Pham | Semantic road segmentation using deep learning | |
CN115019274A (en) | Pavement disease identification method integrating tracking and retrieval algorithm | |
Yuan et al. | Multi-level object detection by multi-sensor perception of traffic scenes | |
CN113537004A (en) | Double-pyramid multivariate feature extraction network of image, image segmentation method, system and medium | |
Xiang et al. | A real-time vehicle traffic light detection algorithm based on modified YOLOv3 | |
Feng et al. | Embedded YOLO: A real-time object detector for small intelligent trajectory cars | |
CN116229410A (en) | Lightweight neural network road scene detection method integrating multidimensional information pooling | |
Wei et al. | An Efficient Point Cloud-based 3D Single Stage Object Detector | |
Valiente et al. | Robust perception and visual understanding of traffic signs in the wild | |
Lai et al. | Aircraft Target Detection Based on Attention Mechanism and Faster R-CNN | |
CN117152646B (en) | Unmanned electric power inspection AI light-weight large model method and system | |
Cheng | Global-feature enhanced network for fast semantic segmentation | |
Wang et al. | YOLOv5-Based Dense Small Target Detection Algorithm for Aerial Images Using DIOU-NMS | |
Vaidya et al. | Detecting Buildings from Remote Sensing Imagery: Unleashing the Power of YOLOv5 and YOLOv8 | |
CN116503838A (en) | Traffic sign detection algorithm based on feature multi-scale fusion | |
CN117292335A (en) | Dangerous chemical vehicle detection method based on improved YOLOv5 algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||