WO2019154201A1 - Instance segmentation method and apparatus, electronic device, program, and medium - Google Patents

Instance segmentation method and apparatus, electronic device, program, and medium

Info

Publication number
WO2019154201A1
Authority
WO
WIPO (PCT)
Prior art keywords: feature, features, level, instance, network
Prior art date
Application number
PCT/CN2019/073819
Other languages
English (en)
French (fr)
Inventor
刘枢
亓鲁
秦海芳
石建萍
贾佳亚
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201810137044.7A external-priority patent/CN108460411B/zh
Priority claimed from CN201810136371.0A external-priority patent/CN108335305B/zh
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to KR1020207016941A priority Critical patent/KR102438095B1/ko
Priority to JP2020533099A priority patent/JP7032536B2/ja
Priority to SG11201913332WA priority patent/SG11201913332WA/en
Publication of WO2019154201A1 publication Critical patent/WO2019154201A1/zh
Priority to US16/729,423 priority patent/US11270158B2/en


Classifications

    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F 18/231: Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G06F 18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on distances to training or reference patterns
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 7/11: Region-based segmentation
    • G06V 10/40: Extraction of image or video features
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • The present disclosure relates to computer vision technology, and more particularly to an instance segmentation method and apparatus, electronic device, program, and medium.
  • Instance segmentation is a very important direction in the field of computer vision. The task combines the characteristics of semantic segmentation and object detection: for each object in the input image, it generates an independent pixel-level mask and predicts the category corresponding to that mask. Instance segmentation has very broad applications in fields such as autonomous driving and home robots.
  • Embodiments of the present disclosure provide an instance segmentation scheme.
  • an instance segmentation method, including:
  • an instance segmentation apparatus, including:
  • a neural network for extracting features from an image and outputting at least two different levels of features
  • an extracting module configured to extract, from the features of the at least two different levels, region features corresponding to at least one instance candidate region in the image
  • a first fusion module configured to fuse the region features corresponding to the same instance candidate region, to obtain a first fusion feature of each instance candidate region;
  • a segmentation module configured to perform segmentation based on each first fusion feature, to obtain an instance segmentation result of the corresponding instance candidate region and/or an instance segmentation result of the image.
  • an electronic device including:
  • a memory for storing a computer program
  • a processor configured to execute the computer program stored in the memory, wherein, when the computer program is executed, the method of any one of the embodiments of the present disclosure is implemented.
  • a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the embodiments of the present disclosure.
  • a computer program comprising computer instructions that, when executed in a processor of a device, implement the method of any of the embodiments of the present disclosure.
  • Based on the instance segmentation methods and apparatuses, electronic devices, programs, and media provided by the above embodiments of the present disclosure, features are extracted from the image through the neural network and features of at least two different levels are output; the region features corresponding to at least one instance candidate region in the image are extracted from the features of the at least two different levels, and the region features corresponding to the same instance candidate region are fused to obtain the first fusion feature of each instance candidate region; instance segmentation is then performed based on each first fusion feature to obtain the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image. The embodiments of the present disclosure design a technical solution for instance segmentation based on a deep learning framework; since deep learning has powerful modeling capabilities, it helps to obtain better instance segmentation results.
  • In addition, instance segmentation is performed on instance candidate regions, so the accuracy of instance segmentation can be improved, the computation and complexity required for instance segmentation can be reduced, and the segmentation efficiency can be improved.
  • Moreover, the region features corresponding to an instance candidate region can be extracted from the features of at least two different levels and fused, and segmentation is performed based on the obtained fusion features, so that each instance candidate region can obtain information of more different levels at the same time; since the information extracted from features of different levels lies at different semantic levels, context information can be utilized to improve the accuracy of the instance segmentation result of each instance candidate region.
  • FIG. 1 is a flow chart of an embodiment of an instance segmentation method of the present disclosure.
  • FIG. 2 is a schematic diagram of a feature fusion in an embodiment of the present disclosure.
  • FIG. 3 is a flow chart of another embodiment of an instance segmentation method of the present disclosure.
  • FIG. 4 is a schematic diagram of a network structure for performing two-way mask prediction in an embodiment of the present disclosure.
  • FIG. 5 is a flow chart of an application embodiment of an instance segmentation method of the present disclosure.
  • FIG. 6 is a schematic diagram of a process of the application embodiment shown in FIG. 5.
  • FIG. 7 is a schematic structural diagram of an embodiment of an instance segmentation apparatus according to the present disclosure.
  • FIG. 8 is a schematic structural diagram of another embodiment of an instance segmentation apparatus according to the present disclosure.
  • FIG. 9 is a schematic structural diagram of an embodiment of a segmentation module according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present disclosure.
  • In the present disclosure, "a plurality" may mean two or more, and "at least one" may mean one, two, or more.
  • The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist at the same time, or that B exists alone.
  • The character "/" in the present disclosure generally indicates an "or" relationship between the objects before and after it.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a flow chart of an embodiment of an instance segmentation method of the present disclosure. As shown in FIG. 1, the instance segmentation method of this embodiment includes:
  • Expressions of features in various embodiments of the present disclosure may include, for example, but are not limited to, feature maps, feature vectors, or feature matrices, and the like.
  • the at least two different levels refer to two or more network layers in the neural network that are at different depths of the neural network.
  • the image may include, for example but is not limited to: a still image, a frame image in a video, and the like.
  • the operation 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a neural network operated by the processor.
  • An instance may include, for example but not limited to, a particular object, such as a particular person or a particular item.
  • One or more instance candidate regions may be obtained by detecting the image through a neural network.
  • An instance candidate region represents a region in the image where an instance may appear.
  • the operation 104 may be performed by a processor invoking a corresponding instruction stored in a memory or by an extraction module executed by the processor.
  • the regional features corresponding to the candidate regions of the same instance are respectively fused, and the first fusion feature of each candidate region of the instance is obtained.
  • Optionally, the manner of fusing a plurality of region features may include, for example, pixel-wise summation, taking the pixel-wise maximum value, or pixel-wise averaging over the plurality of region features.
  • the operation 106 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first fusion module executed by the processor.
  • Optionally, the instance segmentation result of an instance candidate region may include: which pixels of the instance candidate region belong to a certain instance, and the category to which that instance belongs; for example, which pixels in the instance candidate region belong to a certain boy, and that the category of the boy is "person".
  • the operation 108 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a segmentation module executed by the processor.
  • With the instance segmentation method provided by the above embodiment of the present disclosure, features are extracted from the image through the neural network and features of at least two different levels are output; the region features corresponding to at least one instance candidate region in the image are extracted from the features of the at least two different levels; the region features corresponding to the same instance candidate region are fused to obtain the first fusion feature of each instance candidate region; and instance segmentation is performed based on each first fusion feature to obtain the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image.
  • The embodiment of the present disclosure designs a deep-learning-based framework to solve the instance segmentation problem; since deep learning has powerful modeling capabilities, it helps to obtain better instance segmentation results. In addition, because instance segmentation is performed on instance candidate regions, the accuracy of instance segmentation can be improved, the computation and complexity required for instance segmentation can be reduced, and the segmentation efficiency can be improved.
  • Moreover, the region features corresponding to an instance candidate region can be extracted from the features of at least two different levels and fused, and segmentation is performed based on the obtained fusion features, so that each instance candidate region can obtain information of more different levels at the same time; since the information extracted from features of different levels lies at different semantic levels, context information can be utilized to improve the accuracy of the instance segmentation result of each instance candidate region.
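  • A minimal sketch of this flow, assuming PyTorch-style placeholders (the callables backbone, rpn, pool_regions, and mask_head are hypothetical stand-ins for the neural network, the candidate-region generator, the region-feature extractor, and the segmentation head), could look as follows:

```python
import torch

def instance_segmentation(image, backbone, rpn, pool_regions, mask_head):
    """Hedged sketch of operations 102-108; all callables are assumed stand-ins."""
    # 102: extract features of at least two different levels from network
    #      layers located at different depths of the neural network.
    features = backbone(image)                    # list of feature maps

    # 104: generate instance candidate regions and pool the region feature
    #      corresponding to each candidate region from every level.
    boxes = rpn(features)                         # candidate regions (R, 4)
    region_feats = [pool_regions(f, boxes) for f in features]

    # 106: fuse the region features that belong to the same candidate region,
    #      here with a pixel-wise maximum over the levels.
    first_fused = torch.stack(region_feats, dim=0).max(dim=0).values

    # 108: perform instance segmentation based on each first fusion feature.
    return mask_head(first_fused)
```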
  • In an optional example, operation 102, extracting features of the image through the neural network and outputting features of at least two different levels, may include: performing feature extraction on the image through the neural network, and outputting the features of the at least two different levels through at least two network layers of different network depths in the neural network.
  • the neural network includes two or more network layers having different network depths.
  • the network layer used for feature extraction may be referred to as a feature layer.
  • Feature extraction is performed on the input image through the first network layer, and the extracted features are input to the second network layer; each subsequent network layer in turn performs feature extraction on the features it receives and inputs the extracted features to the next network layer for further feature extraction.
  • The network depth of the network layers in the neural network goes from shallow to deep according to the order of input and output, i.e., the order of feature extraction; correspondingly, the levels of the features output by the network layers go from low to high, and their resolution goes from high to low.
  • In the neural network, a network layer may generally include: at least one convolution layer for feature extraction, and a downsampling layer that downsamples the features (e.g., feature maps) extracted by the convolution layer; through downsampling, the size of the features (e.g., feature maps) extracted by the convolution layer can be reduced.
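  • A toy sketch of such a backbone, assuming stride-2 convolutions for the downsampling and arbitrary channel widths, could look as follows; deeper stages output higher-level, lower-resolution features:

```python
import torch
from torch import nn

class MultiLevelBackbone(nn.Module):
    """Toy backbone that returns features from network layers of different depths."""

    def __init__(self, in_ch=3, width=64, num_levels=4):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for i in range(num_levels):
            out_ch = width * (2 ** i)
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, out_ch, 3, padding=1),                 # feature extraction
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),   # downsampling
                nn.ReLU(inplace=True),
            ))
            ch = out_ch

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)           # the output of each depth is kept as one feature level
            feats.append(x)
        return feats               # e.g. [M1, M2, M3, M4], resolution high -> low

levels = MultiLevelBackbone()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in levels])   # levels go low -> high, resolution drops
```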
  • the merging of the region features corresponding to the candidate region of the same instance in the operation 106 may include: performing pixel-level fusion on the plurality of region features corresponding to the candidate region of the same instance. .
  • pixel-level fusion is performed on multiple regional features corresponding to the same instance candidate region, which may be:
  • the plurality of region features corresponding to the same instance candidate region are respectively subjected to an element-wise max based on each pixel, that is, a feature of each pixel location is a maximum value among a plurality of region features corresponding to the same instance candidate region;
  • the plurality of regional features corresponding to the same instance candidate region are respectively averaged based on the pixels, that is, the features of the respective pixel locations are averaged among the plurality of regional features corresponding to the same instance candidate region;
  • the plurality of regional features corresponding to the same instance candidate region are respectively summed based on the respective pixels, that is, the features of the respective pixel locations are summed among the plurality of regional features corresponding to the same instance candidate region.
  • Among these manners, taking the element-wise maximum over the plurality of region features corresponding to the same instance candidate region makes the features of the instance candidate region more salient than the other manners, so that the instance segmentation is more accurate and the accuracy of the instance segmentation result is improved.
  • Optionally, before the fusion, the region features corresponding to the same instance candidate region may be adjusted through a network layer, such as a fully convolutional layer or a fully connected layer, for example by adjusting the dimensions of the region features corresponding to the same instance candidate region that participate in the fusion so that they are adapted to one another; in this way, the region features corresponding to the same instance candidate region become more suitable for fusion, and a more accurate fusion feature is obtained.
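  • A minimal sketch of this fusion step, assuming a 1x1 convolution as the adaptation layer and arbitrary channel sizes, could look as follows; the region features pooled from the different levels for the same instance candidate region are adapted and then fused pixel by pixel by maximum, sum, or average:

```python
import torch
from torch import nn

class RegionFeatureFusion(nn.Module):
    """Fuse the per-level region features of the same instance candidate region."""

    def __init__(self, in_channels, out_channels, num_levels, mode="max"):
        super().__init__()
        # One adaptation layer per level so the region features match before fusion.
        self.adapt = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(num_levels))
        self.mode = mode

    def forward(self, region_feats):                # list of (R, C, h, w) tensors
        adapted = [conv(f) for conv, f in zip(self.adapt, region_feats)]
        stacked = torch.stack(adapted, dim=0)       # (levels, R, C', h, w)
        if self.mode == "max":                      # element-wise maximum per pixel
            return stacked.max(dim=0).values
        if self.mode == "sum":                      # element-wise sum per pixel
            return stacked.sum(dim=0)
        return stacked.mean(dim=0)                  # element-wise average per pixel

fusion = RegionFeatureFusion(in_channels=256, out_channels=256, num_levels=4)
per_level = [torch.randn(8, 256, 14, 14) for _ in range(4)]   # 8 candidate regions
first_fusion_feature = fusion(per_level)                       # (8, 256, 14, 14)
```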
  • In another embodiment of the instance segmentation method of the present disclosure, the method may further include: performing at least one fold-back fusion on the features of the at least two different levels to obtain second fusion features.
  • One fold-back fusion includes: along the network depth direction of the neural network, fusing the features of different levels, respectively output by network layers of different network depths, sequentially in two different level directions.
  • the operation 104 may include: extracting, from the second fusion feature, a region feature corresponding to the at least one instance candidate region.
  • the two different hierarchical directions include: a direction from a high level feature to a low level feature, and a direction from a low level feature to a high level feature. Therefore, the context information is better utilized for feature fusion, thereby improving the instance segmentation result of each instance candidate region.
  • Optionally, fusing sequentially in two different level directions may include: fusing sequentially along the direction from high-level features to low-level features (i.e., from the features output by deeper network layers of the neural network toward the features output by shallower network layers) and then along the direction from low-level features to high-level features (i.e., from the features output by shallower network layers toward the features output by deeper network layers); or fusing sequentially along the direction from low-level features to high-level features and then along the direction from high-level features to low-level features.
  • In an optional example, fusing the features of different levels output by network layers of different network depths sequentially along the direction from high-level features to low-level features and then along the direction from low-level features to high-level features includes:
  • in order of network depth from deep to shallow, upsampling the higher-level features output by deeper network layers of the neural network in turn and fusing them with the lower-level features output by shallower network layers, for example by upsampling the higher-level features and adding them to the lower-level features, to obtain third fusion features.
  • The higher-level features may include: features output by a deeper network layer of the neural network, or features obtained by performing feature extraction at least once on the features output by that deeper network layer.
  • Optionally, the highest-level feature may be the highest-level feature among the features of the at least two different levels, or a feature obtained by performing feature extraction one or more times on that highest-level feature; the third fusion features may include this highest-level feature and the fusion features obtained by each fusion;
  • then, in the direction from low-level features to high-level features, the lower-level fusion features are downsampled in turn and fused with the higher-level fusion features among the third fusion features.
  • Optionally, the lowest-level fusion feature may be the lowest-level fusion feature among the third fusion features, or a feature obtained by performing feature extraction one or more times on that lowest-level fusion feature.
  • If the features of the at least two different levels are folded back and fused only once, the batch of fusion features obtained by the fusion from low-level features to high-level features is the second fusion features; if the features of the at least two different levels are folded back and fused two or more times, the operations of fusing along the direction from high-level features to low-level features and then from low-level features to high-level features are performed multiple times, and the final batch of fusion features is the second fusion features.
  • For example, when upsampling and fusing, the higher-level features output by a deeper network layer of the neural network (for example, the 80th network layer along the input-output direction of the neural network) may be upsampled in turn and fused with the lower-level features output by the adjacent, shallower network layer (for example, the 79th network layer along the input-output direction of the neural network).
  • Alternatively, the higher-level features output by a deeper network layer (for example, the 80th network layer along the input-output direction of the neural network) may be upsampled in turn and fused with the lower-level features output by a shallower network layer that is not adjacent to the deeper one (for example, the 50th network layer along the input-output direction of the neural network), that is, fusion of cross-level features.
  • Similarly, when downsampling and fusing in the direction from low-level features to high-level features, a lower-level fusion feature (e.g., P2, where "2" denotes the feature level) may be fused with the adjacent, higher-level fusion feature among the third fusion features (e.g., P3, where "3" denotes the feature level), or with a higher-level fusion feature among the third fusion features whose level is not adjacent (e.g., P4, where "4" denotes the feature level), that is, fusion of cross-level fusion features.
  • FIG. 2 is a schematic diagram of a feature fusion in an embodiment of the present disclosure. As shown in FIG. 2, a lower-level fusion feature Ni is downsampled and fused with the adjacent, higher-level feature Pi+1 to obtain the corresponding fusion feature Ni+1, where i is an integer greater than zero.
  • In this embodiment, the high-level, low-resolution features are gradually fused with the low-level, high-resolution features in the top-down pass to obtain a first batch of new features; then, in bottom-up order (i.e., from low-level features to high-level features), the lower-level fusion features are downsampled in turn and fused with the adjacent, higher-level features, gradually fusing the low-level, high-resolution features with the high-level, low-resolution features to obtain another batch of new features for instance segmentation. Through this bottom-up information path, low-level information can be transmitted to the high-level network layers (that is, network layers with greater network depth) more easily, the loss during information transmission is reduced, and information can flow more smoothly within the neural network. Since low-level information is sensitive to some details, it can provide information that is very useful for localization and segmentation, thereby improving the instance segmentation result; the high-level network layers (that is, network layers with greater network depth) can also access the complete low-level information more easily, which further improves the instance segmentation result.
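  • A minimal sketch of one such fold-back fusion, assuming 1x1 lateral convolutions, nearest-neighbor upsampling, and stride-2 downsampling convolutions, could look as follows; the top-down pass produces fusion features Pi, and the bottom-up pass over them, as in FIG. 2, produces fusion features Ni:

```python
import torch
from torch import nn
import torch.nn.functional as F

class FoldBackFusion(nn.Module):
    """One fold-back fusion: top-down (P_i) followed by bottom-up (N_i)."""

    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.down = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
            for _ in range(len(in_channels) - 1))

    def forward(self, feats):                       # [M1..M4], level low -> high
        # Top-down: upsample the higher-level feature and add the lower-level one.
        p = [self.lateral[-1](feats[-1])]
        for f, lat in zip(reversed(feats[:-1]), reversed(self.lateral[:-1])):
            up = F.interpolate(p[0], size=f.shape[-2:], mode="nearest")
            p.insert(0, lat(f) + up)                # third fusion features P_i
        # Bottom-up (FIG. 2): downsample N_i and fuse with the adjacent P_{i+1}.
        n = [p[0]]
        for i in range(1, len(p)):
            n.append(self.down[i - 1](n[-1]) + p[i])   # second fusion features N_i
        return p, n

ms = [torch.randn(1, 64 * 2 ** i, 64 // 2 ** i, 64 // 2 ** i) for i in range(4)]
p_feats, n_feats = FoldBackFusion([64, 128, 256, 512])(ms)
```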
  • In an optional example, fusing the features of different levels output by network layers of different network depths sequentially along the direction from low-level features to high-level features and then along the direction from high-level features to low-level features includes:
  • in order of network depth from shallow to deep, downsampling the lower-level features output by shallower network layers of the neural network in turn and fusing them with the higher-level features output by deeper network layers, to obtain fourth fusion features.
  • the lower-level feature may include, for example, a feature outputted by a network layer having a shallow network depth in a neural network, or a feature obtained by extracting at least one feature from a network layer output feature having a shallow network depth.
  • Optionally, the lowest-level feature may be the lowest-level feature among the features of the at least two different levels, or a feature obtained by performing feature extraction one or more times on that lowest-level feature; the fourth fusion features may include this lowest-level feature and the fusion features obtained by each fusion;
  • then, in the direction from high-level features to low-level features, the higher-level fusion features are upsampled in turn and fused with the lower-level fusion features among the fourth fusion features.
  • Optionally, the highest-level fusion feature may be the highest-level fusion feature among the fourth fusion features, or a feature obtained by performing feature extraction one or more times on that highest-level fusion feature.
  • If the features of the at least two different levels are folded back and fused only once, the batch of fusion features obtained by the fusion from low-level features to high-level features and then from high-level features to low-level features is the second fusion features; if the features of the at least two different levels are folded back and fused two or more times, the operations of fusing along the direction from low-level features to high-level features and then from high-level features to low-level features are performed multiple times, and the final batch of fusion features is the second fusion features.
  • Optionally, when downsampling the lower-level features output by shallower network layers of the neural network and fusing them with the higher-level features output by deeper network layers, the lower-level features output by a shallower network layer may be downsampled and fused with the higher-level features output by the adjacent, deeper network layer.
  • Alternatively, the lower-level features output by a shallower network layer of the neural network may be downsampled and fused with the higher-level features output by a deeper network layer that is not adjacent to the shallower one, that is, fusion of cross-level features.
  • Similarly, a higher-level fusion feature may be upsampled and fused with the adjacent, lower-level fusion feature among the fourth fusion features, or upsampled and fused with a non-adjacent, lower-level fusion feature among the fourth fusion features, that is, fusion of cross-level fusion features.
  • the instance segmentation is performed based on each of the first fused features, and the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image is obtained, which may include:
  • performing instance segmentation on the instance candidate region corresponding to a first fusion feature based on that first fusion feature to obtain the instance segmentation result of the corresponding instance candidate region, where the first fusion feature is not limited to a specific one and may be the first fusion feature of any instance candidate region; and/or performing instance segmentation on the image based on each first fusion feature to obtain the instance segmentation result of the image.
  • Optionally, performing instance segmentation based on each first fusion feature to obtain the instance segmentation result of the image may include: first performing instance segmentation on the instance candidate region corresponding to each first fusion feature based on that first fusion feature to obtain the instance segmentation result of each instance candidate region; and then obtaining the instance segmentation result of the image based on the instance segmentation results of the instance candidate regions, for example by pasting the per-region results back into the image, as sketched below.
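  • A simplified sketch of assembling the image-level result from the per-region results (overlapping instances are simply overwritten here, and the box handling is an assumption rather than the exact procedure of the disclosure) could look as follows; each region-level mask is resized to its candidate box and pasted into a full-image category map:

```python
import torch
import torch.nn.functional as F

def paste_region_masks(region_masks, boxes, labels, image_size):
    """region_masks: (R, h, w) probabilities predicted inside each candidate region;
    boxes: (R, 4) as (x1, y1, x2, y2) in image coordinates; labels: (R,) categories."""
    H, W = image_size
    canvas = torch.zeros(H, W, dtype=torch.long)            # 0 = background
    for mask, box, label in zip(region_masks, boxes, labels):
        x1, y1, x2, y2 = [int(round(float(v))) for v in box]
        h, w = max(y2 - y1, 1), max(x2 - x1, 1)
        # Resize the region-level mask to the size of its candidate region.
        m = F.interpolate(mask[None, None], size=(h, w),
                          mode="bilinear", align_corners=False)[0, 0] > 0.5
        y1c, x1c, y2c, x2c = max(y1, 0), max(x1, 0), min(y2, H), min(x2, W)
        view = canvas[y1c:y2c, x1c:x2c]
        crop = m[y1c - y1:y1c - y1 + view.shape[0], x1c - x1:x1c - x1 + view.shape[1]]
        view[crop] = int(label)                              # paste this instance
    return canvas                                            # per-pixel instance category
```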
  • FIG. 3 is a flow chart of another embodiment of an instance segmentation method of the present disclosure. As shown in FIG. 3, the instance segmentation method of this embodiment includes:
  • the operation 302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a neural network operated by the processor.
  • In order of network depth from deep to shallow, the higher-level features output by deeper network layers of the neural network are upsampled in turn and fused with the lower-level features output by shallower network layers, to obtain third fusion features.
  • the above-mentioned higher-level features may include: a feature outputted by a network layer having a deep network depth in the neural network, or a feature obtained by performing at least one feature extraction on a feature of the network layer output deep in the network depth.
  • Optionally, the highest-level feature may be the highest-level feature among the features of the at least two different levels, or a feature obtained by performing feature extraction one or more times on that highest-level feature;
  • the third fusion features may include the highest-level feature among the features of the at least two different levels described above and the fusion feature obtained each time operation 304 performs a fusion.
  • In the direction from low-level features to high-level features, the lower-level fusion features are downsampled in turn and fused with the higher-level fusion features, to obtain the second fusion features.
  • Optionally, the lowest-level fusion feature may be the lowest-level fusion feature among the third fusion features, or a feature obtained by performing feature extraction once or multiple times on that lowest-level fusion feature; the batch of fusion features obtained by this fusion from low-level features to high-level features includes the lowest-level fusion feature among the third fusion features and the fusion feature obtained each time operation 306 performs a fusion.
  • This embodiment is described by taking one fold-back fusion as an example; if the features of the at least two different levels are folded back and fused two or more times, operations 304-306 can be performed multiple times, and the final batch of fusion features is the second fusion features.
  • the operations 304-306 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second fusion module executed by the processor.
  • a Region Proposal Network may be used to generate each instance candidate region for an image, and each instance candidate region may be mapped to each feature in the second fusion feature. Thereafter, for example, a region feature corresponding to each instance candidate region may be extracted from the second fusion feature by using, but not limited to, a region of interest (ROI) alignment (ROIAlign) method.
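  • A sketch of this extraction step using the ROIAlign operation available in torchvision, with assumed output size and feature strides, could look as follows:

```python
import torch
from torchvision.ops import roi_align

# Second fusion features of one image (e.g. N2-N5) keyed by their strides
# relative to the input image; shapes and strides are assumed values.
fused = {4: torch.randn(1, 256, 56, 56), 8: torch.randn(1, 256, 28, 28),
         16: torch.randn(1, 256, 14, 14), 32: torch.randn(1, 256, 7, 7)}

# Instance candidate regions from a region proposal network, given as
# (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0., 10., 20., 120., 180.],
                     [0., 50., 40., 200., 160.]])

# Map each candidate region onto every fused level and pool a fixed-size
# region feature; spatial_scale converts image coordinates to feature coordinates.
region_feats = [
    roi_align(feat, rois, output_size=(14, 14),
              spatial_scale=1.0 / stride, sampling_ratio=2, aligned=True)
    for stride, feat in fused.items()
]
# Each entry has shape (num_candidates, 256, 14, 14); these per-level region
# features are then fused per candidate region as described for operation 310.
```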
  • the operation 308 may be performed by a processor invoking a corresponding instruction stored in a memory or by an extraction module executed by the processor.
  • the operation 310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first fusion module executed by the processor.
  • 312 Perform instance segmentation based on each first fusion feature, respectively, to obtain an instance segmentation result of the corresponding instance candidate region.
  • the operation 312 may be performed by a processor invoking a corresponding instruction stored in a memory or by a segmentation module executed by the processor.
  • In an optional example of the foregoing embodiments, performing instance segmentation on the instance candidate region corresponding to a first fusion feature based on that first fusion feature (where the first fusion feature may be the first fusion feature of any instance candidate region) to obtain the instance segmentation result of the corresponding instance candidate region may include: performing pixel-level instance category prediction and pixel-level foreground/background prediction based on the first fusion feature, and obtaining, based on the instance category prediction result and the foreground/background prediction result, the instance segmentation result of the instance candidate region corresponding to the first fusion feature, where the instance segmentation result includes: the pixels in the current instance candidate region that belong to an instance, and the category information to which that instance belongs.
  • In this embodiment, pixel-level instance category prediction and foreground/background prediction are performed simultaneously: through the pixel-level instance category prediction, the first fusion feature can be finely classified into multiple categories, and through the foreground/background prediction, better global information can be obtained; moreover, because the foreground/background prediction does not need to attend to the detailed information that distinguishes the multiple instance categories, the prediction speed is improved. Obtaining the instance segmentation result of the instance candidate region based on the instance category prediction result and the foreground/background prediction result can therefore improve the instance segmentation result of the instance candidate region or of the image.
  • Optionally, performing pixel-level instance category prediction based on the first fusion feature described above may include: performing feature extraction on the first fusion feature through a first convolutional network, where the first convolutional network includes at least one fully convolutional layer; and performing pixel-level instance category prediction, through a first fully convolutional layer, based on the features output by the first convolutional network.
  • Optionally, performing pixel-level foreground/background prediction based on a first fusion feature includes: predicting the pixels that belong to the foreground and/or the pixels that belong to the background in the instance candidate region corresponding to the first fusion feature.
  • The foreground and background can be defined as needed. For example, the foreground may include the portions corresponding to all instance categories and the background may include the portions other than those corresponding to all instance categories; or, conversely, the background may include the portions corresponding to all instance categories and the foreground may include the portions other than those corresponding to all instance categories.
  • Optionally, performing pixel-level foreground/background prediction based on a first fusion feature may include: performing feature extraction on the first fusion feature through a second convolutional network, where the second convolutional network includes at least one fully convolutional layer; and performing pixel-level foreground/background prediction, through a fully connected layer, based on the features output by the second convolutional network.
  • Optionally, obtaining the instance segmentation result of the instance candidate region corresponding to the first fusion feature based on the instance category prediction result and the foreground/background prediction result includes: performing pixel-level addition on the instance category prediction result and the foreground/background prediction result of the instance candidate region corresponding to the first fusion feature, to obtain the instance segmentation result of the instance candidate region corresponding to the first fusion feature.
  • Optionally, the method may further include: converting the foreground/background prediction result into a foreground/background prediction result whose dimensions are consistent with those of the instance category prediction result, for example converting the foreground/background prediction result from a vector into a matrix consistent with the dimensions of the instance category prediction result. Accordingly, performing pixel-level addition on the instance category prediction result and the foreground/background prediction result of the instance candidate region corresponding to the first fusion feature may include: performing pixel-level addition on the instance category prediction result of the instance candidate region corresponding to the first fusion feature and the foreground/background prediction result obtained by the conversion.
  • In this embodiment, instance segmentation is performed based on the first fusion feature of each instance candidate region to obtain the instance segmentation result of each instance candidate region; because pixel-level instance category prediction and foreground/background prediction are performed simultaneously based on the first fusion feature of the instance candidate region, this part of the scheme may be referred to as two-way mask prediction. FIG. 4 is a schematic diagram of a network structure for performing two-way mask prediction in an embodiment of the present disclosure.
  • As shown in FIG. 4, the first branch includes: four fully convolutional layers (conv1-conv4), that is, the first convolutional network, and a deconvolution layer (deconv), that is, the first fully convolutional layer.
  • The other branch includes: the third and fourth fully convolutional layers (conv3-conv4) shared from the first branch, two further fully convolutional layers (conv4_fc and conv5_fc), that is, the second convolutional network, a fully connected layer (fc), and a reshape layer used to convert the foreground/background prediction result into a foreground/background prediction result whose dimensions are consistent with those of the instance category prediction result.
  • The first branch performs pixel-level mask prediction for each potential instance category, while the fully connected layer of the other branch performs a mask prediction that is agnostic to the instance category (i.e., pixel-level foreground/background prediction); finally, the mask predictions of the two branches are added to obtain the final instance segmentation result.
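  • A sketch of the two-branch structure of FIG. 4, with assumed channel widths, a 14x14 input, a 28x28 output mask, and an assumed number of categories, could look as follows:

```python
import torch
from torch import nn

def conv3x3(i, o):
    return nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU(inplace=True))

class TwoWayMaskHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=81, roi=14):
        super().__init__()
        # First branch: conv1-conv4 (first convolutional network) + deconv,
        # producing a pixel-level mask prediction for every instance category.
        self.conv1, self.conv2 = conv3x3(in_ch, 256), conv3x3(256, 256)
        self.conv3, self.conv4 = conv3x3(256, 256), conv3x3(256, 256)
        self.deconv = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.mask_fcn = nn.Conv2d(256, num_classes, 1)
        # Second branch: conv4_fc and conv5_fc (second convolutional network),
        # then a fully connected layer whose output is reshaped into a
        # category-agnostic foreground/background mask of the same size.
        self.conv4_fc, self.conv5_fc = conv3x3(256, 256), conv3x3(256, 128)
        self.out_size = 2 * roi
        self.fc = nn.Linear(128 * roi * roi, self.out_size * self.out_size)

    def forward(self, x):                              # x: (R, in_ch, roi, roi)
        x = self.conv4(self.conv3(self.conv2(self.conv1(x))))
        # Pixel-level instance category prediction (per-category masks).
        cls_masks = self.mask_fcn(torch.relu(self.deconv(x)))     # (R, C, 2roi, 2roi)
        # Pixel-level foreground/background prediction via the fully connected layer.
        fg = self.fc(self.conv5_fc(self.conv4_fc(x)).flatten(1))
        fg = fg.view(-1, 1, self.out_size, self.out_size)          # reshape step
        return cls_masks + fg               # pixel-level addition of both branches

masks = TwoWayMaskHead()(torch.randn(8, 256, 14, 14))   # -> (8, 81, 28, 28)
```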
  • FIG. 5 is a flow chart of an application embodiment of an instance segmentation method of the present disclosure.
  • FIG. 6 is a schematic diagram of a process of the application embodiment shown in FIG. 5. Referring to FIG. 5 and FIG. 6 together, the instance segmentation method of this application embodiment includes:
  • Feature extraction is performed on the image through a neural network, and four levels of features M1-M4 are output through the network layers at four different network depths in the neural network.
  • the operation 502 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a neural network operated by the processor.
  • In order of network depth from deep to shallow, the higher-level features Mi+1 are upsampled in turn and fused with the lower-level features Mi to obtain a first batch of fusion features P2-P5.
  • Among the features participating in the fusion, the highest-level fusion feature P5 is the highest-level feature M4 among the four levels of features, or a feature obtained by performing feature extraction on M4 through a fully convolutional layer;
  • the first batch of fusion features P2-P5 includes this highest-level fusion feature and the fusion features obtained by each fusion.
  • In the order from the low-level feature P2 to the high-level feature P5 (i.e., bottom-up), the lower-level fusion features are downsampled in turn and fused with the adjacent, higher-level features Pk+1 to obtain a second batch of fusion features N2-N5, where k is an integer in the range 2-4.
  • The lowest-level fusion feature N2 is the lowest-level fusion feature P2 of the first batch, or a feature obtained by performing feature extraction on P2 through a fully convolutional layer;
  • the second batch of fusion features includes the feature corresponding to the lowest-level fusion feature P2 of the first batch and the fusion features obtained by each fusion.
  • This application embodiment is described by taking one fold-back fusion of the features M1-M4 of the above four levels as an example; therefore, the second batch of fusion features obtained by operation 506 is the second fusion features in the above embodiments of the present disclosure.
  • The operations 504-506 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second fusion module executed by the processor.
  • As described in the above embodiments, a region proposal network may be used to generate at least one instance candidate region for the image, and each instance candidate region is mapped to each feature in the second fusion features; then, for example by using, but not limited to, a region of interest (ROI) alignment (ROIAlign) method, the region features corresponding to the same instance candidate region are respectively extracted from the second fusion features.
  • the operation 508 may be performed by a processor invoking a corresponding instruction stored in a memory or by an extraction module executed by the processor.
  • the operation 510 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first fusion module executed by the processor.
  • the instance segmentation result includes an object box or location of each instance and an instance class to which the instance belongs.
  • the operation 512 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first segmentation unit that is executed by the processor.
  • The operation 514 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a first segmentation unit executed by the processor, or by a first prediction subunit and a second prediction subunit in the first segmentation unit.
  • the segmentation result of the instance includes: a pixel belonging to an instance in the current instance candidate region and an instance category to which the instance belongs, wherein the instance category may be: a background or an instance category.
  • the operation 516 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first segmentation unit executed by the processor or an acquisition subunit in the first segmentation unit.
  • At least part of the image may be semantically segmented based on the first fusion feature to obtain a semantic segmentation result.
  • At least part of the image may be semantically segmented based on the second fusion feature to obtain a semantic segmentation result.
  • the semantic segmentation result may include, for example, a category to which each pixel in at least a part of the image belongs.
  • At least a partial region of the image may be the entire region of the image or a local region (for example, a candidate region); that is, the entire image may be semantically segmented to obtain the semantic segmentation result of the image, or a local region (for example, a candidate region) of the image may be semantically segmented to obtain the semantic segmentation result of that local region.
  • The candidate region may be, for example, an instance candidate region in each of the above embodiments, or may be a candidate region generated in other manners.
  • The above-described semantic segmentation of at least a portion of the image may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a segmentation module executed by the processor.
  • In this way, semantic segmentation of at least a portion of the image is achieved; performing semantic segmentation on at least part of the image based on the first fusion feature or the second fusion features can use context information to improve the accuracy of the image semantic segmentation result.
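  • A minimal sketch of such a semantic segmentation head on a fusion feature, assuming a small fully convolutional head and an arbitrary number of categories, could look as follows:

```python
import torch
from torch import nn
import torch.nn.functional as F

class SemanticSegHead(nn.Module):
    """Predict a per-pixel category map from a fused feature map."""

    def __init__(self, in_ch=256, num_classes=21):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1))

    def forward(self, fusion_feature, out_size):
        logits = self.head(fusion_feature)
        # Upsample back to the resolution of the image (or candidate region)
        # so that every pixel receives a category score.
        return F.interpolate(logits, size=out_size, mode="bilinear",
                             align_corners=False)

scores = SemanticSegHead()(torch.randn(1, 256, 56, 56), out_size=(224, 224))
labels = scores.argmax(dim=1)    # semantic segmentation result: a category per pixel
```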
  • In addition, in the embodiments of the present disclosure, instance segmentation may also be performed based on the second fusion features to obtain the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image.
  • The manner of performing instance segmentation based on the second fusion features to obtain the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image may be implemented in a manner similar to the above embodiments that perform instance segmentation based on the first fusion features, and is not repeated in this disclosure.
  • Any of the instance segmentation methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • Alternatively, any of the instance segmentation methods provided by the embodiments of the present disclosure may be performed by a processor; for example, the processor performs any one of the instance segmentation methods mentioned in the embodiments of the present disclosure by invoking corresponding instructions stored in a memory. This will not be repeated below.
  • The foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed.
  • The foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • FIG. 7 is a schematic structural diagram of an embodiment of an instance segmentation apparatus according to the present disclosure.
  • The instance segmentation apparatus of this embodiment can be used to implement the various instance segmentation method embodiments of the present disclosure.
  • As shown in FIG. 7, the apparatus of this embodiment includes: a neural network, an extraction module, a first fusion module, and a segmentation module, wherein:
  • a neural network for extracting features from an image and outputting features of at least two different levels.
  • the neural network may include at least two network layers of different network depths for performing feature extraction on the image, and outputting at least two different hierarchical features through the network layer of at least two different network depths.
  • an extracting module configured to extract, from the features of the at least two different levels, the regional features corresponding to the at least one instance candidate region in the image.
  • the first fusion module is configured to fuse the regional features corresponding to the candidate regions of the same instance to obtain the first fusion feature of each candidate region.
  • a segmentation module configured to perform segmentation based on each first fusion feature, and obtain an instance segmentation result of the corresponding instance candidate region and/or an instance segmentation result of the image.
  • With the instance segmentation apparatus provided by the above embodiment of the present disclosure, features are extracted from the image through the neural network and features of at least two different levels are output; the region features corresponding to at least one instance candidate region in the image are extracted from the features of the at least two different levels; the region features corresponding to the same instance candidate region are fused to obtain the first fusion feature of each instance candidate region; and instance segmentation is performed based on each first fusion feature to obtain the instance segmentation result of the corresponding instance candidate region and/or the instance segmentation result of the image.
  • The embodiment of the present disclosure designs a deep-learning-based framework to solve the instance segmentation problem; since deep learning has powerful modeling capabilities, it helps to obtain better instance segmentation results. In addition, because instance segmentation is performed on instance candidate regions, the accuracy of instance segmentation can be improved, the computation and complexity required for instance segmentation can be reduced, and the segmentation efficiency can be improved.
  • Moreover, the region features corresponding to an instance candidate region can be extracted from the features of at least two different levels and fused, and segmentation is performed based on the obtained fusion features, so that each instance candidate region can obtain information of more different levels at the same time; since the information extracted from features of different levels lies at different semantic levels, context information can be utilized to improve the accuracy of the instance segmentation result of each instance candidate region.
  • FIG. 8 is a schematic structural diagram of another embodiment of an instance segmentation apparatus according to the present disclosure.
  • Compared with the embodiment shown in FIG. 7, the instance segmentation apparatus of this embodiment further includes: a second fusion module, configured to perform at least one fold-back fusion on the features of the at least two different levels to obtain second fusion features.
  • One fold-back fusion includes: along the network depth direction of the neural network, fusing the features of different levels, respectively output by network layers of different network depths, sequentially in two different level directions.
  • the extracting module is configured to extract, from the second fusion feature, the regional feature corresponding to the at least one instance candidate region.
  • the two different hierarchical directions may include: a direction from a high-level feature to a low-level feature, and a direction from a low-level feature to a high-level feature.
  • Optionally, fusing sequentially in two different level directions may include: fusing sequentially along the direction from high-level features to low-level features and then along the direction from low-level features to high-level features; or fusing sequentially along the direction from low-level features to high-level features and then along the direction from high-level features to low-level features.
  • In one optional example, when the second fusion module fuses the features of different levels, output by network layers of different network depths, first along the direction from high-level features to low-level features and then along the direction from low-level features to high-level features, it is configured to: along the direction from deep to shallow network depth of the neural network, sequentially upsample the higher-level features output by deeper network layers and fuse them with the lower-level features output by shallower network layers, to obtain third fusion features; and then, along the direction from low-level features to high-level features, sequentially downsample the lower-level fusion features and fuse them with the higher-level fusion features among the third fusion features.
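  • The sketch below illustrates one pass of the fusion just described, under simplifying assumptions: all levels are taken to have already been projected to a common channel count, and nearest-neighbor upsampling plus max-pooling stand in for the sampling operators, which the text does not fix. The function and variable names are illustrative, not from the patent.

```python
# Sketch of one retracing-fusion pass: deep -> shallow first (third fusion features),
# then shallow -> deep (second-pass features). Element-wise addition is used as the
# fusion operation, which is one of the options the disclosure allows.
import torch
import torch.nn.functional as F

def top_down_then_bottom_up(feats):
    """feats: [C2, C3, C4, C5] from shallow to deep layers, same channel count,
    spatial size halving at each deeper level. Returns the second-pass features."""
    # 1st direction: deep -> shallow. Upsample the higher-level feature and fuse
    # it with the adjacent lower-level feature, producing P2..P5.
    P = [None] * len(feats)
    P[-1] = feats[-1]
    for i in range(len(feats) - 2, -1, -1):
        up = F.interpolate(P[i + 1], size=feats[i].shape[-2:], mode="nearest")
        P[i] = feats[i] + up
    # 2nd direction: shallow -> deep. Downsample the lower-level fused feature and
    # fuse it with the adjacent higher-level fused feature, producing N2..N5.
    N = [None] * len(P)
    N[0] = P[0]
    for i in range(1, len(P)):
        down = F.max_pool2d(N[i - 1], kernel_size=2, stride=2)
        N[i] = P[i] + down
    return N

# toy usage: four levels, 256 channels, strides 4/8/16/32 on a 256x256 image
C = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
N = top_down_then_bottom_up(C)
print([n.shape[-1] for n in N])  # [64, 32, 16, 8]
```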
  • The higher-level features may include, for example, features output by a deeper network layer of the neural network, or features obtained by performing feature extraction at least once on the features output by that deeper network layer.
  • In one optional example, when the second fusion module sequentially upsamples the higher-level features output by deeper network layers of the neural network and fuses them with the lower-level features output by shallower network layers, it is configured to sequentially upsample the higher-level features output by a deeper network layer and fuse them with the lower-level features output by the adjacent shallower network layer.
  • In one optional example, when the second fusion module sequentially downsamples the lower-level fusion features and fuses them with the higher-level fusion features among the third fusion features, it is configured to sequentially downsample the lower-level fusion features and fuse them with the adjacent higher-level fusion features among the third fusion features.
  • In one optional example, when the second fusion module fuses the features of different levels, output by network layers of different network depths, first along the direction from low-level features to high-level features and then along the direction from high-level features to low-level features, it is configured to: along the direction from shallow to deep network depth of the neural network, sequentially downsample the lower-level features output by shallower network layers and fuse them with the higher-level features output by deeper network layers, to obtain fourth fusion features; and then, along the direction from high-level features to low-level features, sequentially upsample the higher-level fusion features and fuse them with the lower-level fusion features among the fourth fusion features.
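  • For completeness, a correspondingly hedged sketch of the reverse order just described (shallow-to-deep first, then deep-to-shallow), with the same stand-in sampling operators and illustrative names:

```python
# Sketch of the reverse-order retracing fusion: shallow -> deep produces the
# "fourth fusion features", deep -> shallow then produces the second-pass features.
import torch
import torch.nn.functional as F

def bottom_up_then_top_down(feats):
    """feats: [C2, C3, C4, C5] from shallow to deep, same channel count."""
    Q = [None] * len(feats)
    Q[0] = feats[0]
    for i in range(1, len(feats)):                       # shallow -> deep: downsample, then fuse
        Q[i] = feats[i] + F.max_pool2d(Q[i - 1], 2, 2)   # "fourth fusion features"
    R = [None] * len(Q)
    R[-1] = Q[-1]
    for i in range(len(Q) - 2, -1, -1):                  # deep -> shallow: upsample, then fuse
        R[i] = Q[i] + F.interpolate(R[i + 1], size=Q[i].shape[-2:], mode="nearest")
    return R

feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
print([r.shape[-1] for r in bottom_up_then_top_down(feats)])  # [64, 32, 16, 8]
```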
  • The lower-level features may include, for example, features output by a shallower network layer of the neural network, or features obtained by performing feature extraction at least once on the features output by that shallower network layer.
  • In one optional example, when the second fusion module sequentially downsamples the lower-level features output by shallower network layers of the neural network and fuses them with the higher-level features output by deeper network layers, it is configured to sequentially downsample the lower-level features output by a shallower network layer and fuse them with the higher-level features output by the adjacent deeper network layer.
  • In one optional example, when the second fusion module sequentially upsamples the higher-level fusion features and fuses them with the lower-level fusion features among the fourth fusion features, it is configured to sequentially upsample the higher-level fusion features and fuse them with the adjacent lower-level fusion features among the fourth fusion features.
  • In one optional example, when fusing the regional features corresponding to the same instance candidate region, the first fusion module performs pixel-level fusion on the multiple regional features corresponding to the same instance candidate region.
  • For example, when the first fusion module performs pixel-level fusion on the multiple regional features corresponding to the same instance candidate region, it is configured to: take the element-wise maximum of the multiple regional features corresponding to the same instance candidate region; or take the element-wise average of the multiple regional features corresponding to the same instance candidate region; or take the element-wise sum of the multiple regional features corresponding to the same instance candidate region.
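  • A short sketch of the three pixel-level fusion options named above, applied to the region features that one candidate region obtains from four levels; the channel count and spatial size are illustrative assumptions.

```python
# The three pixel-level fusion options for the region features of one candidate region.
import torch

region_feats = [torch.randn(256, 14, 14) for _ in range(4)]  # one region, pooled from 4 levels
stacked = torch.stack(region_feats)      # (4, 256, 14, 14)
fused_max = stacked.max(dim=0).values    # element-wise maximum (the disclosure notes this tends
                                         # to make the region features more salient)
fused_mean = stacked.mean(dim=0)         # element-wise average
fused_sum = stacked.sum(dim=0)           # element-wise sum
print(fused_max.shape)                   # torch.Size([256, 14, 14])
```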
  • In one implementation, the segmentation module may include:
  • a first segmentation unit configured to perform instance segmentation, based on a first fusion feature, on the instance candidate region corresponding to that first fusion feature, to obtain an instance segmentation result of the corresponding instance candidate region; and/or
  • a second segmentation unit configured to perform instance segmentation on the image based on each first fusion feature, to obtain an instance segmentation result of the image.
  • FIG. 9 is a schematic structural diagram of an embodiment of a segmentation module according to an embodiment of the present disclosure. As shown in FIG. 9, in the foregoing embodiments of the present disclosure, the segmentation module may include:
  • a first segmentation unit configured to perform instance segmentation, based on each first fusion feature, on the instance candidate region corresponding to that first fusion feature, to obtain an instance segmentation result of each instance candidate region;
  • an obtaining unit configured to obtain an instance segmentation result of the image based on the instance segmentation result of each instance candidate region.
  • In one implementation, the first segmentation unit includes:
  • a first prediction subunit configured to perform pixel-level instance class prediction based on a first fusion feature, to obtain an instance class prediction result of the instance candidate region corresponding to that first fusion feature;
  • a second prediction subunit configured to perform pixel-level foreground/background prediction based on the first fusion feature, to obtain a foreground/background prediction result of the instance candidate region corresponding to that first fusion feature; and
  • an obtaining subunit configured to obtain an instance segmentation result of the instance candidate region corresponding to the first fusion feature, based on the instance class prediction result and the foreground/background prediction result.
  • In one optional example, the second prediction subunit is configured to predict, based on a first fusion feature, the pixels belonging to the foreground and/or the pixels belonging to the background in the instance candidate region corresponding to that first fusion feature.
  • The foreground and background can be set as required. For example, the foreground includes the parts corresponding to all instance categories, and the background includes the parts other than those corresponding to all instance categories; or, the background includes the parts corresponding to all instance categories, and the foreground includes the parts other than those corresponding to all instance categories.
  • In one optional example, the first prediction subunit may include: a first convolutional network for performing feature extraction on a first fusion feature, the first convolutional network including at least one fully convolutional layer; and a first fully convolutional layer for performing pixel-level object class prediction based on the features output by the first convolutional network.
  • In one optional example, the second prediction subunit may include: a second convolutional network for performing feature extraction on a first fusion feature, the second convolutional network including at least one fully convolutional layer; and a fully connected layer for performing pixel-level foreground/background prediction based on the features output by the second convolutional network.
  • In one optional example, the obtaining subunit is configured to: perform pixel-level addition on the object class prediction result and the foreground/background prediction result of the instance candidate region corresponding to a first fusion feature, to obtain the instance segmentation result of the instance candidate region corresponding to that first fusion feature.
  • In another embodiment, the first segmentation unit may further include: a conversion subunit configured to convert the foreground/background prediction result into a foreground/background prediction result whose dimensions are consistent with those of the instance class prediction result.
  • Correspondingly, the obtaining subunit is configured to perform pixel-level addition on the instance class prediction result of the instance candidate region corresponding to a first fusion feature and the converted foreground/background prediction result.
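  • The following sketch approximates this two-branch mask prediction in PyTorch. The layer counts and sizes are assumptions for illustration and do not reproduce the exact configuration of FIG. 4; only the overall structure follows the text: a fully convolutional branch predicts per-class pixel-level masks, a fully connected branch predicts a class-agnostic foreground/background mask, the latter is reshaped to matching dimensions, and the two are added pixel-wise.

```python
# Illustrative two-branch mask head (assumed sizes, not the patented configuration).
import torch
import torch.nn as nn

class TwoBranchMaskHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=80, size=14):
        super().__init__()
        self.convs = nn.Sequential(                      # "first convolutional network"
            *[nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True))
              for _ in range(4)])
        self.deconv = nn.ConvTranspose2d(in_ch, in_ch, 2, stride=2)   # upsample to 2*size
        self.cls_mask = nn.Conv2d(in_ch, num_classes, 1)              # per-class mask logits
        self.fc_convs = nn.Sequential(                   # "second convolutional network"
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1), nn.ReLU(inplace=True))
        self.fc = nn.Linear((in_ch // 2) * size * size, (2 * size) * (2 * size))
        self.out_size = 2 * size

    def forward(self, x):                                # x: (N, in_ch, size, size) fused region feature
        shared = self.convs(x)
        class_masks = self.cls_mask(torch.relu(self.deconv(shared)))   # (N, K, 2s, 2s)
        fg_bg = self.fc(self.fc_convs(shared).flatten(1))              # (N, 2s*2s), class-agnostic
        fg_bg = fg_bg.view(-1, 1, self.out_size, self.out_size)        # reshape to mask dimensions
        return class_masks + fg_bg                       # pixel-level addition of the two branches

head = TwoBranchMaskHead()
masks = head(torch.randn(3, 256, 14, 14))
print(masks.shape)  # torch.Size([3, 80, 28, 28])
```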
  • In addition, the segmentation module may further include: a third segmentation unit configured to perform semantic segmentation on at least a portion of the image based on the first fusion feature to obtain a semantic segmentation result; or configured to perform semantic segmentation on at least a portion of the image based on the second fusion feature to obtain a semantic segmentation result.
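  • A hedged sketch of such a third, semantic-segmentation branch operating on a fused feature map; the channel and class counts are assumptions for illustration.

```python
# Small fully convolutional semantic head on a fused feature map (illustrative sizes).
import torch
import torch.nn as nn

semantic_head = nn.Sequential(
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 21, 1))                       # e.g. 21 semantic classes

fused_map = torch.randn(1, 256, 64, 64)          # a fused feature map (first or second fusion feature)
logits = semantic_head(fused_map)                # (1, 21, 64, 64) per-pixel class scores
labels = logits.argmax(dim=1)                    # semantic segmentation result per pixel
print(labels.shape)  # torch.Size([1, 64, 64])
```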
  • In addition, another electronic device provided by the embodiments of the present disclosure includes:
  • a memory for storing a computer program; and
  • a processor for executing the computer program stored in the memory, where the instance segmentation method of any of the above embodiments of the present disclosure is implemented when the computer program is executed.
  • FIG. 10 is a schematic structural diagram of an application embodiment of an electronic device according to the present disclosure.
  • As shown in FIG. 10, the electronic device includes one or more processors, a communication unit, and the like. The one or more processors are, for example, one or more central processing units (CPUs) and/or one or more graphics processing units (GPUs). The processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) or executable instructions loaded from a storage portion into a random access memory (RAM).
  • The communication portion may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the read-only memory and/or the random access memory to execute executable instructions, is connected to the communication portion through a bus, and communicates with other target devices via the communication portion, thereby completing operations corresponding to any method provided by the embodiments of the present disclosure, for example: performing feature extraction on an image through a neural network and outputting at least two different levels of features; extracting, from the at least two different levels of features, the regional features corresponding to at least one instance candidate region in the image, and fusing the regional features corresponding to the same instance candidate region to obtain a first fusion feature of each instance candidate region; and performing instance segmentation based on each first fusion feature to obtain an instance segmentation result of the corresponding instance candidate region and/or an instance segmentation result of the image.
  • In addition, various programs and data required for the operation of the apparatus may also be stored in the RAM. The CPU, the ROM, and the RAM are connected to each other through the bus. Where a RAM is present, the ROM is an optional module.
  • The RAM stores executable instructions, or executable instructions are written into the ROM at runtime, and the executable instructions cause the processor to perform operations corresponding to any of the methods of the present disclosure. An input/output (I/O) interface is also connected to the bus.
  • The communication unit may be integrated, or may be configured with multiple sub-modules (for example, multiple IB network cards) linked on the bus.
  • The following components are connected to the I/O interface: an input portion including a keyboard, a mouse, and the like; an output portion including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion including a hard disk and the like; and a communication portion including a network interface card such as a LAN card or a modem.
  • the communication section performs communication processing via a network such as the Internet.
  • the drive is also connected to the I/O interface as needed.
  • a removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive as needed so that a computer program read therefrom is installed into the storage portion as needed.
  • It should be noted that the architecture shown in FIG. 10 is only an optional implementation. In practice, the number and types of the components in FIG. 10 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be arranged separately or in an integrated manner, for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication unit may be arranged separately, or may be integrated on the CPU or the GPU; and so on.
  • In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the steps of the instance segmentation method provided by the embodiments of the present disclosure.
  • the computer program can be downloaded and installed from the network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the CPU.
  • In addition, the embodiments of the present disclosure also provide a computer program including computer instructions that, when run in a processor of a device, implement the instance segmentation method of any embodiment of the present disclosure.
  • In addition, the embodiments of the present disclosure further provide a computer-readable storage medium having a computer program stored thereon, where the instance segmentation method of any embodiment of the present disclosure is implemented when the computer program is executed by a processor.
  • the embodiments of the present disclosure have very broad applications in the fields of unmanned driving, home robots, maps, and the like.
  • the embodiments of the present disclosure can be applied to an automatic driving scene to accurately identify different traffic participants in an automatic driving scene;
  • The embodiments of the present disclosure can be applied to street scenes, to identify buildings and objects serving as different landmarks in the street scene, thereby helping to construct high-precision maps.
  • The embodiments of the present disclosure can be applied to home robots: for example, a robot needs precise pixel-level localization of each object when grasping objects, and the embodiments of the present disclosure can be used to accurately identify and locate such objects. It should be understood that the foregoing are only exemplary scenarios and should not be construed as limiting the scope of the present disclosure.
  • the methods and apparatus of the present disclosure may be implemented in a number of ways.
  • the methods and apparatus of the present disclosure may be implemented in software, hardware, firmware or any combination of software, hardware, firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present disclosure are not limited to the order described above unless otherwise specifically stated.
  • the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine readable instructions for implementing a method in accordance with the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.


Abstract

The embodiments of the present disclosure disclose an instance segmentation method and apparatus, an electronic device, a program, and a medium, where the method includes: performing feature extraction on an image through a neural network, and outputting at least two different levels of features; extracting, from the at least two different levels of features, regional features corresponding to at least one instance candidate region in the image, and fusing the regional features corresponding to the same instance candidate region to obtain a first fusion feature of each instance candidate region; and performing instance segmentation based on each first fusion feature to obtain an instance segmentation result of the corresponding instance candidate region and/or an instance segmentation result of the image. The embodiments of the present disclosure design a deep-learning-based framework to solve the instance segmentation problem, and can obtain more accurate instance segmentation results.

Description

实例分割方法和装置、电子设备、程序和介质
本公开要求在2018年02月09日提交中国专利局、申请号为CN2018101370447、发明名称为“实例分割方法和装置、电子设备、程序和介质”的中国专利申请的优先权,和2018年02月09日提交中国专利局、申请号为CN2018101363710、发明名称为“图像分割方法和装置、电子设备、程序和介质”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及计算机视觉技术,尤其是一种实例分割方法和装置、电子设备、程序和介质。
背景技术
实例分割是计算机视觉领域非常重要的方向,此任务结合了语义分割和物体检测的特点,对于输入图像中的每一个物体,分别为他们生成一个独立的像素级别的掩膜(mask),并且预测其对应的类别。实例分割在无人驾驶、家居机器人等领域有着非常广阔的应用。
发明内容
本公开实施例提供一种实例分割方案。
根据本公开实施例的一个方面,提供的一种实例分割方法,包括:
通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;
从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;
基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
根据本公开实施例的另一个方面,提供的一种实例分割装置,包括:
神经网络,用于对图像进行特征提取,输出至少两个不同层级的特征;
抽取模块,用于从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征;
第一融合模块,用于对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;
分割模块,用于基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
根据本公开实施例的再一个方面,提供的一种电子设备,包括:
存储器,用于存储计算机程序;
处理器,用于执行所述存储器中存储的计算机程序,且所述计算机程序被执行时,实现本公开任一实施例所述的方法。
根据本公开实施例的再一个方面,提供的一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现本公开任一实施例所述的方法。
根据本公开实施例的再一个方面,提供的一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现本公开任一实施例所述的方法。
基于本公开上述实施例提供的实例分割方法和装置、电子设备、程序和介质,通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;从两个不同层级的特征中抽取图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果。本公开实施例设计了基于深度学习的框架进行实例分割的技术方案,由于深度学习具有强大的建模能力,有助于获得更好的实例分割结果;另外,对实例候选区域进行实例分割,相对于直接对整个图像进行实例分割,可以提高实例分割的准确性,降低实例分割所需的计算量和复杂度,提高实例分割效率;并且,从至少两个不同层级的特征中抽取实例候选区域对应的区域特征进行融合,并基于得到的融合特征进行实例分割,使得每个实例候选区域都可以同时获得更多不同层级的信息,由于从不同层级的特征抽取的信息都是处于不同的语义层级,从而可以利用上下文信息提高各实例候选区域的实例分割结果的准确性。
下面通过附图和实施例,对本公开的技术方案做进一步的详细描述。
附图说明
构成说明书的一部分的附图描述了本公开的实施例,并且连同描述一起用于解释本公开的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本公开,其中:
图1为本公开实例分割方法一个实施例的流程图。
图2为本公开实施例中的一个特征融合示意图。
图3为本公开实例分割方法另一个实施例的流程图。
图4为本公开实施例中进行双路掩膜预测的一个网络结构示意图。
图5为本公开实例分割方法一个应用实施例的流程图。
图6为图5所示应用实施例的过程示意图。
图7为本公开实例分割装置一个实施例的结构示意图。
图8为本公开实例分割装置另一个实施例的结构示意图。
图9为本公开实施例中分割模块一个实施例的结构示意图。
图10为本公开实施例中电子设备一个实施例的结构示意图。
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
另外,公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。
本公开实施例可以应用于终端设备、计算机系统、服务器等电子设备,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与终端设备、计算机系统、服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
图1为本公开实例分割方法一个实施例的流程图。如图1所示,该实施例的实例分割方法包括:
102,通过神经网络对图像进行特征提取,输出至少两个不同层级的特征。
本公开各实施例中的特征的表现形式例如可以包括但不限于:特征图、特征向量或者特征矩阵,等等。所述至少两个不同层级是指神经网络中位于该神经网络不同深度的两个或两个以上的网络层。所述图像例如可以包括但不限于:静态图像,视频中的帧图像,等等。
在一个可选示例中,该操作102可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的神经网络执行。
104,从上述至少两个不同层级的特征中抽取图像中至少一个实例候选区域对应的区域特征。
实例例如可以包括但不限于某一个具体对象,如某一具体的人、某一具体的物,等等。通过神经网络对图像进行检测可获得一个或多个实例候选区域。实例候选区域表示图像中可能出现上述实例的区域。
在一个可选示例中,该操作104可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的抽取模块执行。
106,分别将同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征。
本公开各实施例中,对多个区域特征进行融合的方式,例如可以是对多个区域特征中基于各像素求和、取最大值、取平均值等。
在一个可选示例中,该操作106可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一融合模块执行。
108,分别基于各第一融合特征进行实例分割(Instance Segmentation),获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
本公开各实施例中,实例候选区域的实例分割结果可以包括:该实例候选区域属于某实例的像素以及该实例所属的类别,例如,该实例候选区域中属于某男孩的像素以及该男孩所属的类别为人。
在一个可选示例中,该操作108可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的分割模块执行。
基于本公开上述实施例提供的实例分割方法,通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;从两个不同层级的特征中抽取图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果。本公开实施例设计了基于深度学习的框架解决实例分割的问题,由于深度学习具有强大的建模能力,有助于获得更好的实例分割结果;另外,针对实例候选区域进行实例分割,相对于直接对整个图像进行实例分割,可以提高实例分割的准确性,降低实例分割所需的计算量和复杂度,提高实例分割效率;并且,从至少两个不同层级的特征中抽取实例候选区域对应的区域特征进行融合,并基于得到的融合特征进行实例分割,使得每个实例候选区域都可以同时获得更多不同层级的信息,由于从不同层级的特征抽取的信息都是处于不同的语义层级,从而可以利用上下文信息提高各实例候选区域的实例分割结果的准确性。
本公开各实例分割方法实施例的一个实施方式中,操作102通过神经网络对图像进行特征提取,输出至少两个不同层级的特征,可以包括:通过神经网络对图像进行特征提取,经该神经网络中至少两个不同网络深度的网络层输出上述至少两个不同层级的特征。
本公开各实施例中,神经网络包括两个以上网络深度不同的网络层,神经网络包括的网络层中,用于进行特征提取的网络层可以称为特征层,神经网络接收到一个图像后,通过第一个网络层对输入的图像进行特征提取,并将提取的特征输入至第二个网络层,从第二个网络层起,每个网络层依次对输入的特征进行特征提取,将提取到的特征输入至下一个网络层进行特征提取。神经网络中各网络层的网络深度依据输入输出的顺序或者特征提取的顺序由浅至深,各网络层依次进行特征提取输出的特征的层级由低到高,分辨率由高至低。相对于同一神经网络中网络深度较浅的网络层,网络深度较深的网络层视野域较大,较多的关注空间结构信息,提取到的特征用于实例分割时,可以使得分割结果更准确。在神经网络中,网络层通常可以包括:至少一个用于进行特征提取的卷积层,和对卷积层提取的特征(例如特征图)进行上采样的上采样层,通过对特征进行上采样,可以减小卷积层提取的特征(例如特征图)的大小。
本公开各实例分割方法实施例的一个实施方式中,操作106中分别将同一实例候选区域对应的区域特征进行融合,可以包括:分别对同一实例候选区域对应的多个区域特征进行像素级别的融合。
例如,在其中一个可选示例中,分别对同一实例候选区域对应的多个区域特征进行像素级别的融合,可以是:
分别将同一实例候选区域对应的多个区域特征基于各像素取最大值(element-wise max),即,将同一实例候选区域对应的多个区域特征中,各像素位置的特征取最大值;
或者,分别将同一实例候选区域对应的多个区域特征基于各像素取平均值,即,将同一实例候选区域对应的多个区域特征中,各像素位置的特征求取平均值;
或者,分别将同一实例候选区域对应的多个区域特征基于各像素取求和,即,将同一实例候选区域对应的多个区域特征中,各像素位置的特征求和。
其中,在上述实施方式中,将同一实例候选区域对应的多个区域特征进行像素级别的融合时,将同 一实例候选区域对应的多个区域特征基于各像素取最大值的方式,相对于其他方式而言,使得实例候选区域的特征更明显,从而使得实例分割更准确,以提升实例分割结果的准确率。
可选地,在本公开实例分割方法的又一个实施例中,分别将同一实例候选区域对应的区域特征进行融合之前,还可以通过一个网络层,例如全卷积层或者全连接层,调整同一实例候选区域对应的区域特征,例如调整参与融合的同一实例候选区域对应的各区域特征的维度等,对参与融合的同一实例候选区域对应的各区域特征进行适配,使参与融合的同一实例候选区域对应的各区域特征更加适用于融合,从而获得更准确的融合特征。
在本公开实例分割方法的另一个实施例中,操作102输出至少两个不同层级的特征之后,还可以包括:将上述至少两个不同层级的特征进行至少一次折回融合,得到第二融合特征。其中,一次折回融合包括:基于神经网络的网络深度方向,对分别由不同网络深度的网络层输出的不同层级的特征,依次按照两个不同的层级方向进行融合。相应地,该实施例中,操作104可以包括:从第二融合特征中抽取至少一实例候选区域对应的区域特征。
在各实施例的一个实施方式中,上述两个不同的层级方向,包括:从高层级特征到低层级特征的方向、和从低层级特征到高层级特征的方向。由此更好利用上下文信息进行特征融合,进而提高各实例候选区域的实例分割结果。
则在其中一个可选示例中,上述依次按照两个不同的层级方向,可以包括:依次沿从高层级特征到低层级特征的方向(从神经网络中网络深度较深的网络层输出的特征到网络深度较浅的网络层输出的特征的方向)和从低层级特征到高层级特征的方向(从神经网络中网络深度较浅的网络层输出的特征到网络深度较深的网络层输出的特征的方向);或者,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向。
在本公开各实施例的一个实施方式中,对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向进行融合,包括:
沿神经网络的网络深度从深到浅的方向,依次将神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,例如:将较高层级的特征上采样后与较低层级的特征相加,获得第三融合特征。其中,较高层级的特征,可以包括:经神经网络中网络深度较深的网络层输出的特征、或者对该网络深度较深的网络层输出的特征进行至少一次特征提取得到的特征。例如,参与融合的特征中,最高层级的特征可以是上述至少两个不同层级的特征中最高层级的特征,或者也可以是对该最高层级的特征进行一次或多次特征提取得到的特征,第三融合特征可以包括上述最高层级的特征和每次融合得到的融合特征;
沿从低层级特征到高层级特征的方向,依次将较低层级的融合特征降采样后,与第三融合特征中较高层级的融合特征进行融合。其中,参与本次融合的融合特征中,最低层级的融合特征可以是第三融合特征中最低层级的融合特征,或者也可以是对该第三融合特征中最低层级的融合特征进行一次或多次特征提取得到的特征;本次沿从低层级特征到高层级特征的方向进行特征融合得到的一批融合特征中,包括第三融合特征中最低层级的融合特征和每次融合得到的融合特征。
其中,若是将上述至少两个不同层级的特征进行一次折回融合,则沿从低层级特征到高层级特征的方向进行特征融合得到的一批融合特征即为第二融合特征;若是将上述至少两个不同层级的特征进行两次或以上折回融合,则可以执行多次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向进行融合的操作,最终得到的一批融合特征即为第二融合特征。
其中,将经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合时,可以依次将神经网络中,经网络深度较深的网络层(例如沿神经网络的输入输出方向的第80个网络层)输出的较高层级的特征上采样后,与相邻的、经网络深度较浅的网络层(例如沿神经网络的输入输出方向的第79个网络层)输出的较低层级的特征进行融合。另外,也可以依次将神经网络中,经网络深度较深的网络层(例如沿神经网络的输入输出方向的第80个网络层)输出的较高层级的特征上采样后,与该网络深度较深的网络层不相邻的、网络深度较浅的网络层(例如沿神经网络的输入输出方向的第50个网络层)输出的较低层级的特征进行融合,即:进行跨层级特征的融合。
类似地,将较低层级的融合特征降采样后,与第三融合特征中较高层级的融合特征进行融合时,也可以将较低层级的融合特征(例如P 2,其中“2”表示特征层级)降采样后,与相邻的、第三融合特征中较高层级的融合特征(例如P 3,其中“3”表示特征层级)进行融合。或者,将较低层级的融合特征降采样后,与特征层级不相邻的、第三融合特征中较高层级的融合特征(例如P 4,其中“4”表示特征层级)进行融合,即:进行跨层级融合特征的融合。
图2为本公开实施例中的一个特征融合示意图。如图2所示,示出了将一个较低层级的融合特征 N i降采样后与相邻的、较高层级的特征P i+1融合,得到相应的融合特征N i+1的一个示意图。其中,i为取值大于0的整数。
基于该实施例,按照自上而下的顺序(即:神经网络中网络深度从深至浅、从高层级特征到低层级特征的顺序),逐渐将高层级低分辨率的特征和低层级高分辨率的特征融合,得到一批新的特征,然后再按照从自下而上的顺序(即:低层级特征到高层级特征的顺序),依次将较低层级的融合特征降采样后与相邻的、较高层级的特征融合,逐渐将低层级高分辨率的特征和高层级低分辨率的特征融合,得到另一批新的特征以用于实例分割,本实施例通过一个自下而上的信息通路,能够帮助低层信息更容易地传播到高层网络(即:网络深度较深的网络层),降低信息传播的损失,使得信息在神经网络内部能够更加顺畅的传递,由于低层信息对于某些细节信息比较敏感,能够提供对定位和分割非常有益的信息,从而提升实例分割结果;通过两遍特征融合,可以让高层网络(即:网络深度较深的网络层)更容易、全面地获取底层信息,从而进一步提升实例分割结果。
在本公开各实施例的另一个实施方式中,对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合,包括:
沿神经网络的网络深度从浅到深的方向,依次将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合,获得第四融合特征。其中,较低层级的特征,例如可以包括:经神经网络中网络深度较浅的网络层输出的特征、或者对网络深度较浅的网络层输出的特征进行至少一次特征提取得到的特征。例如,参与融合的特征中,最低层级的特征可以是上述至少两个不同层级的特征中最低层级的特征,或者也可以是对该最低层级的特征进行一次或多次特征提取得到的特征,第四融合特征可以包括上述最低层级的特征和每次融合得到的融合特征;
沿从高层级特征到低层级特征的方向,依次将较高层级的融合特征上采样后,与第四融合特征中较低层级的融合特征进行融合。其中,参与本次融合的融合特征中,最高层级的融合特征可以是第四融合特征中最高层级的融合特征,或者也可以是对该第四融合特征中最高层级的融合特征进行一次或多次特征提取得到的特征;本次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合进行特征融合得到的一批融合特征中,包括第四融合特征中最高层级的融合特征和每次融合得到的融合特征。
其中,若是将上述至少两个不同层级的特征进行一次折回融合,则沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合进行特征融合得到的一批融合特征即为第二融合特征;若是将上述至少两个不同层级的特征进行两次或以上折回融合,则可以执行多次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合进行特征融合得到的一批融合特征的操作,最终得到的一批融合特征即为第二融合特征。
在其中一个可选示例中,将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合时,可以将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与该网络深度较浅的网络层相邻的、网络深度较深的网络层输出的较高层级的特征进行融合。或者,也可以将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与该网络深度较浅的网络层不相邻的、网络深度较深的网络层输出的较高层级的特征进行融合,即:进行跨层级特征的融合。
类似地,将较高层级的融合特征上采样后,与第四融合特征中较低层级的融合特征进行融合时,可以将较高层级的融合特征上采样后,与相邻的、第四融合特征中较低层级的融合特征进行融合。或者,也可以将较高层级的融合特征上采样后,与不相邻的、第四融合特征中较低层级的融合特征进行融合,即:进行跨层级融合特征的融合。
在本公开上述各实施例的一个实施方式中,操作108中,基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果,可以包括:
基于一第一融合特征,对一第一融合特征对应的实例候选区域进行实例分割,获得对应的实例候选区域的实例分割结果,其中的一第一融合特征不限定为特定的第一融合特征,可以是任一实例候选区域的第一融合特征;和/或,基于各第一融合特征对图像进行实例分割,获得图像的实例分割结果。
在本公开上述各实施例的另一个实施方式中,操作108中,基于各第一融合特征进行实例分割,获得图像的实例分割结果,可以包括:分别基于各第一融合特征,对各第一融合特征各自对应的实例候选区域进行实例分割,获得各实例候选区域的实例分割结果;基于各实例候选区域的实例分割结果获取图像的实例分割结果。
图3为本公开实例分割方法另一个实施例的流程图。如图3所示,该实施例的实例分割方法包括:
302,通过神经网络对图像进行特征提取,经神经网络中至少两个不同网络深度的网络层输出至少 两个不同层级的特征。
在一个可选示例中,该操作302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的神经网络执行。
304,沿神经网络的网络深度从深到浅的方向,依次将神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,获得第三融合特征。
其中,上述较高层级的特征可以包括:经神经网络中网络深度较深的网络层输出的特征、或者对该网络深度较深的网络层输出的特征进行至少一次特征提取得到的特征。例如,参与融合的特征中,最高层级的特征可以是上述至少两个不同层级的特征中最高层级的特征,或者也可以是对该最高层级的特征进行一次或多次特征提取得到的特征,第三融合特征可以包括上述至少两个不同层级的特征中最高层级的特征和通过该操作304中每次进行融合操作得到的融合特征。
306,沿从低层级特征到高层级特征的方向,依次将较低层级的融合特征降采样后,与第三融合特征中较高层级的融合特征进行融合,获得第二融合特征。
其中,其中,参与本次融合的融合特征中,最低层级的融合特征可以是第三融合特征中最低层级的融合特征,或者也可以是对该第三融合特征中最低层级的融合特征进行一次或多次特征提取得到的特征;本次沿从低层级特征到高层级特征的方向进行特征融合得到的一批融合特征中,包括第三融合特征中最低层级的融合特征和通过该操作306中每次进行融合操作融合得到的融合特征。
该实施例以进行一次这回融合为例进行说明,若是将上述至少两个不同层级的特征进行两次或以上折回融合,则可以执行多次操作304-306,最终得到的一批融合特征即为第二融合特征。
在一个可选示例中,该操作304-306可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二融合模块执行。
308,分别根据图像中的各实例候选区域,从第二融合特征中抽取至少一实例候选区域对应的区域特征。
本公开各实施例中,例如,可以采用但不限于区域推荐网络(Region Proposal Network,RPN)对图像生成各实例候选区域,并将各实例候选区域映射到第二融合特征中的各特征上,之后,例如,可以采用但不限于感兴趣区域(region of interest,ROI)对齐(ROIAlign)的方法,从第二融合特征中抽取各实例候选区域对应的区域特征。
在一个可选示例中,该操作308可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的抽取模块执行。
310,分别对同一实例候选区域对应的多个区域特征进行像素级别的融合,得到各实例候选区域的融合特征。
在一个可选示例中,该操作310可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一融合模块执行。
312,分别基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果。
在一个可选示例中,该操作312可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的分割模块执行。
本公开各实例分割方法实施例的一个实施方式中,基于一第一融合特征,对该一第一融合特征对应的实例候选区域进行实例分割,获得对应的实例候选区域的实例分割结果,可以包括:
基于上述一第一融合特征,进行像素级别的实例类别预测,获得上述一第一融合特征对应的实例候选区的实例类别预测结果;基于上述一第一融合特征进行像素级别的前背景预测,获得上述一第一融合特征对应的实例候选区域的前背景预测结果。其中,上述一第一融合特征为任一实例候选区域的第一融合特征;
基于上述实例类别预测结果和前背景预测结果,获取上述一第一融合特征对应的实例物体候选区域的实例分割结果,该实例分割结果包括:当前实例候选区域中属于某实例的像素以及该实例所属的类别信息。
基于本实施例,基于上述一第一融合特征,同时进行像素级别的实例类别预测和前背景预测,通过像素级别的实例类别预测可以对该一第一融合特征的精细分类和多分类,通过前背景预测可以获得较好的全局信息,并且由于无需关注多实例类别之间的细节信息,提高了预测速度,同时基于上述实例类别预测结果和前背景预测结果获取实例物体候选区域的实例分割结果,可以提高实例候选区域或者图像的实例分割结果。
在其中一个可选示例中,基于上述一第一融合特征,进行像素级别的实例类别预测,可以包括:
通过第一卷积网络,对上述一第一融合特征进行特征提取;该第一卷积网络包括至少一个全卷积层;
通过第一全卷积层,基于上述第一卷积网络输出的特征进行像素级别的物体类别预测。
在其中一个可选示例中,基于一第一融合特征进行像素级别的前背景预测,包括:
基于上述一第一融合特征,预测上述一第一融合特征对应的实例候选区域中属于前景的像素和/或属于背景的像素。
其中,背景与前景可以根据需求设定。例如,前景可以包括所有实例类别对应部分,背景可以包括所有实例类别对应部分以外的部分;或者,背景可以包括所有实例类别对应部分,前景可以包括:所有实例类别对应部分以外的部分。
在另一个可选示例中,基于一第一融合特征进行像素级别的前背景预测,可以包括:
通过第二卷积网络,对上述一第一融合特征进行特征提取;该第二卷积网络包括至少一个全卷积层;
通过全连接层,基于上述第二卷积网络输出的特征进行像素级别的前背景预测。
本公开各实例分割方法实施例的一个实施方式中,基于上述实例类别预测结果和前背景预测结果,获取一第一融合特征对应的实例物体候选区域的实例分割结果,包括:
将上述一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的相加处理,获得上述一第一融合特征对应的实例物体候选区域的实例分割结果。
在另一个实施例方式中,获得上述一第一融合特征对应的实例候选区域的前背景预测结果之后,还可以包括:将上述前背景预测结果转换为与上述实例类别预测结果的维度一致的前背景预测结果。例如,将前背景预测结果由向量转换为与物体类别预测的维度一致的矩阵。相应地,将上述一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的相加处理,可以包括:将上述一第一融合特征对应的实例候选区域的实例类别预测结果与转换得到的前背景预测结果进行像素级的相加处理。
其中,本公开各实施例的上述实施方式中,分别基于各实例候选区域的第一融合特征进行实例分割,获得各实例候选区域的实例分割结果时,由于同时基于该实例候选区域的第一融合特征进行像素级别的实例类别预测和前背景预测,该部分方案可以称为双路掩膜预测,如图4所示,为本公开实施例中进行双路掩膜预测的一个网络结构示意图。
如图4所示,实例候选区域对应的多个区域特征,分别经过两个分支进行实例类别预测和前背景预测。其中,第一个分支包括:四个全卷积层(conv1-conv4)即上述第一卷积网络,和一个解卷积层(deconv)即上述第一全卷积层。另外一个分支包括:从第一个分支的第三个全卷积层和第四个全卷积层(conv3-conv4)、以及两个全卷积层(conv4 -fc和conv5 -fc),即上述第二卷积网络;全连接层(fc);以及转换(reshape)层,用于将前背景预测结果转换为与实例类别预测结果的维度一致的前背景预测结果。第一个分支对每个潜在的实例类别都会进行像素级别的掩膜预测,而全连接层则进行一个与实例类别无关的掩膜预测(即,进行像素级别的前背景预测)。最终这两个分支的掩膜预测相加得到最终的实例分割结果。
图5为本公开实例分割方法一个应用实施例的流程图。图6为图5所示应用实施例的过程示意图。请同时参见图5和图6,该应用实施例的实例分割方法包括:
502,通过神经网络对图像进行特征提取,经神经网络中四个不同网络深度的网络层输出四个层级的特征M 1-M 4
在一个可选示例中,该操作502可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的神经网络执行。
504,将上述四个层级的特征中,按照从高层级特征M 4到低层级特征M 1(即:自上而下)的顺序,依次将较高层级的特征M i+1上采样后与较低层级的特征M i进行融合,获得第一批融合特征P 2-P 5
其中,i的取值依次为1-3中的整数。参与融合的特征和第一批融合特征中,最高层级的融合特征P 5为上述四个不同层级的特征中最高层级的特征M 4或者通过全卷积层对该特征M 4进行特征提取得到的特征;第一融合特征包括上述四个不同层级的特征中最高层级的融合特征和每次融合得到的融合特征P 2-P 5
506,将上述第一批融合特征中,按照从低层级特征P 2到高层级特征P 5(即:自下而上)的顺序,依次将较低层级的融合特征P k降采样后与相邻的较高层级的特征P k+1进行融合,获得第二批融合特征N 2-N 5
其中,k的取值依次为2-4中的整数。参与本次融合的融合特征和第二批融合特征中,最低层级的融合特征N 2为第一批融合特征中最低层级的融合特征P 2或者通过全卷积层对该融合特征P 2进行特征提取得到的特征,第二批融合特征包括第一融合特征中最低层级的特征P 2对应的特征和每次融合得到的融合特征,其中,第一融合特征中最低层级的特征对应的特征,即第一融合特征中最低层级的融合特征P 2或者通过卷积层对该融合特征P 2进行特征提取得到的特征。
本应用实施例以对上述四个层级的特征M 1-M 4进行一次折回融合为例进行说明,因此,通过操作506获得的第二批融合特征即为本公开上述各实施例中的第二融合特征。
在一个可选示例中,该操作502-504可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二融合模块执行。
508,从第二融合特征N 2-N 5中抽取上述图像中至少一实例候选区域对应的区域特征。
本公开各实施例中,例如,可以采用但不限于区域推荐网络对图像生成至少一个实例候选区域,并将各实例候选区域分别映射到第二融合特征中的各特征上,之后,例如,可以采用但不限于感兴趣区域对齐的方法,分别从第二融合特征中抽取同一实例候选区域对应的区域特征。
在一个可选示例中,该操作508可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的抽取模块执行。
510,分别对同一实例候选区域对应的多个区域特征进行像素级别的融合,得到各实例候选区域的第一融合特征。
在一个可选示例中,该操作510可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一融合模块执行。
之后,分别执行操作512和516。
512,分别基于各实例候选区域的第一融合特征进行实例分割,获得各实例候选区域的实例分割结果。
该实例分割结果包括各实例的物体框(box)或者位置和该实例所属的实例类别(class)。
在一个可选示例中,该操作512可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一分割单元执行。
之后,不执行本应用实施例的后续流程。
514,分别基于各实例候选区域的第一融合特征进行像素级别的实例类别预测,获得各实例候选区域的实例类别预测结果;以及分别基于各实例候选区域的第一融合特征进行像素级别的前背景预测,获得各实例候选区域的前背景预测结果。
在一个可选示例中,该操作514可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一分割单元或者第一分割单元中的第一预测子单元和第二预测子单元执行。
516,分别将各实例物体候选区域的第一融合特征对应物体类别预测结果与前背景预测结果进行像素级的相加处理,获得各第一融合特征对应的实例物体候选区域的实例分割结果。
其中,该实例分割结果包括:当前实例候选区域中属于某一实例的像素以及该实例所属的实例类别,其中的实例类别可以:背景或者某一实例类别。
在一个可选示例中,该操作516可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一分割单元或者第一分割单元中的获取子单元执行。
其中,该操作512与操作514-516之间在执行时间上不存在先后顺序,二者可以同时执行,也可以以任意时间顺序执行。
另外,在本公开上述各实施例中,得到各实例候选区域的第一融合特征之后,还可以基于该第一融合特征对图像的至少部分区域进行语义分割,获得语义分割结果。
或者,在本公开上述各实施例中,得到各实例候选区域的第二融合特征之后,还可以基于该第二融合特征对图像的至少部分区域进行语义分割,获得语义分割结果。
其中,上述语义分割结果例如可以包括:该图像的至少部分区域中各像素所属的类别。
本发明各实施例中,图像的至少部分区域可以是图像的全部区域或者局部区域(例如候选区域),即:可以对整个图像进行语义分割,得到图像的语义分割结果;也可以对图像的局部(例如候选区域)进行语义分割,得到局部区域的语义分割结果。其中的候选区域例如可以是上述各实施例中的实例候选区域,或者还可以是以其他方式产生的候选区域。
在一个可选示例中,上述对图像的至少部分区域进行语义分割的操作可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的分割模块或者分割模块中的执行。
基于上述实施例,实现了对图像的至少部分区域的语义分割。另外,基于第一融合特征或者第二融合特征对图像的至少部分区域进行语义分割,可以利用上下文信息提升图像语义分割结果的准确性。
需要说明的是在本公开上述各实施例中,得到各实例候选区域的第二融合特征之后,也可以基于第二融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。其中,基于第二融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果的实现,可以参考上述基于第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果的各实施例,二者的可以采取类似的方案实现,本公开不再赘述。
本公开实施例提供的任一种实例分割方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开实施例提供的任一种实例分割方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种实例分割方法。下文不再赘述。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
图7为本公开实例分割装置一个实施例的结构示意图。该实施例的实例分割装置可用于实现本公开上述各实例分割方法实施例。如图7所示,该实施例的装置包括:神经网络,抽取模块,第一融合模块和分割模块。其中:
神经网络,用于对图像进行特征提取,输出至少两个不同层级的特征。
其中,该神经网络可以包括至少两个不同网络深度的网络层,用于对图像进行特征提取,经至少两个不同网络深度的网络层输出至少两个不同层级的特征。
抽取模块,用于从上述至少两个不同层级的特征中抽取图像中至少一实例候选区域对应的区域特征。
第一融合模块,用于对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征。
分割模块,用于基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果。
基于本公开上述实施例提供的实例分割装置,通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;从两个不同层级的特征中抽取图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或图像的实例分割结果。本公开实施例设计了基于深度学习的框架解决实例分割的问题,由于深度学习具有强大的建模能力,有助于获得更好的实例分割结果;另外,针对实例候选区域进行实例分割,相对于直接对整个图像进行实例分割,可以提高实例分割的准确性,降低实例分割所需的计算量和复杂度,提高实例分割效率;并且,从至少两个不同层级的特征中抽取实例候选区域对应的区域特征进行融合,并基于得到的融合特征进行实例分割,使得每个实例候选区域都可以同时获得更多不同层级的信息,由于从不同层级的特征抽取的信息都是处于不同的语义层级,从而可以利用上下文信息提高各实例候选区域的实例分割结果的准确性。
图8为本公开实例分割装置另一个实施例的结构示意图。如图8所示,与图7所示的实施例相比,该实施例的实例分割装置还包括:第二融合模块,用于将至少两个不同层级的特征进行至少一次折回融合,得到第二融合特征。其中,一次折回融合包括:基于神经网络的网络深度方向,对分别由不同网络深度的网络层输出的不同层级的特征,依次按照两个不同的层级方向进行融合。相应地,该实施例中,抽取模块用于从第二融合特征中抽取至少一实例候选区域对应的区域特征。
在其中一个实施例方式中,上述两个不同的层级方向,可以包括:从高层级特征到低层级特征的方向、和从低层级特征到高层级特征的方向。
则上述依次按照两个不同的层级方向,可以包括:依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向;或者,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向。
在其中一个可选示例中,第二融合模块对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向进行融合时,用于:沿神经网络的网络深度从深到浅的方向,依次将神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,获得第三融合特征;沿从低层级特征到高层级特征的方向,依次将较低层级的融合特征降采样后,与第三融合特征中较高层级的融合特征进行融合。
其中,较高层级的特征,例如可以包括:经神经网络中网络深度较深的网络层输出的特征、或者对网络深度较深的网络层输出的特征进行至少一次特征提取得到的特征。
在其中一个可选示例中,第二融合模块依次将神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合时,用于依次将神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与相邻的、经网络深度较浅的网络层输出的较低层级的特征进行融合。
在其中一个可选示例中,第二融合模块依次将较低层级的融合特征降采样后,与第三融合特征中较高层级的融合特征进行融合时,用于依次将较低层级的融合特征降采样后,与相邻的、第三融合特征中较高层级的融合特征进行融合。
在其中一个可选示例中,第二融合模块对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合时,用于:沿神经网络的网络深度从浅到深的方向,依次将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合,获得第四融合特征;
沿从高层级特征到低层级特征的方向,依次将较高层级的融合特征上采样后,与第四融合特征中较低层级的融合特征进行融合。
其中,较低层级的特征例如可以包括:经神经网络中网络深度较浅的网络层输出的特征、或者对网络深度较浅的网络层输出的特征进行至少一次特征提取得到的特征。
在其中一个可选示例中,第二融合模块依次将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合时,用于依次将神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与相邻的、经网络深度较深的网络层输出的较高层级的特征进行融合。
在其中一个可选示例中,第二融合模块依次将较高层级的融合特征上采样后,与第四融合特征中较低层级的融合特征进行融合时,用于依次将较高层级的融合特征上采样后,与相邻的、第四融合特征中较低层级的融合特征进行融合。
在其中一个可选示例中,第一融合模块对同一实例候选区域对应的区域特征进行融合时,用于分别对同一实例候选区域对应的多个区域特征进行像素级别的融合。
例如,第一融合模块对同一实例候选区域对应的多个区域特征进行像素级别的融合时,用于:分别对同一实例候选区域对应的多个区域特征基于各像素取最大值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取平均值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取求和。
另外,再参见图8,在本公开上述各实施例的一个实施方式中,分割模块可以包括:
第一分割单元,用于基于一第一融合特征,对一第一融合特征对应的实例候选区域进行实例分割,获得对应的实例候选区域的实例分割结果;和/或,
第二分割单元,用于基于各第一融合特征对图像进行实例分割,获得图像的实例分割结果。
图9为本公开实施例中分割模块一个实施例的结构示意图。如图9所示,在本公开上述各实施例中,分割模块可以包括:
第一分割单元,用于分别基于各第一融合特征,对各第一融合特征各自对应的实例候选区域进行实例分割,获得各实例候选区域的实例分割结果;
获取单元,用于基于各实例候选区域的实例分割结果获取图像的实例分割结果。
在其中一个实施方式中,第一分割单元包括:
第一预测子单元,用于基于一第一融合特征,进行像素级别的实例类别预测,获得一第一融合特征对应的实例候选区的实例类别预测结果;
第二预测子单元,用于基于一第一融合特征进行像素级别的前背景预测,获得一第一融合特征对应的实例候选区域的前背景预测结果;
获取子单元,用于基于实例类别预测结果和前背景预测结果,获取一第一融合特征对应的实例物体候选区域的实例分割结果。
在其中一个可选示例中,第二预测子单元,用于基于一第一融合特征,预测一第一融合特征对应的实例候选区域中属于前景的像素和/或属于背景的像素。
其中,前景包括所有实例类别对应部分,背景包括:所有实例类别对应部分以外的部分;或者,背景包括所有实例类别对应部分,前景包括:所有实例类别对应部分以外的部分。
在其中一个可选示例中,第一预测子单元可以包括:第一卷积网络,用于对一第一融合特征进行特征提取;第一卷积网络包括至少一个全卷积层;第一全卷积层,用于基于第一卷积网络输出的特征进行像素级别的物体类别预测。
在其中一个可选示例中,第二预测子单元可以包括:第二卷积网络,用于对一第一融合特征进行特征提取;第二卷积网络包括至少一个全卷积层;全连接层,用于基于第二卷积网络输出的特征进行像素级别的前背景预测。
在其中一个可选示例中,获取子单元用于:将一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的相加处理,获得一第一融合特征对应的实例物体候选区域的实例分割结果。
另外,再参见图9,在另一个实施例中,第一分割单元还可以包括:转换子单元,用于将前背景预测结果转换为与实例类别预测结果的维度一致的前背景预测结果。相应地,该实施例中,获取子单元用 于将一第一融合特征对应的实例候选区域的实例类别预测结果与转换得到的前背景预测结果进行像素级的相加处理。
另外,在本公开上述各实施例的一个实施方式中,分割模块还可以包括:第三分割单元,用于基于第一融合特征对图像的至少部分区域进行语义分割,获得语义分割结果;或者,用于基于第二融合特征对图像的至少部分区域进行语义分割,获得语义分割结果。
另外,本公开实施例提供的另一种电子设备,包括:
存储器,用于存储计算机程序;
处理器,用于执行存储器中存储的计算机程序,且计算机程序被执行时,实现本公开上述实施例的实例分割方法。
图10为本公开电子设备一个应用实施例的结构示意图。下面参考图10,其示出了适于用来实现本公开实施例的终端设备或服务器的电子设备的结构示意图。如图10所示,该电子设备包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU),和/或一个或多个图像处理器(GPU)等,处理器可以根据存储在只读存储器(ROM)中的可执行指令或者从存储部分加载到随机访问存储器(RAM)中的可执行指令而执行各种适当的动作和处理。通信部可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器和/或随机访问存储器中通信以执行可执行指令,通过总线与通信部相连、并经通信部与其他目标设备通信,从而完成本公开实施例提供的任一方法对应的操作,例如,通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
此外,在RAM中,还可存储有装置操作所需的各种程序和数据。CPU、ROM以及RAM通过总线彼此相连。在有RAM的情况下,ROM为可选模块。RAM存储可执行指令,或在运行时向ROM中写入可执行指令,可执行指令使处理器执行本公开任一方法对应的操作。输入/输出(I/O)接口也连接至总线。通信部可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口:包括键盘、鼠标等的输入部分;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分;包括硬盘等的存储部分;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分。通信部分经由诸如因特网的网络执行通信处理。驱动器也根据需要连接至I/O接口。可拆卸介质,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器上,以便于从其上读出的计算机程序根据需要被安装入存储部分。
需要说明的,如图10所示的架构仅为一种可选实现方式,在实践过程中,可根据实际需要对上述图10的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本公开公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本公开实施例提供的人脸防伪检测方法步骤对应的指令。在这样的实施例中,该计算机程序可以通过通信部分从网络上被下载和安装,和/或从可拆卸介质被安装。在该计算机程序被CPU执行时,执行本公开的方法中限定的上述功能。
另外,本公开实施例还提供了一种计算机程序,包括计算机指令,当计算机指令在设备的处理器中运行时,实现本公开任一实施例的实例分割方法。
另外,本公开实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现本公开任一实施例的实例分割方法。
本公开实施例在无人驾驶、家居机器人、地图等领域有着非常广阔的应用,例如:本公开实施例可应用于自动驾驶场景,精确识别出自动驾驶场景中的不同交通参与者;本公开实施例可应用于街道场景,识别出街道场景中不同路标性质的建筑和物体,从而帮助高精地图的构建;本公开实施例可应用于居家机器人,例如机器人在抓取物体时需要对每个物体都有精确的像素级别的定位,利用本公开实施例,可以对物体进行精确识别和定位。应当理解,以上仅为示例性场景,不应理解为对本公开保护范围的限制。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本公开的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本公开 的方法的步骤不限于以上描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用,并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。

Claims (54)

  1. 一种实例分割方法,其特征在于,包括:
    通过神经网络对图像进行特征提取,输出至少两个不同层级的特征;
    从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征、并对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;
    基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
  2. 根据权利要求1所述的方法,其特征在于,所述通过神经网络对图像进行特征提取,输出至少两个不同层级的特征,包括:通过所述神经网络对所述图像进行特征提取,经所述神经网络中至少两个不同网络深度的网络层输出至少两个不同层级的特征。
  3. 根据权利要求1或2所述的方法,其特征在于,所述输出至少两个不同层级的特征之后,还包括:将所述至少两个不同层级的特征进行至少一次折回融合,得到第二融合特征;其中,一次所述折回融合包括:基于所述神经网络的网络深度方向,对分别由不同网络深度的网络层输出的不同层级的特征,依次按照两个不同的层级方向进行融合;
    从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征,包括:从所述第二融合特征中抽取所述至少一实例候选区域对应的区域特征。
  4. 根据权利要求3所述的方法,其特征在于,所述两个不同的层级方向,包括:从高层级特征到低层级特征的方向、和从低层级特征到高层级特征的方向。
  5. 根据权利要求4所述的方法,其特征在于,所述依次按照两个不同的层级方向,包括:依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向;或者,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向。
  6. 根据权利要求5所述的方法,其特征在于,对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向进行融合,包括:
    沿所述神经网络的网络深度从深到浅的方向,依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,获得第三融合特征;
    沿从低层级特征到高层级特征的方向,依次将较低层级的融合特征降采样后,与所述第三融合特征中较高层级的融合特征进行融合。
  7. 根据权利要求6所述的方法,其特征在于,所述较高层级的特征,包括:经所述神经网络中所述网络深度较深的网络层输出的特征、或者对所述网络深度较深的网络层输出的特征进行至少一次特征提取得到的特征。
  8. 根据权利要求6或7所述的方法,其特征在于,所述依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,包括:依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与相邻的、经网络深度较浅的网络层输出的较低层级的特征进行融合。
  9. 根据权利要求6-8任一所述的方法,其特征在于,所述依次将较低层级的融合特征降采样后,与所述第三融合特征中较高层级的融合特征进行融合,包括:依次将较低层级的融合特征降采样后,与相邻的、所述第三融合特征中较高层级的融合特征进行融合。
  10. 根据权利要求5所述的方法,其特征在于,对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合,包括:
    沿所述神经网络的网络深度从浅到深的方向,依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合,获得第四融合特征;
    沿从高层级特征到低层级特征的方向,依次将较高层级的融合特征上采样后,与所述第四融合特征中较低层级的融合特征进行融合。
  11. 根据权利要求10所述的方法,其特征在于,所述较低层级的特征,包括:经所述神经网络中所述网络深度较浅的网络层输出的特征、或者对所述网络深度较浅的网络层输出的特征进行至少一次特征提取得到的特征。
  12. 根据权利要求10或11所述的方法,其特征在于,所述依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融 合,包括:依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与相邻的、经网络深度较深的网络层输出的较高层级的特征进行融合。
  13. 根据权利要求10-12任一所述的方法,其特征在于,所述依次将较高层级的融合特征上采样后,与所述第四融合特征中较低层级的融合特征进行融合,包括:依次将较高层级的融合特征上采样后,与相邻的、所述第四融合特征中较低层级的融合特征进行融合。
  14. 根据权利要求1-13任一所述的方法,其特征在于,所述对同一实例候选区域对应的区域特征进行融合,包括:分别对同一实例候选区域对应的多个区域特征进行像素级别的融合。
  15. 根据权利要求14所述的方法,其特征在于,所述对同一实例候选区域对应的多个区域特征进行像素级别的融合,包括:分别对同一实例候选区域对应的多个区域特征基于各像素取最大值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取平均值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取求和。
  16. 根据权利要求1-15任一所述的方法,其特征在于,所述基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果,包括:
    基于一第一融合特征,对所述一第一融合特征对应的实例候选区域进行实例分割,获得所述对应的实例候选区域的实例分割结果;和/或,
    基于各第一融合特征对所述图像进行实例分割,获得所述图像的实例分割结果。
  17. 根据权利要求1-16任一所述的方法,其特征在于,所述基于各第一融合特征进行实例分割,获得所述图像的实例分割结果,包括:
    分别基于各第一融合特征,对各第一融合特征各自对应的实例候选区域进行实例分割,获得各实例候选区域的实例分割结果;
    基于所述各实例候选区域的实例分割结果获取所述图像的实例分割结果。
  18. 根据权利要求16或17所述的方法,其特征在于,所述基于一第一融合特征,对所述一第一融合特征对应的实例候选区域进行实例分割,获得所述对应的实例候选区域的实例分割结果,包括:
    基于所述一第一融合特征,进行像素级别的实例类别预测,获得所述一第一融合特征对应的实例候选区的实例类别预测结果;基于所述一第一融合特征进行像素级别的前背景预测,获得所述一第一融合特征对应的实例候选区域的前背景预测结果;
    基于所述实例类别预测结果和所述前背景预测结果,获取所述一第一融合特征对应的实例物体候选区域的实例分割结果。
  19. 根据权利要求18所述的方法,其特征在于,基于所述一第一融合特征进行像素级别的前背景预测,包括:基于所述一第一融合特征,预测所述一第一融合特征对应的实例候选区域中属于前景的像素和/或属于背景的像素。
  20. 根据权利要求19所述的方法,其特征在于,所述前景包括所有实例类别对应部分,所述背景包括所述所有实例类别对应部分以外的部分;或者,所述背景包括所有实例类别对应部分,所述前景包括所述所有实例类别对应部分以外的部分。
  21. 根据权利要求18-20任一所述的方法,其特征在于,基于所述一第一融合特征,进行像素级别的实例类别预测,包括:
    通过第一卷积网络,对所述一第一融合特征进行特征提取;所述第一卷积网络包括至少一个全卷积层;
    通过第一全卷积层,基于所述第一卷积网络输出的特征进行像素级别的物体类别预测。
  22. 根据权利要求18-21任一所述的方法,其特征在于,基于所述一第一融合特征进行像素级别的前背景预测,包括:
    通过第二卷积网络,对所述一第一融合特征进行特征提取;所述第二卷积网络包括至少一个全卷积层;
    通过全连接层,基于所述第二卷积网络输出的特征进行像素级别的前背景预测。
  23. 根据权利要求18-22任一所述的方法,其特征在于,基于所述实例类别预测结果和所述前背景预测结果,获取所述一第一融合特征对应的实例物体候选区域的实例分割结果,包括:将所述一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的相加处理,获得所述一第一融合特征对应的实例物体候选区域的实例分割结果。
  24. 根据权利要求23所述的方法,其特征在于,获得所述一第一融合特征对应的实例候选区域的前背景预测结果之后,还包括:将所述前背景预测结果转换为与所述实例类别预测结果的维度一致的前背景预测结果;
    将所述一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的 相加处理,包括:将所述一第一融合特征对应的实例候选区域的实例类别预测结果与转换得到的前背景预测结果进行像素级的相加处理。
  25. 根据权利要求1-15任一所述的方法,其特征在于,所述得到各实例候选区域的第一融合特征之后,还包括:基于所述第一融合特征对所述图像的至少部分区域进行语义分割,获得语义分割结果。
  26. 根据权利要求3-15任一所述的方法,其特征在于,所述得到各实例候选区域的第二融合特征之后,还包括:基于所述第二融合特征对所述图像的至少部分区域进行语义分割,获得语义分割结果。
  27. 一种实例分割装置,其特征在于,包括:
    神经网络,用于对图像进行特征提取,输出至少两个不同层级的特征;
    抽取模块,用于从所述至少两个不同层级的特征中抽取所述图像中至少一实例候选区域对应的区域特征;
    第一融合模块,用于对同一实例候选区域对应的区域特征进行融合,得到各实例候选区域的第一融合特征;
    分割模块,用于基于各第一融合特征进行实例分割,获得相应实例候选区域的实例分割结果和/或所述图像的实例分割结果。
  28. 根据权利要求27所述的装置,其特征在于,所述神经网络包括至少两个不同网络深度的网络层,用于对所述图像进行特征提取,经所述至少两个不同网络深度的网络层输出至少两个不同层级的特征。
  29. 根据权利要求27或28所述的装置,其特征在于,还包括:
    第二融合模块,用于将所述至少两个不同层级的特征进行至少一次折回融合,得到第二融合特征;其中,一次所述折回融合包括:基于所述神经网络的网络深度方向,对分别由不同网络深度的网络层输出的不同层级的特征,依次按照两个不同的层级方向进行融合;所述抽取模块用于从所述第二融合特征中抽取所述至少一实例候选区域对应的区域特征。
  30. 根据权利要求29所述的装置,其特征在于,所述两个不同的层级方向,包括:从高层级特征到低层级特征的方向、和从低层级特征到高层级特征的方向。
  31. 根据权利要求30所述的装置,其特征在于,所述依次按照两个不同的层级方向,包括:依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向;或者,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向。
  32. 根据权利要求31所述的装置,其特征在于,所述第二融合模块对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从高层级特征到低层级特征的方向和从低层级特征到高层级特征的方向进行融合时,用于:沿所述神经网络的网络深度从深到浅的方向,依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合,获得第三融合特征;沿从低层级特征到高层级特征的方向,依次将较低层级的融合特征降采样后,与所述第三融合特征中较高层级的融合特征进行融合。
  33. 根据权利要求32所述的装置,其特征在于,所述较高层级的特征,包括:
    经所述神经网络中所述网络深度较深的网络层输出的特征、或者对所述网络深度较深的网络层输出的特征进行至少一次特征提取得到的特征。
  34. 根据权利要求32或33所述的装置,其特征在于,所述第二融合模块依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与经网络深度较浅的网络层输出的较低层级的特征进行融合时,用于依次将所述神经网络中,经网络深度较深的网络层输出的较高层级的特征上采样后,与相邻的、经网络深度较浅的网络层输出的较低层级的特征进行融合。
  35. 根据权利要求32-34任一所述的装置,其特征在于,所述第二融合模块依次将较低层级的融合特征降采样后,与所述第三融合特征中较高层级的融合特征进行融合时,用于依次将较低层级的融合特征降采样后,与相邻的、所述第三融合特征中较高层级的融合特征进行融合。
  36. 根据权利要求31所述的装置,其特征在于,所述第二融合模块对分别由不同网络深度的网络层输出的不同层级的特征,依次沿从低层级特征到高层级特征的方向和从高层级特征到低层级特征的方向进行融合时,用于:沿所述神经网络的网络深度从浅到深的方向,依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合,获得第四融合特征;沿从高层级特征到低层级特征的方向,依次将较高层级的融合特征上采样后,与所述第四融合特征中较低层级的融合特征进行融合。
  37. 根据权利要求36所述的装置,其特征在于,所述较低层级的特征,包括:经所述神经网络中所述网络深度较浅的网络层输出的特征、或者对所述网络深度较浅的网络层输出的特征进行至少一次特征提取得到的特征。
  38. 根据权利要求36或37所述的装置,其特征在于,所述第二融合模块依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与经网络深度较深的网络层输出的较高层级的特征进行融合时,用于依次将所述神经网络中,经网络深度较浅的网络层输出的较低层级的特征降采样后,与相邻的、经网络深度较深的网络层输出的较高层级的特征进行融合。
  39. 根据权利要求36-38任一所述的装置,其特征在于,所述第二融合模块依次将较高层级的融合特征上采样后,与所述第四融合特征中较低层级的融合特征进行融合时,用于依次将较高层级的融合特征上采样后,与相邻的、所述第四融合特征中较低层级的融合特征进行融合。
  40. 根据权利要求27-39任一所述的装置,其特征在于,所述第一融合模块对同一实例候选区域对应的区域特征进行融合时,用于分别对同一实例候选区域对应的多个区域特征进行像素级别的融合。
  41. 根据权利要求40所述的装置,其特征在于,所述第一融合模块对同一实例候选区域对应的多个区域特征进行像素级别的融合时,用于:分别对同一实例候选区域对应的多个区域特征基于各像素取最大值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取平均值;或者,分别对同一实例候选区域对应的多个区域特征基于各像素取求和。
  42. 根据权利要求27-41任一所述的装置,其特征在于,所述分割模块包括:
    第一分割单元,用于基于一第一融合特征,对所述一第一融合特征对应的实例候选区域进行实例分割,获得所述对应的实例候选区域的实例分割结果;和/或,
    第二分割单元,用于基于各第一融合特征对所述图像进行实例分割,获得所述图像的实例分割结果。
  43. 根据权利要求27-41任一所述的装置,其特征在于,所述分割模块包括:
    第一分割单元,用于分别基于各第一融合特征,对所述各第一融合特征各自对应的实例候选区域进行实例分割,获得各实例候选区域的实例分割结果;
    获取单元,用于基于所述各实例候选区域的实例分割结果获取所述图像的实例分割结果。
  44. 根据权利要求42或43所述的装置,其特征在于,所述第一分割单元包括:
    第一预测子单元,用于基于所述一第一融合特征,进行像素级别的实例类别预测,获得所述一第一融合特征对应的实例候选区的实例类别预测结果;
    第二预测子单元,用于基于所述一第一融合特征进行像素级别的前背景预测,获得所述一第一融合特征对应的实例候选区域的前背景预测结果;
    获取子单元,用于基于所述实例类别预测结果和所述前背景预测结果,获取所述一第一融合特征对应的实例物体候选区域的实例分割结果。
  45. 根据权利要求44所述的装置,其特征在于,所述第二预测子单元,用于基于所述一第一融合特征,预测所述一第一融合特征对应的实例候选区域中属于前景的像素和/或属于背景的像素。
  46. 根据权利要求45所述的装置,其特征在于,所述前景包括所有实例类别对应部分,所述背景包括:所述所有实例类别对应部分以外的部分;或者,
    所述背景包括所有实例类别对应部分,所述前景包括:所述所有实例类别对应部分以外的部分。
  47. 根据权利要求44-46任一所述的装置,其特征在于,所述第一预测子单元包括:
    第一卷积网络,用于对所述一第一融合特征进行特征提取;所述第一卷积网络包括至少一个全卷积层;
    第一全卷积层,用于基于所述第一卷积网络输出的特征进行像素级别的物体类别预测。
  48. 根据权利要求44-47任一所述的装置,其特征在于,所述第二预测子单元包括:
    第二卷积网络,用于对所述一第一融合特征进行特征提取;所述第二卷积网络包括至少一个全卷积层;
    全连接层,用于基于所述第二卷积网络输出的特征进行像素级别的前背景预测。
  49. 根据权利要求44-48任一所述的装置,其特征在于,所述获取子单元,用于:将所述一第一融合特征对应的实例候选区域的物体类别预测结果与前背景预测结果进行像素级的相加处理,获得所述一第一融合特征对应的实例物体候选区域的实例分割结果。
  50. 根据权利要求49所述的装置,其特征在于,所述第一分割单元还包括:
    转换子单元,用于将所述前背景预测结果转换为与所述实例类别预测结果的维度一致的前背景预测结果;
    所述获取子单元,用于将所述一第一融合特征对应的实例候选区域的实例类别预测结果与转换得到的前背景预测结果进行像素级的相加处理。
  51. 根据权利要求27-50任一所述的装置,其特征在于,所述分割模块还包括:第三分割单元,用于基于所述第一融合特征对所述图像的至少部分区域进行语义分割,获得语义分割结果。
  52. 根据权利要求29-50任一所述的方法,其特征在于,所述分割模块还包括:第三分割单元,用 于基于所述第二融合特征对所述图像的至少部分区域进行语义分割,获得语义分割结果。
  53. 一种电子设备,其特征在于,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述存储器中存储的计算机程序,且所述计算机程序被执行时,实现上述权利要求1-26任一项所述的方法。
  54. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该计算机程序被处理器执行时,实现上述权利要求1-26任一项所述的方法。
PCT/CN2019/073819 2018-02-09 2019-01-30 实例分割方法和装置、电子设备、程序和介质 WO2019154201A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020207016941A KR102438095B1 (ko) 2018-02-09 2019-01-30 인스턴스 분할 방법 및 장치, 전자 기기, 프로그램 및 매체
JP2020533099A JP7032536B2 (ja) 2018-02-09 2019-01-30 インスタンスセグメンテーション方法および装置、電子機器、プログラムならびに媒体
SG11201913332WA SG11201913332WA (en) 2018-02-09 2019-01-30 Instance segmentation methods and apparatuses, electronic devices, programs, and media
US16/729,423 US11270158B2 (en) 2018-02-09 2019-12-29 Instance segmentation methods and apparatuses, electronic devices, programs, and media

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810137044.7A CN108460411B (zh) 2018-02-09 2018-02-09 实例分割方法和装置、电子设备、程序和介质
CN201810136371.0A CN108335305B (zh) 2018-02-09 2018-02-09 图像分割方法和装置、电子设备、程序和介质
CN201810136371.0 2018-02-09
CN201810137044.7 2018-02-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/729,423 Continuation US11270158B2 (en) 2018-02-09 2019-12-29 Instance segmentation methods and apparatuses, electronic devices, programs, and media

Publications (1)

Publication Number Publication Date
WO2019154201A1 true WO2019154201A1 (zh) 2019-08-15

Family

ID=67548217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073819 WO2019154201A1 (zh) 2018-02-09 2019-01-30 实例分割方法和装置、电子设备、程序和介质

Country Status (5)

Country Link
US (1) US11270158B2 (zh)
JP (1) JP7032536B2 (zh)
KR (1) KR102438095B1 (zh)
SG (1) SG11201913332WA (zh)
WO (1) WO2019154201A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489060A (zh) * 2020-12-07 2021-03-12 北京医准智能科技有限公司 一种用于肺炎病灶分割的系统及方法
CN113297991A (zh) * 2021-05-28 2021-08-24 杭州萤石软件有限公司 一种行为识别方法、装置及设备

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866526A (zh) * 2018-08-28 2020-03-06 北京三星通信技术研究有限公司 图像分割方法、电子设备及计算机可读存储介质
EP3912126B1 (en) * 2019-01-15 2023-10-11 Services Pétroliers Schlumberger Residual signal detection for noise attenuation
CN111626969B (zh) * 2020-05-22 2023-05-30 张卫东 一种基于注意力机制的玉米病害图像处理方法
CN111652142A (zh) * 2020-06-03 2020-09-11 广东小天才科技有限公司 基于深度学习的题目分割方法、装置、设备和介质
CN112465801B (zh) * 2020-12-09 2022-11-29 北京航空航天大学 一种分尺度提取掩码特征的实例分割方法
CN113096140B (zh) * 2021-04-15 2022-11-22 北京市商汤科技开发有限公司 实例分割方法及装置、电子设备及存储介质
KR20220125719A (ko) * 2021-04-28 2022-09-14 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 목표 대상 검측 모델을 트레이닝하는 방법 및 장비, 목표 대상을 검측하는 방법 및 장비, 전자장비, 저장매체 및 컴퓨터 프로그램
CN113792738A (zh) * 2021-08-05 2021-12-14 北京旷视科技有限公司 实例分割方法、装置、电子设备和计算机可读存储介质
WO2023106546A1 (ko) * 2021-12-09 2023-06-15 재단법인대구경북과학기술원 상향식 인스턴스 세분화 방법 및 장치
CN115205906B (zh) * 2022-09-15 2022-12-23 山东能源数智云科技有限公司 基于人体解析的仓储作业人员的检测方法、装置及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512661A (zh) * 2015-11-25 2016-04-20 中国人民解放军信息工程大学 一种基于多模态特征融合的遥感影像分类方法
CN107424159A (zh) * 2017-07-28 2017-12-01 西安电子科技大学 基于超像素边缘和全卷积网络的图像语义分割方法
CN108335305A (zh) * 2018-02-09 2018-07-27 北京市商汤科技开发有限公司 图像分割方法和装置、电子设备、程序和介质
CN108460411A (zh) * 2018-02-09 2018-08-28 北京市商汤科技开发有限公司 实例分割方法和装置、电子设备、程序和介质

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6395481B2 (ja) * 2014-07-11 2018-09-26 キヤノン株式会社 画像認識装置、方法及びプログラム
US9558268B2 (en) * 2014-08-20 2017-01-31 Mitsubishi Electric Research Laboratories, Inc. Method for semantically labeling an image of a scene using recursive context propagation
KR102450971B1 (ko) * 2015-05-08 2022-10-05 삼성전자주식회사 객체 인식 장치 및 방법
EP3156942A1 (en) 2015-10-16 2017-04-19 Thomson Licensing Scene labeling of rgb-d data with interactive option
US9881234B2 (en) 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN106250812B (zh) 2016-07-15 2019-08-20 汤一平 一种基于快速r-cnn深度神经网络的车型识别方法
CN106709924B (zh) 2016-11-18 2019-11-22 中国人民解放军信息工程大学 基于深度卷积神经网络和超像素的图像语义分割方法
CN107085609A (zh) 2017-04-24 2017-08-22 国网湖北省电力公司荆州供电公司 一种基于神经网络进行多特征融合的行人检索方法
CN107169974A (zh) 2017-05-26 2017-09-15 中国科学技术大学 一种基于多监督全卷积神经网络的图像分割方法
CN107483920B (zh) 2017-08-11 2018-12-21 北京理工大学 一种基于多层级质量因子的全景视频评估方法及系统
US10679351B2 (en) * 2017-08-18 2020-06-09 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512661A (zh) * 2015-11-25 2016-04-20 中国人民解放军信息工程大学 一种基于多模态特征融合的遥感影像分类方法
CN107424159A (zh) * 2017-07-28 2017-12-01 西安电子科技大学 基于超像素边缘和全卷积网络的图像语义分割方法
CN108335305A (zh) * 2018-02-09 2018-07-27 北京市商汤科技开发有限公司 图像分割方法和装置、电子设备、程序和介质
CN108460411A (zh) * 2018-02-09 2018-08-28 北京市商汤科技开发有限公司 实例分割方法和装置、电子设备、程序和介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489060A (zh) * 2020-12-07 2021-03-12 北京医准智能科技有限公司 一种用于肺炎病灶分割的系统及方法
CN112489060B (zh) * 2020-12-07 2022-05-10 北京医准智能科技有限公司 一种用于肺炎病灶分割的系统及方法
CN113297991A (zh) * 2021-05-28 2021-08-24 杭州萤石软件有限公司 一种行为识别方法、装置及设备

Also Published As

Publication number Publication date
JP7032536B2 (ja) 2022-03-08
KR102438095B1 (ko) 2022-08-30
SG11201913332WA (en) 2020-01-30
US20200134365A1 (en) 2020-04-30
JP2021507388A (ja) 2021-02-22
US11270158B2 (en) 2022-03-08
KR20200087808A (ko) 2020-07-21

Similar Documents

Publication Publication Date Title
WO2019154201A1 (zh) 实例分割方法和装置、电子设备、程序和介质
CN108335305B (zh) 图像分割方法和装置、电子设备、程序和介质
CN108460411B (zh) 实例分割方法和装置、电子设备、程序和介质
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
JP6963695B2 (ja) 単眼画像深度推定方法及び装置、機器、プログラム及び記憶媒体
WO2019129032A1 (zh) 遥感图像识别方法、装置、存储介质以及电子设备
KR102663519B1 (ko) 교차 도메인 이미지 변환 기법
JP2023541532A (ja) テキスト検出モデルのトレーニング方法及び装置、テキスト検出方法及び装置、電子機器、記憶媒体並びにコンピュータプログラム
CN108154222B (zh) 深度神经网络训练方法和系统、电子设备
US20230177643A1 (en) Image super-resolution
CN113657388A (zh) 一种融合图像超分辨率重建的图像语义分割方法
WO2022218012A1 (zh) 特征提取方法、装置、设备、存储介质以及程序产品
CN114037985A (zh) 信息提取方法、装置、设备、介质及产品
AlDahoul et al. Localization and classification of space objects using EfficientDet detector for space situational awareness
Liang et al. Hybrid transformer-CNN networks using superpixel segmentation for remote sensing building change detection
CN113409188A (zh) 一种图像背景替换方法、系统、电子设备及存储介质
GB2623399A (en) System, devices and/or processes for image anti-aliasing
Chen et al. Aggnet for self-supervised monocular depth estimation: Go an aggressive step furthe
Kim et al. Depth map super-resolution using guided deformable convolution
CN115272906A (zh) 一种基于点渲染的视频背景人像分割模型及算法
Aakerberg et al. Single-loss multi-task learning for improving semantic segmentation using super-resolution
Wang et al. Fast vehicle detection based on colored point cloud with bird’s eye view representation
Li et al. Light field reconstruction with arbitrary angular resolution using a deep Coarse-To-Fine framework
CN115239782A (zh) 用于呈现图像的方法、电子设备和存储介质
WO2023023160A1 (en) Depth information reconstruction from multi-view stereo (mvs) images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19750841

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20207016941

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020533099

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19750841

Country of ref document: EP

Kind code of ref document: A1