CN112396620A - Image semantic segmentation method and system based on multiple thresholds - Google Patents


Info

Publication number
CN112396620A
CN112396620A
Authority
CN
China
Prior art keywords
semantic segmentation
region
image
segmentation
interest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011284251.9A
Other languages
Chinese (zh)
Inventor
耿玉水
刘建鑫
赵晶
张康
李文骁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202011284251.9A priority Critical patent/CN112396620A/en
Publication of CN112396620A publication Critical patent/CN112396620A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses an image semantic segmentation method and system based on multiple thresholds, comprising the following steps: extracting region-of-interest features from the multi-scale feature map of an image according to the target object; segmenting the restored region-of-interest features sequentially through multi-level thresholds and training a preset image semantic segmentation model; and processing the image to be segmented with the trained image semantic segmentation model to obtain a semantic segmentation result. The region-of-interest features are extracted with a non-maximum suppression method, which avoids the problem of duplicated proposal regions; a multi-level threshold is set for the segmentation branches, and DenseCRF is used to optimize the segmentation result, thereby improving segmentation accuracy.

Description

Image semantic segmentation method and system based on multiple thresholds
Technical Field
The invention relates to the technical field of image processing, in particular to an image semantic segmentation method and system based on multiple thresholds.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In the field of machine vision, image segmentation refers to dividing an image into a number of non-overlapping sub-regions such that features within the same sub-region show a certain similarity while features of different sub-regions show obvious differences. In practice, many application scenarios must process large amounts of image data of complex and varied types at the same time, and traditional image segmentation algorithms, such as threshold-based segmentation and the watershed algorithm, can no longer meet current requirements. With the rapid progress of deep learning, more and more deep learning solutions are being applied in machine vision, and progress in image segmentation now depends on the development of deep learning.
At present there are many image segmentation algorithms based on deep learning; networks such as VGGNet and ResNet still have advantages for feature extraction. Long et al. proposed the fully convolutional network (FCN) at CVPR 2015, and most image segmentation methods use the FCN or parts of it to some degree. Pinheiro et al. proposed a deep mask segmentation model that segments each instance object by predicting a candidate mask for every instance appearing in the input image, but with low accuracy at segmentation boundaries. He et al. proposed the Mask R-CNN framework, one of the existing segmentation algorithms with the best instance segmentation results. Addressing the problem that Mask R-CNN's classification confidence and predicted mask share one evaluation function, which interferes with the segmentation result, Huang et al. proposed Mask Scoring R-CNN, which optimizes the information flow of Mask R-CNN and improves the quality of the predicted masks, and also performs well on the segmentation task without training on massive data.
However, the inventors have found that existing algorithms still have defects: models are too complex, accuracy is not high, large amounts of labeled data are required for training, and so on; in most cases these problems cannot all be handled at once and trade-offs must be made. Moreover, the Mask Scoring R-CNN algorithm was proposed for image instance segmentation, where the threshold setting is an important factor in obtaining a good mask prediction result. In general, the higher the threshold, the more accurate the prediction, but too high a threshold sharply reduces the number of positive samples, causing the model to overfit; if the threshold is too low, the samples contain more redundant results, making it difficult for the detector to distinguish positive from negative samples and harming training. Consequently, in some scenes the network produces segmentation edges that are rough and insufficiently fine, overshooting or falling short of the expected position; with a fixed threshold setting, tuning parameters does little to improve segmentation at image edges.
Disclosure of Invention
In order to solve the above problems, the invention provides an image semantic segmentation method and system based on multiple thresholds, which extracts region-of-interest features using a non-maximum suppression method, avoiding the problem of duplicated proposal regions; a multi-level threshold is set for the segmentation branches, and DenseCRF is used to optimize the segmentation result, thereby improving segmentation accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for segmenting image semantics based on multiple thresholds, including:
extracting the characteristics of the region of interest in the image multi-scale characteristic map according to the target object;
segmenting the restored region of interest characteristics sequentially through a multi-level threshold value, and training a preset image semantic segmentation model;
and processing the image to be segmented by using the trained image semantic segmentation model to obtain a semantic segmentation result.
In a second aspect, the present invention provides a multi-threshold-based image semantic segmentation system, including:
the feature extraction module is used for extracting region-of-interest features from the multi-scale feature map of the image according to the target object;
the multi-stage segmentation module is used for sequentially segmenting the restored region-of-interest features through multi-stage thresholds so as to train a preset image semantic segmentation model;
and the processing module is used for processing the image to be segmented by using the trained image semantic segmentation model to obtain a semantic segmentation result.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
aiming at the problem of Mask screening R-CNN, namely the problem that the segmentation edge of an image exceeds or develops to an expected position, the invention provides a probability image segmentation algorithm based on multiple thresholds, and a network model can better screen the prediction result of the network by setting a multi-level threshold method, so that the prediction precision is higher; for the problem of image segmentation edge processing, the invention adds the segmentation effect of a DenseCRF optimization network in a segmentation branch, realizes efficient feature extraction, and improves the segmentation efficiency and precision.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention, not to limit it.
Fig. 1 is a flowchart of a multi-threshold-based image semantic segmentation method provided in embodiment 1 of the present invention;
FIG. 2 is a block diagram of a Multi-threshold robust mask branch structure provided in embodiment 1 of the present invention;
fig. 3 is a diagram of a MTPMS R-CNN network model provided in embodiment 1 of the present invention.
Detailed description of embodiments:
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example 1
As shown in fig. 1, the present embodiment provides a method for segmenting image semantics based on multiple thresholds, including:
s1: extracting the characteristics of the region of interest in the image multi-scale characteristic map according to the target object;
s2: segmenting the restored region of interest characteristics sequentially through a multi-level threshold value, and training a preset image semantic segmentation model;
s3: and processing the image to be segmented by using the trained image semantic segmentation model to obtain a semantic segmentation result.
In this embodiment, the image semantic segmentation model is pre-constructed on the basis of an MTPMS R-CNN network, with ResNet-101 as the network backbone, i.e. a network of 101 layers. Because images are varied and complex, image features cannot be effectively extracted by a single convolutional neural network alone; a feature pyramid network (FPN), which aids feature extraction, is therefore integrated into the network. The FPN adopts a top-down hierarchical structure with lateral connections and builds a feature pyramid from a single-scale input, solving the multi-scale problem of extracting target objects from images; it has strong robustness and adaptability and requires few parameters.
Therefore, in step S1, this embodiment uses the feature pyramid network structure to extract a multi-scale feature map of the image: the image is processed at different sizes and features for each size are generated; shallow features can distinguish simple, large targets, while deep features can distinguish small targets.
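The patent does not spell out how an RoI is assigned to a pyramid level; a common heuristic from the FPN literature, shown here as a minimal sketch (the canonical size 224, base level, and level bounds are assumptions, not values from the patent), maps larger RoIs to coarser levels:

```python
import numpy as np

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    # Map an RoI of size w x h to a pyramid level: larger RoIs go to
    # deeper (coarser) levels, smaller RoIs to shallower (finer) ones.
    k = k0 + np.floor(np.log2(np.sqrt(w * h) / canonical))
    return int(np.clip(k, k_min, k_max))
```

With these assumed constants, a 224x224 RoI lands on the base level and the result is clipped to the available pyramid levels.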
In addition, the DCN model has strong learning ability and can automatically acquire high-order nonlinear feature combinations; however, such features are usually implicit and their meaning is hard to interpret. The Cross Network proposed in DCN can acquire cross features explicitly and automatically, and is lighter than a DNN, though correspondingly weaker in expressive capacity; adding the DCN to the network therefore effectively improves its performance.
In step S1, the feature maps of different scales generated by the preceding network layers are input to the target detector, and region-of-interest features are extracted from the appropriate level of the feature pyramid according to the size of the target object. This simple change to the network structure greatly improves detection of small targets, improving accuracy and speed without substantially increasing computation.
In this embodiment, the target detector used to extract region-of-interest features is an RPN, which is equivalent to a sliding-window-based, class-agnostic target detector built on a convolutional neural network. Specifically: anchor boxes are generated by sliding-window scanning according to the size of the target object; several anchors of different sizes and aspect ratios can be generated for one proposal region, and the anchors overlap so as to cover the image as much as possible. The overlap (IoU) between a proposed candidate box region and the expected region directly influences the classification effect, and region-of-interest features are obtained according to this overlap ratio.
Because anchors frequently overlap, proposal regions ultimately overlap on the same target. To solve this duplicate-proposal problem, this embodiment uses the non-maximum suppression (NMS) algorithm to score the overlap between the proposed candidate box regions and the expected region: NMS builds a proposal list sorted by score, iterates over the sorted list, discards proposals whose IoU with a higher-scoring proposal exceeds a predefined threshold, and keeps the higher-scoring proposal. Specifically, the method comprises the following steps:
the scores of all proposed candidate boxes are sorted and the highest score and its corresponding anchor box are selected; the remaining boxes are traversed, and any box whose overlap with the current highest-scoring box exceeds a certain threshold is deleted; then the highest-scoring box among the unprocessed boxes is selected and the process repeats; region-of-interest features are obtained from the overlap between the remaining proposed candidate box regions and the expected region.
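The NMS procedure described above can be sketched as follows (the IoU threshold is illustrative; the patent does not fix its value):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; returns kept indices,
    highest score first, suppressing boxes that overlap a kept box."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]                        # current highest-scoring box
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes overlapping box i
    return keep
```

A usage example: two heavily overlapping proposals on the same target collapse to the higher-scoring one, while a distant proposal survives.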
In this embodiment, 9 anchor boxes of different sizes and aspect ratios are used; note that no edge of an anchor box may extend beyond the edge of the image.
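A minimal sketch of generating the 9 anchors as 3 scales times 3 aspect ratios, centered at the origin (the base stride, scales, and ratios are illustrative values, not taken from the patent):

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    # 9 anchors: for each scale, three boxes of (roughly) equal area
    # whose height/width ratio equals the given aspect ratio.
    anchors = []
    for s in scales:
        area = (base * s) ** 2
        for r in ratios:
            w = np.sqrt(area / r)
            h = w * r
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)
```

In a full detector these anchors would be translated to every sliding-window position and clipped to the image boundary, per the constraint above.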
Since image segmentation is a pixel-level operation, segmentation must determine whether each given pixel is part of the target, so accuracy is required at the pixel level. After the original image undergoes a series of convolution and pooling operations, and after the box fine-tuning step in the RPN, RoI boxes may have different sizes; if pixel-level segmentation were performed directly, the image target object could not be accurately located, and the RoI must therefore be restored.
Therefore, in step S2, this embodiment uses the region-of-interest alignment layer (RoIAlign) from Mask R-CNN, which applies bilinear interpolation to retain the spatial information on the feature map, eliminating the error caused by the two quantizations in the RoI Pooling layer, solving the region-mismatch problem for image objects, and enabling pixel-level detection and segmentation.
RoIAlign differs from RoI Pooling in that it eliminates the quantization operations: it does not quantize the RoI boundary cells, but instead computes the exact positions of the sampling points in each cell using bilinear interpolation, then produces the final fixed-size RoI output with a max-pooling or average-pooling operation.
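The bilinear interpolation that RoIAlign applies at each sampling point can be sketched as follows (a single-channel feature map is assumed; a real layer would also pool several such samples per bin):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    # Sample feature map `feat` (H, W) at a fractional location (y, x)
    # without quantizing: blend the four surrounding cell values by
    # their distance to the sampling point, as RoIAlign does.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)
```

Sampling exactly between four cells returns their average, which is what avoids the misalignment that hard quantization would introduce.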
In Mask Scoring R-CNN, the segmentation branch and the detection branch are inserted in parallel, while this embodiment sets multi-level thresholds and therefore needs multi-level branches. Considering that image segmentation is a pixel-level operation, instance-object detection does not help segmentation much, and too many stacked networks slow down operation; therefore, in step S2, this embodiment sets multi-level thresholds only for the segmentation branches.
The classification branch and the bounding-box regression branch use the configuration from Mask Scoring R-CNN, so this embodiment only needs to consider the number of levels of mask-branch thresholds, adding one segmentation branch at each cascade stage to maximize sample diversity for learning the mask prediction task. The result produced by each mask branch is therefore passed, after pooling, to the next-level mask branch as its input.
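How successive mask branches see progressively cleaner samples under rising thresholds can be illustrated with a toy sketch (the threshold schedule 0.5/0.6/0.7 is an assumption, and the box refinement a real cascade performs between stages is omitted):

```python
def cascade_filter(samples, thresholds=(0.5, 0.6, 0.7)):
    """samples: list of (proposal_id, iou) pairs. Each stage keeps only
    the proposals whose IoU with ground truth meets that stage's threshold
    and passes the survivors on, so later stages train on purer samples."""
    per_stage = []
    current = samples
    for t in thresholds:
        current = [(pid, iou) for pid, iou in current if iou >= t]
        per_stage.append(current)
    return per_stage
```

With three proposals at IoU 0.55, 0.65, and 0.75, the three stages see 3, 2, and 1 positive samples respectively, which is the intended trade-off: each later stage is stricter but sees fewer positives.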
For the mask branch, this embodiment uses a fully convolutional network (FCN) as the main body and then uses DenseCRF to optimize the segmentation effect. Because DenseCRF handles segmentation edges more finely during image segmentation, this embodiment lets DenseCRF optimize the segmentation result of the preceding network on top of the original mask branch, to improve the final segmentation accuracy; owing to the properties of DenseCRF, this embodiment attaches DenseCRF only to the mask branch of the last stage.
To overcome the limitations of short-range conditional random fields, the present embodiment adopts a fully-connected conditional random field model with the following energy function:

$$E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)$$

where x is the label assignment over pixels. The unary potential is $\theta_i(x_i) = -\log P(x_i)$, where $P(x_i)$ is the label probability at pixel i computed by the deep convolutional neural network. The pairwise potential has the same form for every pair and is defined over a fully connected graph, i.e. connecting all pairs of image pixels i and j, using the following expression:

$$\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[ w_1 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\alpha^2} - \frac{\lVert c_i - c_j\rVert^2}{2\sigma_\beta^2}\right) + w_2 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\gamma^2}\right) \right]$$

where $\mu(x_i, x_j) = 1$ if $x_i \neq x_j$ and 0 otherwise. The remainder of the expression uses two Gaussian kernels over different feature spaces: the first is a bilateral kernel over pixel position (denoted p) and RGB color (denoted c), and the second depends on pixel position only. The hyperparameters $\sigma_\alpha$, $\sigma_\beta$, $\sigma_\gamma$ control the scale of the Gaussian kernels: the first kernel pushes pixels of similar color and position toward the same label, while the second kernel considers only spatial proximity when enforcing smoothness.
In this embodiment the model admits efficient approximate probabilistic inference. Under the fully factorized mean-field approximation $Q(x) = \prod_i Q_i(x_i)$, the message-passing update can be expressed as a Gaussian convolution in a bilateral space, and high-dimensional filtering algorithms significantly accelerate this computation, making the algorithm very fast in practice.
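The two Gaussian kernels of the pairwise potential can be evaluated for a single pixel pair as follows (the weights w1, w2 and the kernel widths are illustrative values, not taken from the patent):

```python
import numpy as np

def pairwise_kernel(p_i, p_j, c_i, c_j, w1=1.0, w2=1.0,
                    s_alpha=60.0, s_beta=10.0, s_gamma=3.0):
    # Appearance (bilateral) kernel over position p and RGB color c,
    # plus a smoothness kernel over position only, as in the pairwise
    # potential of the fully-connected CRF.
    dp2 = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    dc2 = np.sum((np.asarray(c_i, float) - np.asarray(c_j, float)) ** 2)
    appearance = w1 * np.exp(-dp2 / (2 * s_alpha**2) - dc2 / (2 * s_beta**2))
    smoothness = w2 * np.exp(-dp2 / (2 * s_gamma**2))
    return appearance + smoothness
```

Identical pixels attain the maximum affinity w1 + w2, while pixels that are far apart in both position and color receive near-zero affinity, so differing labels there are barely penalized.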
In this embodiment, an end-to-end training mode is adopted and the network parameters are updated and optimized jointly. The mask branch outputs K binary masks at an encoding resolution of m × m, i.e. a K × m × m output with one mask per class, and a per-pixel sigmoid is applied to each mask; $L_{mask}$ is the average binary cross-entropy loss.
For an RoI whose ground-truth category is k, $L_{mask}$ is defined only on the k-th class: although each point has K binary masks, only the mask of the k-th class contributes to $L_{mask}$; thus, the mask branch has no inter-class competition.
The mask branches make predictions for every category. At inference, the three-level branches are still used when the classification layer selects which output mask to use as the segmentation mask of an instance object produced in the final target-detection stage; because the network progressively optimizes the mask, the final mask prediction comes from the last-level mask branch. A schematic diagram of the mask branches is shown in fig. 2, and the network model structure is shown in fig. 3.
In this embodiment, the loss function of the image semantic segmentation model evaluates the difference between the model's predicted output and the ground truth and directly reflects the training effect of the model; in general, the smaller the loss, the closer the prediction is to the ground truth and the better the model's performance. The loss function of this embodiment's model has two parts: the first is the loss function in the RPN, and the second is the loss function of the three branches. The RPN is used to generate candidate regions and fine-tune bounding boxes, so the RPN loss consists of a target-recognition loss and a bounding-box regression loss:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \tau\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*)$$

where i is the index of an anchor box in the mini-batch; $N_{cls}$ and $N_{reg}$ are the normalization terms of the classification and regression layers, respectively; $p_i$ is the predicted probability that anchor i is an object, and $p_i^*$ is the ground-truth label, equal to 1 if the anchor is positive and 0 if it is negative; $t_i$ denotes the 4 parameterized coordinates of the predicted candidate box, and $t_i^*$ the 4 parameterized coordinates of the ground-truth region; $L_{cls}$ and $L_{reg}$ are the classification loss and the regression loss, respectively, and a parameter $\tau$ with value 10 is added to balance the influence of the two loss terms.
The branch loss consists of four parts: the classification loss $L_{cls}$, the bounding-box regression loss $L_{box}$, the segmentation loss $L_{mask}$, and the MaskIoU loss $L_{maskiou}$. The branch loss function is as follows:

$$L_{head} = L_{cls} + L_{box} + L_{mask} + L_{maskiou}$$

where the classification loss uses cross-entropy, the bounding-box regression loss is the smooth_l1_loss function, the segmentation mask loss uses the binary_cross_entropy_with_logits loss function, and the MaskIoU loss uses the mean squared error between the predicted mask IoU and the ground truth. The final loss function is as follows:

$$L_{final} = L(\{p_i\},\{t_i\}) + L_{head}$$
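The smooth-L1 bounding-box regression term named above can be sketched as follows (beta = 1 is the common default, assumed here):

```python
import numpy as np

def smooth_l1(residuals, beta=1.0):
    # Quadratic for small residuals, linear for large ones, so box
    # regression outliers do not dominate the gradient.
    x = np.abs(np.asarray(residuals, dtype=float))
    return float(np.sum(np.where(x < beta, 0.5 * x**2 / beta, x - 0.5 * beta)))
```

The two regimes meet smoothly at |x| = beta, which is what distinguishes this loss from plain L1 or L2 regression.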
according to the method, the prediction result of the network can be better screened by setting the multi-level threshold, and the prediction precision of the result is higher; for the problem of image segmentation edge processing, the segmentation effect of a DenseCRF optimization network is added into a multi-stage segmentation branch to obtain an optimal image segmentation result.
Example 2
The embodiment provides an image semantic segmentation system based on multiple thresholds, which includes:
the feature extraction module is used for extracting region-of-interest features from the multi-scale feature map of the image according to the target object;
the multi-stage segmentation module is used for sequentially segmenting the restored region-of-interest features through multi-stage thresholds so as to train a preset image semantic segmentation model;
and the processing module is used for processing the image to be segmented by using the trained image semantic segmentation model to obtain a semantic segmentation result.
It should be noted that the above modules correspond to steps S1 to S3 in embodiment 1, and that the examples and application scenarios realized by the modules are the same as those of the corresponding steps, but are not limited to the disclosure in embodiment 1. The modules described above may, as part of a system, be implemented in a computer system such as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment 1. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be implemented directly by a hardware processor or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.

Claims (10)

1. A multi-threshold-based image semantic segmentation method is characterized by comprising the following steps:
extracting the characteristics of the region of interest in the image multi-scale characteristic map according to the target object;
segmenting the restored region of interest characteristics sequentially through a multi-level threshold value, and training a preset image semantic segmentation model;
and processing the image to be segmented by using the trained image semantic segmentation model to obtain a semantic segmentation result.
2. The multi-threshold-based image semantic segmentation method as claimed in claim 1, wherein a feature pyramid network is used to extract a multi-scale feature map of the image, and the multi-scale feature map is input into a pre-trained target detector to extract region-of-interest features of a target object.
3. The method for image semantic segmentation based on multiple threshold values as claimed in claim 1, wherein the extracting the region-of-interest features comprises: and generating an anchor frame by adopting sliding frame scanning according to the size of the target object, and obtaining the characteristics of the region of interest according to the overlapping rate of the suggested candidate frame region and the expected region.
4. The image semantic segmentation method based on the multiple thresholds as claimed in claim 3, characterized in that, the overlapping rate of the suggested candidate frame area and the expected area is scored, the suggested candidate frame area with the highest score and the corresponding anchor frame are selected, the anchor frames of the other suggested candidate frame areas are traversed, and if the overlapping rate of the other anchor frames and the anchor frame with the current highest score is greater than a preset threshold, the anchor frames are deleted; and obtaining the region-of-interest characteristics according to the overlapping rate of the residual suggested candidate box region and the expected region.
5. The multi-threshold-based image semantic segmentation method as claimed in claim 1, wherein the bilinear interpolation of RoIAlign is adopted to restore the region-of-interest features.
6. The multi-threshold-based image semantic segmentation method as claimed in claim 1, wherein a DenseCRF is added to the segmentation branch of the last-level threshold to optimize the segmentation result.
7. The multi-threshold-based image semantic segmentation method as claimed in claim 1, wherein the loss functions of the image semantic segmentation model comprise a target recognition loss, a bounding box regression loss, and a segmentation branch loss, and the segmentation branch loss comprises a classification loss, a bounding box regression loss, a segmentation loss, and a mask IoU loss.
8. A multi-threshold-based image semantic segmentation system, characterized by comprising:
a feature extraction module, configured to extract region-of-interest features from a multi-scale feature map of the image according to the target object;
a multi-level segmentation module, configured to segment the restored region-of-interest features sequentially through multi-level thresholds, so as to train a preset image semantic segmentation model;
and a processing module, configured to process the image to be segmented with the trained image semantic segmentation model to obtain a semantic segmentation result.
9. An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.
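The "multi-level thresholds" of claims 1 and 8 can be read as a cascade of progressively stricter quality gates applied to region proposals, in the spirit of cascaded detection heads. A minimal Python sketch under illustrative assumptions (the thresholds 0.5/0.6/0.7 and the `gt_iou` quality lookup are hypothetical, not specified by the claims):

```python
def cascade_stages(proposals, gt_iou, thresholds=(0.5, 0.6, 0.7)):
    # At each stage, only proposals whose quality score (here, IoU
    # with the ground truth) reaches that stage's threshold survive
    # and are passed on to the next, stricter stage.
    surviving = list(proposals)
    for t in thresholds:
        surviving = [p for p in surviving if gt_iou[p] >= t]
    return surviving
```

Each stage thus refines the previous stage's output, which is the sequential multi-level segmentation the claims describe at a high level.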
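The score-then-traverse-and-delete procedure of claim 4 is the standard non-maximum suppression (NMS) algorithm. A minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) form and an illustrative overlap threshold of 0.5 (the claim leaves the preset threshold open):

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, delete boxes whose overlap with
    # it exceeds `thresh`, then repeat on the remainder (claim 4).
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```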
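The bilinear interpolation of RoIAlign named in claim 5 samples the feature map at continuous coordinates instead of rounding RoI boundaries to the nearest cell, which is what "restores" the region-of-interest features without quantization error. A minimal single-point sketch (a full RoIAlign averages several such samples per output bin):

```python
import math

def bilinear(feat, y, x):
    # Sample a 2-D feature map `feat` (list of rows) at continuous
    # coordinates (y, x) by blending the four surrounding cells.
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(feat) - 1)
    x1 = min(x0 + 1, len(feat[0]) - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bot = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```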
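The mask IoU term in claim 7 (familiar from Mask Scoring R-CNN, one of the cited works) scores how well a predicted binary mask overlaps the ground truth. A sketch over flat 0/1 masks, with a hypothetical weighted sum combining the four segmentation-branch terms; the unit weights are an illustrative assumption, not taken from the patent:

```python
def mask_iou(pred, gt):
    # IoU between two binary masks given as flat lists of 0/1.
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 0.0

def branch_loss(l_cls, l_box, l_seg, l_maskiou, weights=(1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the four segmentation-branch terms of claim 7:
    # classification, bounding box regression, segmentation, mask IoU.
    return sum(w * l for w, l in zip(weights, (l_cls, l_box, l_seg, l_maskiou)))
```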
CN202011284251.9A 2020-11-17 2020-11-17 Image semantic segmentation method and system based on multiple thresholds Pending CN112396620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011284251.9A CN112396620A (en) 2020-11-17 2020-11-17 Image semantic segmentation method and system based on multiple thresholds


Publications (1)

Publication Number Publication Date
CN112396620A true CN112396620A (en) 2021-02-23

Family

ID=74600583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011284251.9A Pending CN112396620A (en) 2020-11-17 2020-11-17 Image semantic segmentation method and system based on multiple thresholds

Country Status (1)

Country Link
CN (1) CN112396620A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596184A (en) * 2018-04-25 2018-09-28 清华大学深圳研究生院 Training method of an image semantic segmentation model, readable storage medium, and electronic device
CN109583517A (en) * 2018-12-26 2019-04-05 华东交通大学 An enhanced fully convolutional instance semantic segmentation algorithm suitable for small object detection
CN109598728A (en) * 2018-11-30 2019-04-09 腾讯科技(深圳)有限公司 Image segmentation method, apparatus, diagnostic system and storage medium
CN109816669A (en) * 2019-01-30 2019-05-28 云南电网有限责任公司电力科学研究院 An improved Mask R-CNN image instance segmentation method for identifying power equipment defects
CN110232380A (en) * 2019-06-13 2019-09-13 应急管理部天津消防研究所 Fire night scene restoration method based on the Mask R-CNN neural network
CN110599448A (en) * 2019-07-31 2019-12-20 浙江工业大学 Lung lesion tissue detection system based on transfer learning and the Mask Scoring R-CNN network
CN111339882A (en) * 2020-02-19 2020-06-26 山东大学 Power transmission line hidden danger detection method based on instance segmentation
CN111401293A (en) * 2020-03-25 2020-07-10 东华大学 Gesture recognition method based on a head-lightweight Mask Scoring R-CNN
CN111489327A (en) * 2020-03-06 2020-08-04 浙江工业大学 Cancer cell image detection and segmentation method based on the Mask R-CNN algorithm
CN111862119A (en) * 2020-07-21 2020-10-30 武汉科技大学 Semantic information extraction method based on Mask-RCNN
US20200349763A1 (en) * 2019-05-03 2020-11-05 Facebook Technologies, Llc Semantic Fusion


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIFENG DAI ET AL: "Instance-aware Semantic Segmentation via Multi-task Network Cascades", arXiv:1512.04412v1 *
XIAOXIAO LI ET AL: "Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
ZHAOJIN HUANG ET AL: "Mask Scoring R-CNN", arXiv preprint arXiv:1903.00241v1 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076816A (en) * 2021-03-17 2021-07-06 上海电力大学 Solar photovoltaic module hot spot identification method based on infrared and visible light images
CN113076816B (en) * 2021-03-17 2023-06-02 上海电力大学 Solar photovoltaic module hot spot identification method based on infrared and visible light images

Similar Documents

Publication Publication Date Title
Theis et al. Faster gaze prediction with dense networks and fisher pruning
CN108304798B (en) Street level order event video detection method based on deep learning and motion consistency
CN109376572B (en) Real-time vehicle detection and trajectory tracking method in traffic video based on deep learning
CN109978807B (en) Shadow removing method based on generating type countermeasure network
CN106446896B (en) Character segmentation method and device and electronic equipment
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
CN112507777A (en) Optical remote sensing image ship detection and segmentation method based on deep learning
CN109993101B (en) Vehicle detection method based on multi-branch circulation self-attention network and circulation frame regression
CN110991311A (en) Target detection method based on dense connection deep network
CN111027493A (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111539343B (en) Black smoke vehicle detection method based on convolution attention network
CN110826558B (en) Image classification method, computer device, and storage medium
CN111027475A (en) Real-time traffic signal lamp identification method based on vision
CN109242826B (en) 2021-06-01 Method and system for counting rod-shaped object roots on a mobile device based on object detection
CN111986125A (en) Method for multi-target task instance segmentation
CN111738055A (en) Multi-class text detection system and bill form detection method based on same
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN114359245A (en) Method for detecting surface defects of products in industrial scene
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN110889360A (en) Crowd counting method and system based on switching convolutional network
CN111986126A (en) Multi-target detection method based on improved VGG16 network
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
CN114882423A (en) Truck warehousing goods identification method based on improved Yolov5m model and Deepsort
CN111931572B (en) Target detection method for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210223