CN114187211A - Image processing method and device for optimizing image semantic segmentation result - Google Patents

Image processing method and device for optimizing image semantic segmentation result

Info

Publication number
CN114187211A
CN114187211A (application CN202111525227.4A)
Authority
CN
China
Prior art keywords
semantic segmentation
image
training
segmentation model
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111525227.4A
Other languages
Chinese (zh)
Inventor
张文俊
孙军欢
张春海
Current Assignee
Shenzhen Zhixing Technology Co Ltd
Original Assignee
Shenzhen Zhixing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhixing Technology Co Ltd filed Critical Shenzhen Zhixing Technology Co Ltd
Priority to CN202111525227.4A
Publication of CN114187211A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image processing method and device for optimizing image semantic segmentation results. The method comprises the following steps: obtaining a semantic segmentation mask map, wherein each pixel of the semantic segmentation mask map is assigned a mask according to the identified object type, the semantic segmentation mask map comprises a plurality of connected regions, and each connected region consists of contiguously distributed pixels assigned the same mask; inputting the semantic segmentation mask map into a filtering module, removing from the plurality of connected regions those whose size is smaller than a preset size threshold, and generating a first filtering result map; inputting the first filtering result map into an inversion module to obtain a first inversion result map; inputting the first inversion result map into the filtering module, removing connected regions smaller than the preset size threshold from the connected regions of the first inversion result map, and generating a second filtering result map; and inputting the second filtering result map into the inversion module to obtain the optimized semantic segmentation mask map. This reduces prediction error and improves recognition accuracy.

Description

Image processing method and device for optimizing image semantic segmentation result
Technical Field
The application relates to the technical field of computer vision, and in particular to an image processing method and device for optimizing image semantic segmentation results.
Background
With the development of artificial intelligence technology, deep learning has made significant progress in the field of computer vision. Face recognition products based on computer vision are widely deployed at border crossings, railway stations, airport halls, and similar places, where identity detection and judgment are achieved by extracting facial features from collected images and performing comparison and retrieval. In industrial applications, detection methods based on computer vision technologies such as image semantic segmentation are likewise used for target detection, automatic identification, and the generation of corresponding decisions, such as sorting and transferring, for different objects. However, industrial applications often require detecting objects of similar shape that are densely stacked with many other objects and may occlude one another, so accurate identification is challenging; in particular, detection based on image semantic segmentation may suffer from large prediction errors.
Therefore, an image processing method and device for optimizing image semantic segmentation results are needed that can reduce prediction error and improve recognition accuracy, and that are suitable for detection based on image semantic segmentation in industrial applications.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides an image processing method for optimizing an image semantic segmentation result. The image processing method comprises the following steps: obtaining a semantic segmentation mask map output by an image semantic segmentation model, wherein each pixel of the semantic segmentation mask map is assigned a mask corresponding to the object type identified by the image semantic segmentation model, the semantic segmentation mask map comprises a plurality of connected regions, and each of the plurality of connected regions consists of one or more contiguously distributed pixels assigned the same mask; inputting the semantic segmentation mask map into a filtering module, and using the filtering module to remove, from the plurality of connected regions, connected regions whose size is smaller than a preset size threshold, so as to generate a first filtering result map; inputting the first filtering result map into an inversion module to perform an inversion operation, obtaining a first inversion result map; inputting the first inversion result map into the filtering module, and using the filtering module to remove connected regions whose size is smaller than the preset size threshold from the connected regions of the first inversion result map, so as to generate a second filtering result map; and inputting the second filtering result map into the inversion module to perform an inversion operation, obtaining the optimized semantic segmentation mask map.
According to the technical solution described in the first aspect, by reusing the filtering module and the inversion module, the connected regions smaller than the preset size threshold are removed from both the semantic segmentation mask map and the first inversion result map, which helps solve the problem of inaccurate boundary prediction, reduces edge recognition errors, makes the mask boundary smoother, and thereby reduces prediction error and improves recognition accuracy.
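As an illustration only (not part of the patent disclosure), the filter-invert-filter-invert pipeline of the first aspect can be sketched for a single binary mask. The size criterion used here, the side of each region's bounding box, and all function names are assumptions:

```python
import numpy as np
from collections import deque

def connected_regions(mask):
    """Yield each 4-connected region of 1-pixels as a list of (y, x) coordinates."""
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                seen[sy, sx] = True
                region, queue = [], deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                yield region

def filter_small_regions(mask, size_thresh):
    """Remove connected regions whose bounding-box side is below size_thresh."""
    out = mask.copy()
    for region in connected_regions(mask):
        ys = [p[0] for p in region]
        xs = [p[1] for p in region]
        side = max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
        if side < size_thresh:
            for y, x in region:
                out[y, x] = 0
    return out

def optimize_mask(mask, size_thresh):
    """Filter small foreground specks, invert, filter small holes, invert back."""
    first = filter_small_regions(mask, size_thresh)        # first filtering result map
    second = filter_small_regions(1 - first, size_thresh)  # filter the inverted map
    return 1 - second                                      # optimized mask
```

Filtering the inverted map removes small background holes inside objects, so the final inversion yields a mask with both isolated specks and pinholes removed, which smooths the mask boundary.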
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the image processing method further includes: performing erosion and dilation processing on the semantic segmentation mask map before inputting the semantic segmentation mask map into the filtering module.
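A minimal sketch of such erosion-and-dilation preprocessing, implemented directly in NumPy on a binary mask; the 3x3 structuring element and the choice of opening (erosion followed by dilation) are assumptions, not specified by the patent:

```python
import numpy as np

def erode3(mask):
    """3x3 binary erosion: a pixel survives only if its whole 3x3 neighbourhood is 1."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def dilate3(mask):
    """3x3 binary dilation: a pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def open3(mask):
    """Opening (erosion then dilation) removes speckle smaller than the kernel
    while restoring the extent of regions that survive the erosion."""
    return dilate3(erode3(mask))
```

In practice a library routine such as OpenCV's `morphologyEx` would typically be used instead; the loops above only make the mechanics explicit.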
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the preset size threshold is based on a prior probability associated with a training process of the image semantic segmentation model.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: according to the training data set associated with the training process of the image semantic segmentation model, calculating the area occupied by each connected region formed by the pixels corresponding to each object type in the training data set, selecting the connected region with the smallest area, and taking the side length of the minimal circumscribed square of that connected region as the preset size threshold.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: calculating the areas of all connected regions in the training data set according to the labels of the training data set associated with the training process of the image semantic segmentation model, and taking the side length of the minimal circumscribed square of the connected region with the smallest area as the preset size threshold.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the preset size threshold is a designated numerical value.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the specified numerical value is preset according to an application scenario of the image processing method.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the training process of the image semantic segmentation model includes: generating at least one correction map corresponding to at least one training map used for training the image semantic segmentation model, and training the image semantic segmentation model with the at least one correction map, wherein the generation process of each correction map comprises: obtaining the loss result produced after the training map corresponding to the correction map undergoes forward computation and backpropagation in the image semantic segmentation model, wherein the loss result includes loss values of a plurality of regions of interest (ROIs); selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union (IoU) of each of the plurality of ROIs with the ROI corresponding to the highest loss value, so as to screen out the ROIs whose IoU is smaller than a preset IoU threshold; and extracting the portions corresponding to the screened ROIs from the training map to generate the correction map.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the at least one correction map is generated by a hard example mining module, and the hard example mining module obtains, by sampling the training process of the image semantic segmentation model, the loss result produced after the training map corresponding to each of the at least one correction map is executed in the image semantic segmentation model.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union (IoU) of each ROI with the ROI corresponding to the highest loss value, so as to screen out the ROIs whose IoU is smaller than the preset IoU threshold, includes: sorting the IoU values of the plurality of ROIs with the ROI corresponding to the highest loss value in descending order, and screening out, starting from the lowest IoU, a specified number of ROIs whose IoU is smaller than the preset IoU threshold.
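The ROI-screening step above, selecting the ROI with the highest loss and keeping the ROIs whose overlap with it falls below the threshold, can be sketched as follows. Boxes are (x1, y1, x2, y2) tuples and all names are illustrative assumptions:

```python
def box_area(box):
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def screen_rois(rois, losses, iou_thresh, keep=None):
    """Keep ROIs whose IoU with the highest-loss ROI is below iou_thresh,
    ordered by ascending IoU; optionally keep only the first `keep` of them."""
    hardest = rois[losses.index(max(losses))]
    scored = sorted(((iou(r, hardest), r) for r in rois), key=lambda t: t[0])
    selected = [r for score, r in scored if score < iou_thresh]
    return selected if keep is None else selected[:keep]
```

Note that the hardest ROI itself has IoU 1.0 with itself and is therefore never in the screened set; the screened ROIs are the hard regions that do not merely overlap the hardest one.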
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the training process of the image semantic segmentation model includes: inputting the training data set associated with the training process of the image semantic segmentation model into a hard example mining module; and generating, through the hard example mining module, a correction data set corresponding to the training data set and training the image semantic segmentation model with the correction data set, wherein, for at least one training image in the training data set, the process of generating a corresponding correction image through the hard example mining module comprises: obtaining the loss result produced after the at least one training image undergoes forward computation and backpropagation in the image semantic segmentation model, wherein the loss result includes loss values of a plurality of ROIs; selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union (IoU) of each ROI with the ROI corresponding to the highest loss value, so as to screen out the ROIs whose IoU is smaller than a preset IoU threshold; and extracting the portions corresponding to the screened ROIs from the at least one training image to generate the correction image.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the image processing method is used for detection of scrap steel during its transportation, and the optimized semantic segmentation mask map is used to determine at least one piece of associated information of the scrap steel, the at least one piece of associated information including at least one of the following: contour information, category information, source information, coordinate information, area information, and pixel feature information.
In a second aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the image processing method according to any one of the first aspect.
According to the technical solution described in the second aspect, by reusing the filtering module and the inversion module, the connected regions smaller than the preset size threshold are removed from both the semantic segmentation mask map and the first inversion result map, which helps solve the problem of inaccurate boundary prediction, reduces edge recognition errors, makes the mask boundary smoother, and thereby reduces prediction error and improves recognition accuracy.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor implements the image processing method according to any one of the first aspect by executing the executable instructions.
According to the technical solution described in the third aspect, by reusing the filtering module and the inversion module, the connected regions smaller than the preset size threshold are removed from both the semantic segmentation mask map and the first inversion result map, which helps solve the problem of inaccurate boundary prediction, reduces edge recognition errors, makes the mask boundary smoother, and thereby reduces prediction error and improves recognition accuracy.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus for optimizing an image semantic segmentation result. The image processing apparatus includes: an obtaining module for obtaining a semantic segmentation mask map output by an image semantic segmentation model, wherein the semantic segmentation mask map comprises a plurality of connected regions, each of which consists of one or more contiguously distributed pixels assigned the same mask; a filtering module for performing a filtering operation on the semantic segmentation mask map so as to remove connected regions smaller than a preset size threshold from the plurality of connected regions and generate a first filtering result map; and an inversion module for inverting the first filtering result map to obtain a first inversion result map, wherein the filtering module is further configured to filter the first inversion result map so as to remove connected regions smaller than the preset size threshold from the connected regions of the first inversion result map and generate a second filtering result map, and the inversion module is further configured to invert the second filtering result map to obtain the optimized semantic segmentation mask map.
According to the technical solution described in the fourth aspect, by reusing the filtering module and the inversion module, the connected regions smaller than the preset size threshold are removed from both the semantic segmentation mask map and the first inversion result map, which helps solve the problem of inaccurate boundary prediction, reduces edge recognition errors, makes the mask boundary smoother, and thereby reduces prediction error and improves recognition accuracy.
According to a possible implementation manner of the technical solution of the fourth aspect, the embodiment of the present application further provides that the preset size threshold is based on a prior probability associated with a training process of the image semantic segmentation model.
According to a possible implementation manner of the technical solution of the fourth aspect, an embodiment of the present application further provides that the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: according to the training data set associated with the training process of the image semantic segmentation model, calculating the area occupied by each connected region formed by the pixels corresponding to each object type in the training data set, selecting the connected region with the smallest area, and taking the side length of the minimal circumscribed square of that connected region as the preset size threshold.
According to a possible implementation manner of the technical solution of the fourth aspect, an embodiment of the present application further provides that the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: calculating the areas of all connected regions in the training data set according to the labels of the training data set associated with the training process of the image semantic segmentation model, and taking the side length of the minimal circumscribed square of the connected region with the smallest area as the preset size threshold.
According to a possible implementation manner of the technical solution of the fourth aspect, an embodiment of the present application further provides that the training process of the image semantic segmentation model includes: generating at least one correction map corresponding to at least one training map used for training the image semantic segmentation model, and training the image semantic segmentation model with the at least one correction map, wherein the generation process of each correction map comprises: obtaining the loss result produced after the training map corresponding to the correction map undergoes forward computation and backpropagation in the image semantic segmentation model, wherein the loss result includes loss values of a plurality of regions of interest (ROIs); selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union (IoU) of each of the plurality of ROIs with the ROI corresponding to the highest loss value, so as to screen out the ROIs whose IoU is smaller than a preset IoU threshold; and extracting the portions corresponding to the screened ROIs from the training map to generate the correction map.
According to a possible implementation manner of the technical solution of the fourth aspect, an embodiment of the present application further provides that the training process of the image semantic segmentation model includes: inputting the training data set associated with the training process of the image semantic segmentation model into a hard example mining module; and generating, through the hard example mining module, a correction data set corresponding to the training data set and training the image semantic segmentation model with the correction data set, wherein, for at least one training image in the training data set, the process of generating a corresponding correction image through the hard example mining module comprises: obtaining the loss result produced after the at least one training image undergoes forward computation and backpropagation in the image semantic segmentation model, wherein the loss result includes loss values of a plurality of ROIs; selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union (IoU) of each ROI with the ROI corresponding to the highest loss value, so as to screen out the ROIs whose IoU is smaller than a preset IoU threshold; and extracting the portions corresponding to the screened ROIs from the at least one training image to generate the correction image.
Drawings
In order to explain the technical solutions in the embodiments or background art of the present application, the drawings used in the embodiments or background art of the present application will be described below.
Fig. 1 shows a flowchart of an image processing method for optimizing an image semantic segmentation result according to an embodiment of the present application.
Fig. 2 shows a block diagram of an electronic device used in the image processing method shown in fig. 1 according to an embodiment of the present application.
Fig. 3 shows a block diagram of an image processing apparatus for optimizing semantic segmentation results of an image according to an embodiment of the present application.
Detailed Description
In order to solve the technical problem of how to reduce prediction error and improve recognition accuracy, an embodiment of the present application provides an image processing method and device for optimizing image semantic segmentation results. The image processing method comprises the following steps: obtaining a semantic segmentation mask map output by an image semantic segmentation model, wherein each pixel of the semantic segmentation mask map is assigned a mask corresponding to the object type identified by the image semantic segmentation model, the semantic segmentation mask map comprises a plurality of connected regions, and each of the plurality of connected regions consists of one or more contiguously distributed pixels assigned the same mask; inputting the semantic segmentation mask map into a filtering module, and using the filtering module to remove, from the plurality of connected regions, connected regions whose size is smaller than a preset size threshold, so as to generate a first filtering result map; inputting the first filtering result map into an inversion module to perform an inversion operation, obtaining a first inversion result map; inputting the first inversion result map into the filtering module, and using the filtering module to remove connected regions whose size is smaller than the preset size threshold from the connected regions of the first inversion result map, so as to generate a second filtering result map; and inputting the second filtering result map into the inversion module to perform an inversion operation, obtaining the optimized semantic segmentation mask map.
Therefore, by reusing the filtering module and the inversion module, the connected regions smaller than the preset size threshold are removed from both the semantic segmentation mask map and the first inversion result map, which helps solve the problem of inaccurate boundary prediction, reduces edge recognition errors, makes the mask boundary smoother, and thereby reduces prediction error and improves recognition accuracy.
The embodiments of the present application can be applied to scenarios including, but not limited to, industrial automation, goods sorting in logistics centers, port automation, intelligent automatic goods inspection and judgment, scrap steel recycling, and intelligent automatic scrap steel inspection and judgment, as well as any scenario, such as automatic coal sorting, garbage recycling, and automatic garbage sorting, in which the identification method and device for intelligent material inspection and judgment can improve production efficiency and reduce labor costs.
The embodiments of the present application may be modified and improved according to specific application environments, and are not limited herein.
To help those skilled in the art better understand the present application, the embodiments of the present application will be described below with reference to the accompanying drawings.
Aspects of the present application and the various embodiments and implementations mentioned below involve concepts of artificial intelligence, machine learning, and neural networks. In general, artificial intelligence (AI) studies the nature of human intelligence and builds intelligent machines that can react in a manner similar to human intelligence. Research in applied artificial intelligence includes robotics, speech recognition, natural language processing, image recognition, decision reasoning, human-computer interaction, expert systems, and the like. Machine learning (ML) studies how artificial intelligence systems model or implement human learning behavior, acquire new knowledge or skills, reorganize existing knowledge structures, and improve their own capabilities. Machine learning learns rules from large numbers of samples, data, or experiences through various algorithms in order to identify new samples or to make decisions and predictions about events. Examples of machine learning algorithms include decision tree learning, Bayesian classification, support vector machines, and clustering algorithms. Deep learning (DL) draws on the layered deep structure of the human brain and its depth-graded cognitive processes, and studies how to feed large amounts of data into complex models and "train" the models to learn how to extract features. Neural networks (NN) can be divided into artificial neural networks (ANN) and spiking neural networks (SNN). An SNN simulates a spiking neuron model of biological neural working mechanisms and uses spike-coded information in its computation. Currently, ANNs are the most widely used. Unless otherwise specified or indicated by context, the neural network (NN) referred to herein generally means an artificial neural network, i.e., an ANN.
An ANN is an algorithmic mathematical model inspired by the structure of brain neurons and the principles of nerve conduction; it processes information with a network structure that imitates the behavioral characteristics of animal neural networks. A neural network comprises a large number of interconnected nodes or neurons, sometimes called artificial neurons or perceptrons, inspired by the structure of neurons in the brain. A shallow neural network comprises only an input layer and an output layer, where the input layer receives input signals and the output layer outputs the computation results of the network. The input signals are linearly combined and then transformed by an activation function to obtain the result of the output layer. The complex models used in deep learning are mainly multi-layer neural networks, sometimes called deep neural networks (DNN). In addition to the input and output layers, a multi-layer neural network includes hidden layers; each hidden layer contains an arbitrary number of neurons connected as nodes to the nodes of the previous layer, and each neuron can be regarded as a linear combiner that assigns a weight to each connected input value for a weighted linear combination. The activation function is a nonlinear mapping applied after the weighted linear combination of the input signals; in a multi-layer neural network it can be understood as the functional relationship between the output of a neuron in one layer and the input of a neuron in the next layer. Each hidden layer may have a different activation function. Common activation functions include ReLU, Sigmoid, and Tanh. The neural network passes the information of each layer to the next layer through this mesh structure.
Forward propagation is the process of calculating layer by layer from the input layer to the output layer; the weighted linear combination and transformation are repeated throughout the forward pass, and finally a Loss Function is calculated that measures the deviation between the predicted value and the true value of the model. Back propagation proceeds from the output layer through the hidden layers to the input layer, and the neural network parameters are corrected according to the error between the actual output and the expected output. DNNs can be classified into Convolutional Neural Networks (CNN), Fully Connected Neural Networks (FCN), and Recurrent Neural Networks (RNN) according to the composition of their base layers. A CNN is composed of convolutional layers, pooling layers, and fully connected layers. An FCN consists of multiple fully connected layers. An RNN consists of fully connected layers with feedback paths and gating operations between layers, also called recurrent layers. Different types of neural network base layers have different computational characteristics and requirements; for example, in some neural networks the convolutional layers account for a high proportion of the computation, and the amount of computation per convolutional layer is large. In addition, the calculation parameters of each convolutional layer, such as the convolution kernel size and the input/output feature map sizes, vary widely.
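For illustration only, forward propagation, a loss calculation, and one back-propagation update can be sketched in numpy for a tiny two-layer network; this is a generic sketch (the layer sizes, ReLU activation, and squared-error loss are assumptions), not the patent's segmentation model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))        # input vector
W1 = rng.normal(size=(8, 4))     # hidden-layer weights
W2 = rng.normal(size=(1, 8))     # output-layer weights

# Forward propagation: weighted linear combination + ReLU activation per layer
h = np.maximum(0.0, W1 @ x)      # hidden layer
y_pred = W2 @ h                  # output layer (linear)

# Loss function measuring deviation between prediction and true value
y_true = np.array([1.0])
loss = float(np.mean((y_pred - y_true) ** 2))

# Back propagation: gradient of the loss w.r.t. W2 via the chain rule,
# followed by one gradient-descent correction of the parameters
grad_W2 = 2.0 * (y_pred - y_true)[:, None] * h[None, :]
W2 -= 0.1 * grad_W2
```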
Fig. 1 shows a flowchart of an image processing method for optimizing an image semantic segmentation result according to an embodiment of the present application. As shown in fig. 1, the image processing method 100 includes the following steps.
Step S102: obtain a semantic segmentation mask map output by an image semantic segmentation model.
Each pixel point of the semantic segmentation mask map is assigned a mask corresponding to the object type that the image semantic segmentation model identified for that pixel point. The semantic segmentation mask map comprises a plurality of connected regions, and each of the plurality of connected regions consists of one or more continuously distributed pixel points that are assigned the same mask.
Step S104: input the semantic segmentation mask map into a filtering module, and use the filtering module to remove, from the plurality of connected regions included in the semantic segmentation mask map, connected regions whose size is smaller than a preset size threshold, so as to generate a first filtering result map.
Step S106: input the first filtering result map into an inversion module to perform an inversion operation, obtaining a first inversion result map.
Step S108: input the first inversion result map into the filtering module, and use the filtering module to remove, from the connected regions included in the first inversion result map, connected regions whose size is smaller than the preset size threshold, so as to generate a second filtering result map.
Step S110: input the second filtering result map into the inversion module to perform an inversion operation, obtaining the optimized semantic segmentation mask map.
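For illustration only, the sequence of steps S104 to S110 can be sketched in Python for a binary target/background view of the mask, assuming 4-connectivity for connected regions and using the pixel count of a region as its size measure (the patent leaves both the connectivity and the size measure open):

```python
import numpy as np
from collections import deque

def remove_small_regions(mask, min_size):
    """Remove connected foreground regions with fewer than min_size pixels
    (4-connectivity BFS labelling; a simple stand-in for the filtering module)."""
    out = mask.copy()
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                # flood-fill one connected region
                region, q = [], deque([(i, j)])
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(region) < min_size:
                    for y, x in region:
                        out[y, x] = 0
    return out

def optimize_mask(mask, min_size):
    """Steps S104-S110: filter, invert, filter again, invert back."""
    step1 = remove_small_regions(mask, min_size)   # S104: drop small target regions
    step2 = 1 - step1                              # S106: inversion
    step3 = remove_small_regions(step2, min_size)  # S108: drop small background specks
    return 1 - step3                               # S110: invert back
```

The first filtering pass removes small isolated target regions, and the second, inverted pass removes small background regions (holes) inside targets, so both kinds of unreasonable components are eliminated.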
The image processing method 100 obtains an optimized semantic segmentation mask map based on the semantic segmentation mask map output by the image semantic segmentation model. Here, the image semantic segmentation model identifies each pixel point of the image, assigns it a category, and labels it with that category, thereby obtaining a pixel-level prediction result for the image. The semantic segmentation mask map output by the image semantic segmentation model therefore provides the labeled category or semantic information of each pixel point, and pixels labeled with the same category or the same semantic information are assigned the same mask. In the semantic segmentation mask map obtained in step S102, each pixel point is assigned a mask corresponding to the object type that the image semantic segmentation model identified for that pixel point; the map includes a plurality of connected regions, and each connected region consists of one or more continuously distributed pixel points assigned the same mask. In this way, the semantic segmentation mask map labels, through a plurality of masks, the pixel points of the image attributed to the different object types. In some embodiments, different colors may be associated with the different object types, so that the pixel points under different object types have corresponding colors, or different colors may be assigned to the masks corresponding to the different object types, so that pixel points belonging to different object types are represented by different colors on the semantic segmentation mask map output by the image semantic segmentation model, which helps distinguish objects of different object types.
It should be understood that the model structure, model parameters, etc. of the image semantic segmentation model may be based on any suitable image semantic segmentation technology, as long as a semantic segmentation mask map meeting the requirements can be output, and are not specifically limited herein.
In step S102, the obtained semantic segmentation mask map includes a plurality of connected regions, each of which consists of one or more continuously distributed pixel points that are assigned the same mask. Thus, the one or more pixel points composing each connected region are assigned the same mask, meaning that they are identified as the same object type. Each connected region is further required to consist of continuously distributed pixel points; in other words, the pixel points in the same connected region are all identified as the same object type and assigned the same mask, and no pixel points identified as other object types or assigned other masks lie between them (if such pixel points existed, the requirement that each connected region consist of one or more continuously distributed pixel points assigned the same mask would not be met). By defining connected regions in this way, and in particular by requiring each connected region to consist of one or more continuously distributed pixel points assigned the same mask, the subsequent processing that reduces prediction errors and improves recognition accuracy is facilitated, as described in detail below.
In step S104, the semantic segmentation mask map is input into a filtering module, and the filtering module removes, from the plurality of connected regions included in the semantic segmentation mask map, connected regions whose size is smaller than a preset size threshold, generating a first filtering result map. Here, the preset size threshold may be based on a prior probability associated with the training process of the image semantic segmentation model, or may be a specified value, for example preset according to the application scenario of the image processing method, or determined in any other suitable manner. The preset size threshold is compared against the sizes of the plurality of connected regions so as to remove the connected regions whose size falls below it. For example, the preset size threshold may be compared with the maximum distance between any two points on the contour of a connected region, or with some geometric characteristic of the minimum circumscribing contour of a connected region, such as the side length of its minimum circumscribed square or the radius of its minimum circumscribed circle. Removing the connected regions whose size is smaller than the preset size threshold, i.e. the small connected regions on the semantic segmentation mask map, helps eliminate inaccurate boundary predictions, reduces edge recognition errors, and yields smoother mask boundaries.
In step S106, the first filtering result map is input into the inversion module to perform an inversion operation, obtaining a first inversion result map. Here, the inversion operation means interchanging the portion recognized as background and the portion recognized as target. Since each pixel point of the semantic segmentation mask map is assigned a mask corresponding to the object type identified for it by the image semantic segmentation model, the semantic segmentation mask map and the first filtering result map derived from it both identify the various object types and their corresponding connected regions. After the inversion operation of step S106, the first inversion result map instead identifies the connected regions of the various non-target, non-object contents. For example, in an industrial application in a scrap steel recycling process, the object types correspond to types of scrap steel pieces: the semantic segmentation mask map identifies the various scrap steel pieces and their corresponding connected regions, while its background portion corresponds to non-scrap content (such as pedestrians, buildings, vehicles, etc.). The first inversion result map thus identifies the connected regions of the various non-scrap contents, such as the connected regions of content identified as pedestrians, buildings, vehicles, and the like.
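A minimal sketch of the inversion operation, assuming mask value 0 denotes background (the patent does not fix the encoding): every pixel carrying any object mask becomes background and vice versa, so the inverted map identifies the non-target contents as described above. Note that this simple sketch collapses the per-class masks into a single non-target mask.

```python
import numpy as np

def invert(mask):
    """Swap the portion recognized as target (any nonzero mask value) and
    the portion recognized as background (value 0). Assumes 0 = background."""
    return (mask == 0).astype(mask.dtype)
```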
In step S108, the first inversion result map is input into the filtering module, and the filtering module removes, from the connected regions included in the first inversion result map, connected regions whose size is smaller than the preset size threshold, generating a second filtering result map. This operation is similar to that of step S104, which removes the small connected regions from the semantic segmentation mask map to generate the first filtering result map, and it uses the same filtering module and the same preset size threshold. However, because of the inversion in step S106, the connected regions included in the first inversion result map correspond to the identified non-scrap contents, for example the connected regions of content identified as pedestrians, buildings, vehicles, and the like. Removing from the first inversion result map the connected regions whose size is smaller than the preset size threshold, i.e. the small connected regions, therefore helps eliminate unreasonable components among the non-target content, eliminates inaccurate boundary predictions, reduces edge recognition errors, and yields smoother mask boundaries.
In step S110, the second filtering result map is input into the inversion module to perform an inversion operation, obtaining the optimized semantic segmentation mask map. As in step S106, where the first filtering result map was inverted to obtain the first inversion result map, step S110 uses the same inversion module; that is, the portion recognized as background and the portion recognized as target are exchanged once again. The resulting optimized semantic segmentation mask map thus again identifies the various object types and their corresponding connected regions. Because it has passed through the filtering operations of steps S104 and S108, the optimized semantic segmentation mask map has had the unreasonable parts removed both from the portions identified as targets in step S104 (the connected regions of the semantic segmentation mask map smaller than the preset size threshold) and from the portions identified as background, or non-targets, in step S108 (the connected regions of the first inversion result map smaller than the preset size threshold). Compared with the semantic segmentation mask map obtained in step S102, the optimized semantic segmentation mask map obtained in step S110 therefore helps eliminate inaccurate boundary predictions, reduces edge recognition errors, and yields smoother mask boundaries, thereby reducing prediction errors and improving recognition accuracy.
Referring to steps S102 to S110, by reusing the filtering module and the inversion module, the image processing method 100 removes the connected regions smaller than the preset size threshold both from the semantic segmentation mask map and from the first inversion result map, which helps eliminate inaccurate boundary predictions, reduces edge recognition errors, and yields smoother mask boundaries, thereby reducing prediction errors and improving recognition accuracy.
In one possible implementation, the image processing method 100 further includes: performing erosion and dilation processing on the semantic segmentation mask map before inputting it into the filtering module. For example, burrs or thin elongated artifacts can be removed by the erosion processing, improving the detection effect, while the dilation processing, or the two morphological operations together, yields smoother mask boundaries and helps overcome inaccurate edge predictions.
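As a hedged sketch of the morphological operations on a binary mask (a 3x3 structuring element is assumed; the patent does not specify the kernel), erosion and dilation can be implemented in plain numpy as follows:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 structuring element: a pixel becomes
    foreground if any of its 8 neighbors (or itself) is foreground."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy:1 + dy + mask.shape[0],
                          1 + dx:1 + dx + mask.shape[1]]
    return out

def erode(mask):
    """Binary erosion by duality: eroding the foreground equals dilating
    the background and complementing the result."""
    return 1 - dilate(1 - mask)
```

Erosion deletes single-pixel burrs outright, while dilation grows regions back and smooths their boundaries; applying erosion then dilation (a morphological opening) combines both effects.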
In one possible implementation, the preset size threshold is based on a prior probability associated with the training process of the image semantic segmentation model. In some embodiments, according to a training data set associated with the training process of the image semantic segmentation model, the areas occupied by the connected regions composed of the pixel points corresponding to the various object types in the training data set are calculated, the connected region with the minimum area is selected, and the side length of the minimum circumscribed square of that connected region is used as the preset size threshold. In this way, the preset size threshold is determined according to a prior probability associated with the training process of the image semantic segmentation model. In other embodiments, according to the labels of the training data set, the areas of all connected regions in the training data set are calculated, and the side length of the minimum circumscribed square of the connected region with the minimum area is used as the preset size threshold, likewise determining the threshold from a prior probability associated with the training process. In still other embodiments, the preset size threshold is a specified value, for example preset according to the application scenario of the image processing method, which facilitates optimization for the actual application scenario.
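A minimal sketch of deriving the preset size threshold from training labels, assuming the connected regions of the training data set have already been extracted as lists of (y, x) pixels (the extraction itself is omitted) and using an axis-aligned bounding square as the "minimum circumscribed square" (an assumption; a rotated minimum square would be smaller in general):

```python
def threshold_from_training_regions(regions):
    """Pick the connected region with the minimum area (pixel count) among
    all training-label regions and return the side length of its axis-aligned
    minimum bounding square, used as the preset size threshold."""
    smallest = min(regions, key=len)  # region with minimum area
    ys = [y for y, _ in smallest]
    xs = [x for _, x in smallest]
    return max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
```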
In addition, the image processing method 100 provided in the embodiments of the present application may further be combined with an optimization of the training process of the image semantic segmentation model; that is, the training process may be optimized on top of the optimized semantic segmentation mask map. Consider problems such as sample imbalance in the training process: for example, an excessively high intersection-over-union among samples of a certain category may result in the trained model recognizing inputs of other categories poorly. To this end, in one possible implementation, the training process of the image semantic segmentation model includes: generating, from at least one training map used for training the image semantic segmentation model, at least one correction map corresponding to the at least one training map, and training the image semantic segmentation model with the at least one correction map, wherein the generation of each correction map includes: obtaining the loss result produced after the training map corresponding to the correction map undergoes forward calculation and back propagation in the image semantic segmentation model, the loss result comprising the loss values of a plurality of regions of interest (ROIs); selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union of each ROI with the ROI corresponding to the highest loss value, so as to screen the ROIs whose intersection-over-union is smaller than a preset intersection-over-union threshold; and extracting the portions corresponding to the screened ROIs from the training map to generate the correction map.
By generating the correction map in this manner, the influence of samples with an excessively high intersection-over-union on the model is taken into account: the ROIs whose intersection-over-union is smaller than the preset threshold are screened, and the portions corresponding to the screened ROIs are extracted from the training map to generate the correction map. The screened ROIs are thus used to generate a correction map for back propagation and for updating the image semantic segmentation model, which suppresses the influence of the ROIs whose intersection-over-union exceeds the preset threshold, that is, the adverse influence that samples with an excessively high intersection-over-union may cause, effectively overcoming the sample imbalance problem and improving the recognition effect of the image semantic segmentation model. In some embodiments, the at least one correction map is generated by a hard example mining module, which samples the training process of the image semantic segmentation model to obtain the loss result produced after the training map corresponding to each correction map undergoes forward calculation and back propagation in the image semantic segmentation model.
In some embodiments, selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union of each ROI with the ROI corresponding to the highest loss value, so as to screen the ROIs whose intersection-over-union is smaller than the preset threshold, includes: sorting the intersection-over-union values of the ROIs with the ROI corresponding to the highest loss value in descending order, and, starting from the lowest value, screening a specific number of ROIs whose intersection-over-union is smaller than the preset threshold. The screened ROIs are then used to generate a correction map for back propagation and for updating the image semantic segmentation model, which suppresses the influence of the ROIs whose intersection-over-union exceeds the preset threshold, that is, the adverse influence that samples with an excessively high intersection-over-union may cause, effectively overcoming the sample imbalance problem and improving the recognition effect of the image semantic segmentation model. Here, the intersection ratio is also referred to as Intersection over Union (IoU), the overlap rate of an ROI with the ROI corresponding to the highest loss value, i.e. the ratio of their intersection to their union; in the ideal case of complete overlap, the intersection-over-union, or IoU, is 1.
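The screening described above can be sketched as follows; the (x1, y1, x2, y2) box representation, the function names, and the top-k cutoff parameter are illustrative assumptions, not the patent's API:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def screen_rois(rois, losses, iou_threshold, k):
    """Hard-example screening sketch: take the highest-loss ROI as anchor,
    then keep up to k other ROIs whose IoU with it is below iou_threshold,
    ordered from the lowest IoU upward."""
    anchor = rois[max(range(len(rois)), key=lambda i: losses[i])]
    scored = sorted((iou(anchor, r), r) for r in rois if r is not anchor)
    return [r for s, r in scored if s < iou_threshold][:k]
```

ROIs heavily overlapping the hardest example are discarded, so the correction map built from the kept ROIs suppresses the over-represented samples during back propagation.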
In one possible implementation, the training process of the image semantic segmentation model includes: inputting a training data set associated with the training process of the image semantic segmentation model into a hard example mining module; and generating, through the hard example mining module, a correction data set corresponding to the training data set, and training the image semantic segmentation model with the correction data set. For at least one training map in the training data set, the process of generating a corresponding correction map through the hard example mining module includes: obtaining the loss result produced after the at least one training map undergoes forward calculation and back propagation in the image semantic segmentation model, the loss result comprising the loss values of a plurality of ROIs; selecting the highest loss value among the loss values of the plurality of ROIs and calculating the intersection-over-union of each ROI with the ROI corresponding to the highest loss value, so as to screen the ROIs whose intersection-over-union is smaller than a preset intersection-over-union threshold; and extracting the portions corresponding to the screened ROIs from the at least one training map to generate the correction map. The screened ROIs are thus used to generate a correction map for back propagation and for updating the image semantic segmentation model, which suppresses the influence of the ROIs whose intersection-over-union exceeds the preset threshold, that is, the adverse influence that samples with an excessively high intersection-over-union may cause, effectively overcoming the sample imbalance problem and improving the recognition effect of the image semantic segmentation model.
In one possible embodiment, the image processing method 100 is used for material detection during transportation of a set of scrap steel pieces, and the optimized semantic segmentation mask map is used for determining at least one piece of associated information of the set, the at least one piece of associated information including at least one of the following: contour information, category information, source information, coordinate information, area information, and pixel feature information. The contour information indicates the contour of each scrap steel piece in the set; it may be the result of matching against a plurality of preset contour types, a numerical semantic description (such as side length, curvature, and the like), or a generalized semantic description (such as disc-shaped, strip-shaped, and the like). The category information indicates how many categories of scrap steel pieces the set contains and the number of pieces in each category; this information can be used to further analyze and extract more information, so the associated information generally includes at least the category information. For example, the category information of a set of scrap steel pieces may indicate that it contains 10 train wheels, 20 car bearings, 30 screws, and so on. The source information indicates the location from which a scrap piece comes, for example from a train or a barge. The coordinate information indicates the coordinates of a scrap piece on the image. The area information indicates the area of a scrap piece as identified on the image. The pixel feature information indicates the features of all the pixels belonging to a scrap piece.
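For illustration, area and coordinate information of each category might be derived from an optimized mask map whose pixel values encode category labels; the encoding (0 = background) and the returned field names are assumptions for this sketch only.

```python
import numpy as np

def region_info(mask):
    """Derive per-category area (pixel count) and coordinate (centroid)
    information from a labelled mask map. Assumes 0 denotes background."""
    info = {}
    for label in np.unique(mask):
        if label == 0:
            continue
        ys, xs = np.nonzero(mask == label)
        info[int(label)] = {
            "area": int(len(ys)),                              # area information
            "centroid": (float(ys.mean()), float(xs.mean())),  # coordinate information
        }
    return info
```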
It should be understood that richer associated information of the scrap steel piece set can be obtained depending on the computer vision technology specifically adopted to obtain the semantic segmentation result of the original image. The examples of associated information listed above are illustrative only and not limiting. The rich associated information thus obtained provides a basis for decision making and subsequent processing.
It is to be understood that the above-described method may be implemented by a corresponding execution body or carrier. In some exemplary embodiments, a non-transitory computer readable storage medium stores computer instructions that, when executed by a processor, implement the above-described method and any of the above-described embodiments, implementations, or combinations thereof. In some example embodiments, an electronic device includes: a processor; a memory for storing processor-executable instructions; wherein the processor implements the above method and any of the above embodiments, implementations, or combinations thereof by executing the executable instructions.
Fig. 2 shows a block diagram of an electronic device used in the image processing method shown in fig. 1 according to an embodiment of the present application. As shown in FIG. 2, electronic device 200 comprises a main processor 202, an internal bus 204, a network interface 206, a main memory 208, an auxiliary processor 210 and auxiliary memory 212, as well as an auxiliary processor 220 and auxiliary memory 222. The main processor 202 is connected to the main memory 208, and the main memory 208 can store computer instructions executable by the main processor 202, so that the image processing method 100 shown in fig. 1 can be implemented, including some or all of its steps and any possible combination of steps, or any possible replacement or variation of those steps. The network interface 206 provides network connectivity and transmits and receives data over a network. The internal bus 204 provides internal data interaction among the main processor 202, the network interface 206, the auxiliary processor 210, and the auxiliary processor 220. The auxiliary processor 210 is coupled to the auxiliary memory 212 and provides auxiliary computing power, and the auxiliary processor 220 is coupled to the auxiliary memory 222 and provides auxiliary computing power. The auxiliary processors 210 and 220 may provide the same or different auxiliary computing capabilities, including, but not limited to, computing capabilities optimized for particular computing requirements such as parallel processing capabilities or tensor computing capabilities, and computing capabilities optimized for particular algorithms or logic structures such as iterative computing capabilities or graph computing capabilities.
The auxiliary processors 210 and 220 may include one or more processors of a particular type, such as a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or the like, so that customized functionality and structure may be provided. In some exemplary embodiments, the electronic device 200 may include no auxiliary processor, only one auxiliary processor, or any number of auxiliary processors each having a corresponding customized function and structure; this is not specifically limited herein. The architecture of the two auxiliary processors shown in FIG. 2 is for illustration only and should not be construed as limiting. In addition, main processor 202 may include a single-core or multi-core computing unit to provide the functions and operations necessary for embodiments of the present application. Furthermore, the main processor 202 and the auxiliary processors (such as the auxiliary processor 210 and the auxiliary processor 220 in fig. 2) may have different architectures; that is, the electronic device 200 may be a system based on a heterogeneous architecture. For example, the main processor 202 may be a general-purpose processor such as a CPU based on an instruction set operating system, and an auxiliary processor may be a graphics processor (GPU) suited to parallelized computation or a dedicated accelerator suited to neural network model operations. The auxiliary memory (e.g., auxiliary memory 212 and auxiliary memory 222 shown in fig. 2) may cooperate with the respective auxiliary processors to implement customized functions and structures, and main memory 208 stores the necessary instructions, software, configurations, data, etc. to cooperate with main processor 202 to provide the functionality and operations necessary for the embodiments of the present application.
In some exemplary embodiments, the electronic device 200 may not include the auxiliary memory, may include only one auxiliary memory, and may further include any number of auxiliary memories, which is not specifically limited herein. The architecture of the two auxiliary memories shown in fig. 2 is illustrative only and should not be construed as limiting. Main memory 208 and possibly secondary memory may include one or more of the following features: volatile, nonvolatile, dynamic, static, readable/writable, read-only, random-access, sequential-access, location-addressability, file-addressability, and content-addressability, and may include random-access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewriteable Compact Disc (CD), a Digital Versatile Disc (DVD), a mass storage media device, or any other form of suitable storage media. The internal bus 204 may include any of a variety of different bus structures or combinations of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. It should be understood that the electronic device 200 shown in fig. 2, the illustrated configuration of which does not constitute a specific limitation on the apparatus or system involved, may in some exemplary embodiments include more or less components than the specific embodiments and figures, or combine certain components, or split certain components, or have a different arrangement of components.
With continued reference to fig. 2, in one possible implementation, the auxiliary processor 210 and/or the auxiliary processor 220 may have a computing architecture that is custom designed for the characteristics of neural network computing, such as a neural network accelerator. Moreover, the electronic device 200 may include any number of auxiliary processors each having a computing architecture that is custom designed for the characteristics of neural network computations, or the electronic device 200 may include any number of neural network accelerators. In some embodiments, for illustrative purposes only, an exemplary neural network accelerator may be: the neural network accelerator is provided with a time domain computing architecture based on a control flow, and the instruction flow of an instruction set is customized based on a neural network algorithm to perform centralized control on computing resources and storage resources; alternatively, neural network accelerators with a data-flow based spatial computation architecture, such as two-dimensional spatial computation arrays based on Row Stationary (RS) data flows, two-dimensional matrix multiplication arrays using Systolic arrays (Systolic Array), and the like; or any neural network accelerator having any suitable custom designed computational architecture.
Fig. 3 shows a block diagram of an image processing apparatus for optimizing the semantic segmentation result of an image according to an embodiment of the present application. As shown in fig. 3, the image processing apparatus 300 includes: a receiving module 310, configured to obtain a semantic segmentation mask map output by an image semantic segmentation model, where each pixel of the semantic segmentation mask map is assigned a mask corresponding to an object type based on the object type identified by the image semantic segmentation model, the semantic segmentation mask map includes a plurality of connected regions, and each of the plurality of connected regions consists of one or more contiguously distributed pixels assigned the same mask; a filtering module 320, configured to perform a filtering operation on the semantic segmentation mask map so as to remove, from the plurality of connected regions included in the semantic segmentation mask map, connected regions whose size is smaller than a preset size threshold and generate a first filtering result map; and an inversion module 330, configured to perform an inversion operation on the first filtering result map to obtain a first inversion result map. The filtering module 320 is further configured to perform a filtering operation on the first inversion result map so as to remove, from the connected regions included in the first inversion result map, connected regions whose size is smaller than the preset size threshold and generate a second filtering result map. The inversion module 330 is further configured to perform an inversion operation on the second filtering result map to obtain an optimized semantic segmentation mask map.
Referring to fig. 3, by multiplexing the filtering module and the inversion module, the image processing apparatus 300 removes the connected regions smaller than the preset size threshold both from the semantic segmentation mask map and from the first inversion result map. This helps eliminate inaccurate boundary predictions, reduces edge recognition errors, and yields smoother mask boundaries, thereby reducing the prediction error and improving the recognition accuracy.
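The filter-invert-filter-invert pipeline above can be sketched as follows. This is a minimal illustration assuming a binary (single-class) mask, 4-connectivity, and the bounding-square side as the size measure; the patent does not fix these details, and all function names are hypothetical:

```python
import numpy as np
from collections import deque

def _connected_regions(mask):
    """Yield each 4-connected foreground region as a list of (row, col) pixels."""
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                region, q = [], deque([(r, c)])
                seen[r, c] = True
                while q:
                    y, x = q.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                yield region

def remove_small_regions(mask, min_side):
    """Zero out regions whose bounding-square side is below min_side (the filter step)."""
    out = mask.copy()
    for region in _connected_regions(mask):
        ys = [p[0] for p in region]
        xs = [p[1] for p in region]
        side = max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
        if side < min_side:
            for y, x in region:
                out[y, x] = 0
    return out

def optimize_mask(mask, min_side):
    """Filter -> invert -> filter -> invert, as in the described pipeline."""
    first = remove_small_regions(mask, min_side)                 # drop small foreground specks
    holes_filtered = remove_small_regions(1 - first, min_side)   # inverted: drop small holes
    return 1 - holes_filtered                                    # back to original polarity
```

The first pass deletes small spurious foreground blobs; after inversion, small holes inside large regions become small foreground blobs and are deleted by the same filter, so the final inversion returns a mask with both specks and pinholes removed.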
In a possible implementation, the preset size threshold is based on a prior probability associated with a training process of the image semantic segmentation model.
In a possible implementation, the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: according to a training data set associated with the training process of the image semantic segmentation model, computing the area occupied by each connected region formed by the pixels corresponding to each object type in the training data set, selecting the connected region with the smallest area, and taking the side length of the minimal circumscribed square of that smallest connected region as the preset size threshold.
In a possible implementation, the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model includes: computing the area of every connected region in the training data set according to the annotations of the training data set associated with the training process of the image semantic segmentation model, and taking the side length of the minimal circumscribed square of the connected region with the smallest area as the preset size threshold.
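The threshold rule above can be sketched as a scan over the training annotations: find the smallest-area connected region across all annotated classes and return the side of its minimal bounding square. A minimal sketch, assuming integer label maps, 4-connectivity, and a background label to ignore (the patent does not name one); the function name is hypothetical:

```python
import numpy as np
from collections import deque

def min_region_square_side(label_maps, ignore=(0,)):
    """Return the bounding-square side of the smallest-area connected region
    over all classes in all annotation maps (the prior-derived threshold)."""
    best = None  # (area, side) of the smallest region seen so far
    for labels in label_maps:
        h, w = labels.shape
        seen = np.zeros((h, w), dtype=bool)
        for r in range(h):
            for c in range(w):
                v = labels[r, c]
                if v in ignore or seen[r, c]:
                    continue
                # Flood-fill one connected region of class v.
                q = deque([(r, c)])
                seen[r, c] = True
                ys, xs = [r], [c]
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny, nx] and labels[ny, nx] == v):
                            seen[ny, nx] = True
                            q.append((ny, nx))
                            ys.append(ny)
                            xs.append(nx)
                area = len(ys)
                side = max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
                if best is None or area < best[0]:
                    best = (area, side)
    return None if best is None else best[1]
```

Because the threshold comes from the smallest object that legitimately occurs in the training data, any connected region smaller than it is unlikely to be a real object and can be treated as segmentation noise.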
In one possible implementation, the training process of the image semantic segmentation model includes: generating, from at least one training map used for training the image semantic segmentation model, at least one correction map respectively corresponding to the at least one training map, and training the image semantic segmentation model with the at least one correction map, where the generation of each correction map of the at least one correction map includes: obtaining the loss result produced after the training map corresponding to the correction map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result including loss values of a plurality of regions of interest (ROIs); selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and generating the correction map by extracting the portions corresponding to the screened ROIs from the training map.
In one possible implementation, the training process of the image semantic segmentation model includes: inputting a training data set associated with the training process of the image semantic segmentation model into a hard example mining module; and generating, by the hard example mining module, a correction data set corresponding to the training data set, and training the image semantic segmentation model with the correction data set. For at least one training map in the training data set, generating the corresponding correction map by the hard example mining module includes: obtaining the loss result produced after the at least one training map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result including loss values of a plurality of ROIs; selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and generating the correction map by extracting the portions corresponding to the screened ROIs from the at least one training map.
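The ROI-screening step of the hard example mining described above can be sketched as follows. This is an illustration only, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the function names and the per-ROI loss values are hypothetical inputs, not part of any stated API:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def screen_rois(rois, losses, iou_thresh):
    """Keep ROIs whose IoU with the highest-loss ROI is below the threshold.

    The highest-loss ROI itself has IoU 1.0 with itself, so it is never kept;
    what survives are hard regions that do not overlap the worst one.
    """
    worst = max(range(len(rois)), key=lambda i: losses[i])
    anchor = rois[worst]
    return [r for r in rois if iou(r, anchor) < iou_thresh]
```

Extracting the image patches under the surviving ROIs then yields the correction maps used for further training, concentrating updates on regions the model currently gets wrong.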
The embodiments provided herein may be implemented in any one or combination of hardware, software, firmware, or solid state logic circuitry, and may be implemented in connection with signal processing, control, and/or application specific circuitry. Particular embodiments of the present application provide an apparatus or device that may include one or more processors (e.g., microprocessors, controllers, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), etc.) that process various computer-executable instructions to control the operation of the apparatus or device. Particular embodiments of the present application provide an apparatus or device that can include a system bus or data transfer system that couples the various components together. A system bus can include any of a variety of different bus structures or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. The devices or apparatuses provided in the embodiments of the present application may be provided separately, or may be part of a system, or may be part of other devices or apparatuses.
Particular embodiments provided herein may include, or be combined with, computer-readable storage media, such as one or more storage devices capable of providing non-transitory data storage. The computer-readable storage medium/storage device may be configured to store data, programs, and/or instructions that, when executed by a processor of an apparatus or device provided by the embodiments of the present application, cause the apparatus or device to perform the associated operations. The computer-readable storage medium/storage device may have one or more of the following characteristics: volatile, non-volatile, dynamic, static, readable/writable, read-only, random-access, sequential-access, location-addressable, file-addressable, and content-addressable. In one or more exemplary embodiments, the computer-readable storage medium/storage device may be integrated into an apparatus or device provided by the embodiments of the present application or belong to a common system. The computer-readable storage medium/storage device may include optical, semiconductor, and/or magnetic memory devices, and may also include random-access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewriteable Compact Disc (CD), a Digital Versatile Disc (DVD), a mass storage media device, or any other form of suitable storage media.
The above describes implementations of the embodiments of the present application. It should be noted that the steps of the methods described in the embodiments of the present application may be reordered, combined, or deleted according to actual needs. In the above embodiments, each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. It is to be understood that the embodiments of the present application and the structures shown in the drawings do not constitute a specific limitation on the devices or systems concerned. In other embodiments of the present application, a device or system may include more or fewer components than the specific embodiments and figures, combine certain components, split certain components, or arrange the components differently. Those skilled in the art will understand that various modifications and changes may be made in the arrangement, operation, and details of the methods and apparatuses described in the specific embodiments without departing from the spirit and scope of the embodiments herein; several improvements and modifications may be made without departing from the principles of the embodiments of the present application, and such improvements and modifications are also considered to be within the scope of the present application.

Claims (20)

1. An image processing method for optimizing semantic segmentation results of an image, the image processing method comprising:
obtaining a semantic segmentation mask map output by an image semantic segmentation model, wherein each pixel of the semantic segmentation mask map is assigned a mask corresponding to an object type based on the object type identified by the image semantic segmentation model, the semantic segmentation mask map comprises a plurality of connected regions, and each of the plurality of connected regions consists of one or more contiguously distributed pixels assigned the same mask;
inputting the semantic segmentation mask map into a filtering module, and removing, with the filtering module, connected regions whose size is smaller than a preset size threshold from the plurality of connected regions, so as to generate a first filtering result map;
inputting the first filtering result map into an inversion module to perform an inversion operation, so as to obtain a first inversion result map;
inputting the first inversion result map into the filtering module, and removing, with the filtering module, connected regions whose size is smaller than the preset size threshold from the connected regions comprised in the first inversion result map, so as to generate a second filtering result map; and
inputting the second filtering result map into the inversion module to perform an inversion operation, so as to obtain an optimized semantic segmentation mask map.
2. The image processing method according to claim 1, characterized in that the image processing method further comprises:
performing erosion and dilation processing on the semantic segmentation mask map before inputting the semantic segmentation mask map into the filtering module.
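One common reading of the erosion-and-dilation preprocessing in claim 2 is a morphological opening (erode then dilate, removing small protrusions) and closing (dilate then erode, sealing small gaps). A minimal binary sketch with a 3x3 cross structuring element, which is an assumption; the claim does not specify the element or the order of operations:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 cross structuring element (0/1 int array)."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]   # shift down
    out[:-1, :] |= mask[1:, :]   # shift up
    out[:, 1:] |= mask[:, :-1]   # shift right
    out[:, :-1] |= mask[:, 1:]   # shift left
    return out

def erode(mask):
    """Binary erosion as the complement of the dilated complement.

    Pixels outside the image are treated as foreground, so regions
    touching the border are not eroded by the image edge.
    """
    return 1 - dilate(1 - mask)
```

With these, `dilate(erode(mask))` (opening) removes isolated specks and `erode(dilate(mask))` (closing) fills pinholes, smoothing the mask before the connected-region filter runs.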
3. The image processing method of claim 1, wherein the preset size threshold is based on a prior probability associated with a training process of the image semantic segmentation model.
4. The image processing method according to claim 3, wherein the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model comprises:
according to a training data set associated with the training process of the image semantic segmentation model, computing the area occupied by each connected region formed by the pixels corresponding to each object type in the training data set, selecting the connected region with the smallest area, and taking the side length of the minimal circumscribed square of that smallest connected region as the preset size threshold.
5. The image processing method according to claim 3, wherein the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model comprises:
computing the area of every connected region in the training data set according to the annotations of the training data set associated with the training process of the image semantic segmentation model, and taking the side length of the minimal circumscribed square of the connected region with the smallest area as the preset size threshold.
6. The image processing method according to claim 1, wherein the preset size threshold is a specified numerical value.
7. The image processing method according to claim 6, wherein the specified value is preset according to an application scenario of the image processing method.
8. The image processing method according to any one of claims 1 to 7, wherein the training process of the image semantic segmentation model comprises:
generating, from at least one training map used for training the image semantic segmentation model, at least one correction map respectively corresponding to the at least one training map, and training the image semantic segmentation model with the at least one correction map,
wherein the generation of each correction map of the at least one correction map comprises:
obtaining the loss result produced after the training map corresponding to the correction map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result comprising loss values of a plurality of regions of interest (ROIs);
selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and
generating the correction map by extracting the portions corresponding to the screened ROIs from the training map.
9. The image processing method according to claim 8, wherein the at least one correction map is generated by a hard example mining module, and the hard example mining module samples the training process of the image semantic segmentation model to obtain the loss result produced after the training map corresponding to each of the at least one correction map undergoes forward computation and backpropagation in the image semantic segmentation model.
10. The image processing method according to claim 8, wherein selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than the preset intersection-over-union threshold, comprises:
sorting the intersection-over-union ratios between the ROIs and the ROI corresponding to the highest loss value from high to low, and screening out, starting from the lowest ratio, a specified number of ROIs whose intersection-over-union ratio is smaller than the preset intersection-over-union threshold.
11. The image processing method according to any one of claims 1 to 7, wherein the training process of the image semantic segmentation model comprises:
inputting a training data set associated with the training process of the image semantic segmentation model into a hard example mining module;
generating, by the hard example mining module, a correction data set corresponding to the training data set, and training the image semantic segmentation model with the correction data set,
wherein, for at least one training map in the training data set, generating the correction map corresponding to the at least one training map by the hard example mining module comprises:
obtaining the loss result produced after the at least one training map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result comprising loss values of a plurality of ROIs;
selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and
generating the correction map by extracting the portions corresponding to the screened ROIs from the at least one training map.
12. The image processing method according to any one of claims 1 to 7, wherein the image processing method is used for part detection during the handling of a set of scrap parts, the optimized semantic segmentation mask map is used for determining at least one item of associated information of the set of scrap parts, and the at least one item of associated information comprises at least one of the following: contour information, category information, source information, coordinate information, area information, and pixel feature information.
13. A non-transitory computer readable storage medium storing computer instructions which, when executed by a processor, implement the image processing method according to any one of claims 1 to 12.
14. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the image processing method according to any one of claims 1 to 12 by executing the executable instructions.
15. An image processing apparatus for optimizing semantic segmentation results of an image, the image processing apparatus comprising:
a receiving module, configured to obtain a semantic segmentation mask map output by an image semantic segmentation model, wherein each pixel of the semantic segmentation mask map is assigned a mask corresponding to an object type based on the object type identified by the image semantic segmentation model, the semantic segmentation mask map comprises a plurality of connected regions, and each of the plurality of connected regions consists of one or more contiguously distributed pixels assigned the same mask;
a filtering module, configured to perform a filtering operation on the semantic segmentation mask map so as to remove connected regions with a size smaller than a preset size threshold from a plurality of connected regions included in the semantic segmentation mask map and generate a first filtering result map;
an inversion module, configured to perform an inversion operation on the first filtering result map to obtain a first inversion result map,
wherein the filtering module is further configured to perform a filtering operation on the first inversion result map so as to remove, from the connected regions comprised in the first inversion result map, connected regions whose size is smaller than the preset size threshold and generate a second filtering result map,
and the inversion module is further configured to perform an inversion operation on the second filtering result map to obtain an optimized semantic segmentation mask map.
16. The apparatus according to claim 15, wherein the preset size threshold is based on a prior probability associated with a training process of the image semantic segmentation model.
17. The apparatus according to claim 16, wherein the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model comprises:
according to a training data set associated with the training process of the image semantic segmentation model, computing the area occupied by each connected region formed by the pixels corresponding to each object type in the training data set, selecting the connected region with the smallest area, and taking the side length of the minimal circumscribed square of that smallest connected region as the preset size threshold.
18. The apparatus according to claim 16, wherein the preset size threshold being based on a prior probability associated with the training process of the image semantic segmentation model comprises:
computing the area of every connected region in the training data set according to the annotations of the training data set associated with the training process of the image semantic segmentation model, and taking the side length of the minimal circumscribed square of the connected region with the smallest area as the preset size threshold.
19. The image processing apparatus according to any one of claims 15 to 18, wherein the training process of the image semantic segmentation model includes:
generating, from at least one training map used for training the image semantic segmentation model, at least one correction map respectively corresponding to the at least one training map, and training the image semantic segmentation model with the at least one correction map,
wherein the generation of each correction map of the at least one correction map comprises:
obtaining the loss result produced after the training map corresponding to the correction map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result comprising loss values of a plurality of regions of interest (ROIs);
selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and
generating the correction map by extracting the portions corresponding to the screened ROIs from the training map.
20. The image processing apparatus according to any one of claims 15 to 18, wherein the training process of the image semantic segmentation model includes:
inputting a training data set associated with the training process of the image semantic segmentation model into a hard example mining module;
generating, by the hard example mining module, a correction data set corresponding to the training data set, and training the image semantic segmentation model with the correction data set,
wherein, for at least one training map in the training data set, generating the correction map corresponding to the at least one training map by the hard example mining module comprises:
obtaining the loss result produced after the at least one training map undergoes forward computation and backpropagation in the image semantic segmentation model, the loss result comprising loss values of a plurality of ROIs;
selecting the highest loss value among the loss values of the plurality of ROIs and computing the intersection-over-union ratio between each ROI and the ROI corresponding to the highest loss value, so as to screen out the ROIs whose intersection-over-union ratio is smaller than a preset intersection-over-union threshold; and
generating the correction map by extracting the portions corresponding to the screened ROIs from the at least one training map.
CN202111525227.4A 2021-12-14 2021-12-14 Image processing method and device for optimizing image semantic segmentation result Pending CN114187211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111525227.4A CN114187211A (en) 2021-12-14 2021-12-14 Image processing method and device for optimizing image semantic segmentation result


Publications (1)

Publication Number Publication Date
CN114187211A true CN114187211A (en) 2022-03-15

Family

ID=80543676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111525227.4A Pending CN114187211A (en) 2021-12-14 2021-12-14 Image processing method and device for optimizing image semantic segmentation result

Country Status (1)

Country Link
CN (1) CN114187211A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390709A1 (en) * 2020-06-16 2021-12-16 Xue Feng System and method to improve model-based organ segmentation with image post-processing
US11455733B2 (en) * 2020-06-16 2022-09-27 Xue Feng System and method to improve model-based organ segmentation with image post-processing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination