CN113989307A - Image semantic segmentation method, electronic device and storage medium - Google Patents

Image semantic segmentation method, electronic device and storage medium

Info

Publication number
CN113989307A
Authority
CN
China
Prior art keywords
image
matrix
semantic segmentation
gray
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111274244.5A
Other languages
Chinese (zh)
Inventor
罗壮 (Luo Zhuang)
张雪 (Zhang Xue)
张海强 (Zhang Haiqiang)
李成军 (Li Chengjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhidao Network Technology Beijing Co Ltd
Original Assignee
Zhidao Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhidao Network Technology Beijing Co Ltd filed Critical Zhidao Network Technology Beijing Co Ltd
Priority to CN202111274244.5A priority Critical patent/CN113989307A/en
Publication of CN113989307A publication Critical patent/CN113989307A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image semantic segmentation method, an electronic device, and a storage medium. The method comprises the following steps: obtaining an RGB matrix based on an image; processing the image by a preset means to obtain a gray-scale image matrix; merging the RGB matrix and the gray-scale image matrix according to a preset rule to obtain a target matrix set; and performing semantic segmentation processing on the image with a preset semantic segmentation model of a convolutional neural network, the input matrix of the model during semantic segmentation being the target matrix set. The image semantic segmentation method, electronic device, and storage medium enhance the features of target edges in the image and improve the segmentation precision at those edges.

Description

Image semantic segmentation method, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image semantic segmentation method, an electronic device, and a storage medium.
Background
In recent years, machine learning techniques centered on deep learning have attracted wide attention. For example, autonomous driving of automobiles has become increasingly feasible, but throughout the deep learning pipeline, algorithms are required to recognize and learn from images provided as raw data, and semantic segmentation techniques are applied in this process. Semantic segmentation is a typical computer vision problem: it takes raw data (e.g., flat images) as input and converts it into masks with highlighted regions of interest. Semantic segmentation with a convolutional neural network is widely used because of its high segmentation accuracy. However, when semantic segmentation is performed with a convolutional neural network, the pixel recognition accuracy at the edge positions of target features in the image is low, and missed or erroneous recognition occurs easily.
Disclosure of Invention
In order to solve or partially solve the problems in the related art, the application provides an image semantic segmentation method, an electronic device, and a storage medium that can enhance the features of target edges in an image and improve the segmentation precision of those edges.
The first aspect of the present application provides an image semantic segmentation method, including:
obtaining an RGB matrix based on the image;
processing the image by adopting a preset means to obtain a gray level image matrix;
merging the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set;
and performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is the target matrix set during semantic segmentation.
In one embodiment, performing the semantic segmentation processing on the image with the preset semantic segmentation model of a convolutional neural network, whose input matrix during semantic segmentation is the target matrix set, includes:
establishing a semantic segmentation model, wherein the semantic segmentation model at least comprises an input module, and the input dimension of a first convolution layer in the input module is set to be 4;
and performing semantic segmentation processing on the image by taking the input matrix as the target matrix set based on the semantic segmentation model.
In one embodiment, before obtaining the target matrix set, the method includes:
and respectively carrying out normalization processing on the gray image matrix and the RGB matrix.
In one embodiment, each element value in the gray-scale image matrix after normalization is 0 or 1, and each element value in the normalized RGB matrix lies in [0, 1].
In an embodiment, the processing the image by a preset means to obtain a grayscale image matrix includes:
converting the image into a grey-scale map;
and carrying out binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
In one embodiment, the binarizing processing on the grayscale map includes:
and carrying out binarization processing on the gray level image by adopting an adaptive threshold value binarization algorithm.
In one embodiment, the binarizing the gray map by using an adaptive threshold binarization algorithm includes:
acquiring an m×n pixel region centered on the current pixel point;
computing, according to a Gaussian function, a weight for each pixel point in the m×n pixel region from its distance to the current pixel point, and taking the weighted average as the binarization threshold of the current pixel point;
and comparing the gray value of the current pixel point with the binarization threshold, if the gray value of the current pixel point is greater than the binarization threshold, setting the binarization value of the current pixel point to be 255, and if the gray value of the current pixel point is less than the binarization threshold, setting the binarization value of the current pixel point to be 0.
In one embodiment, at least one category label is marked in the segmented image obtained by performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network.
A second aspect of the present application provides an electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A third aspect of the application provides a storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method as described above.
The technical scheme provided by the application can comprise the following beneficial effects:
the image semantic segmentation method, the electronic device and the storage medium provided by the embodiment of the application comprise the following steps: acquiring an RGB matrix based on an image, processing the image by adopting a preset means to obtain a gray level image matrix, and combining the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set; and performing semantic segmentation processing on the image by using a preset semantic segmentation model of the convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is a target matrix set during semantic segmentation. And performing common normalization processing on the gray image matrix and the RGB matrix processed by the preset means to obtain a target matrix set, and taking the 4-dimensional target matrix set as an input matrix. Moreover, when the image is converted into a gray image and the matrix of the gray image and the RGB matrix are used together as an input matrix, the boundary of the target feature and the feature pixels beside the target feature are more obvious when the target feature is acquired because the gray image has only two colors, namely black and white. Therefore, after the matrix of the gray-scale image is connected with the RGB matrix and is used as an input matrix for input, the method is more favorable for semantic segmentation so as to improve the identification precision of the target feature edge.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application, as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
FIG. 1 is a first flowchart of an image semantic segmentation method according to an embodiment of the present application;
FIG. 2 is a second flowchart of the image semantic segmentation method according to an embodiment of the present application;
FIG. 3 is a third flowchart of the image semantic segmentation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the related art, semantic segmentation is a common computer vision task, widely applied in fields such as automatic driving, human-computer interaction, and virtual reality. It is a high-level task that paves the way for complete scene understanding. The goal of semantic segmentation is to classify each pixel point in an image, thereby labeling the image regions occupied by objects of different classes. With the popularization of deep learning in recent years, many semantic segmentation problems are solved with deep architectures, most commonly convolutional neural networks, which offer higher precision and efficiency.
Optionally, in the present application, taking automatic driving as an example field for semantic segmentation, an autonomous vehicle is provided with a lidar; during driving, information outside the vehicle is collected by the lidar and converted into pictures serving as raw image data. The subsequent processing is not emphasized separately here.
In a segmented image obtained by semantic segmentation of the raw image data, each pixel point is marked with category information, and the pixels belonging to one category form the boundary of an object. For two adjacent objects whose boundaries are very close or shared, the features of the object edges are not obvious, so pixel points on and near the boundary are easily missed or misrecognized and marked with the category information of another object, which makes the pixel recognition accuracy at the corresponding edge positions low.
In order to solve the problems, the application provides an image semantic segmentation method which can enhance the characteristics of the target edge in an image and improve the segmentation precision of the image target edge.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a first process of an image semantic segmentation method according to an embodiment of the present application.
Referring to fig. 1, the method includes:
s101, obtaining an RGB matrix based on the image.
And S102, processing the image by adopting a preset means to obtain a gray level image matrix.
And S103, combining the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set.
And S104, performing semantic segmentation processing on the image by utilizing a preset semantic segmentation model of the convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is a target matrix set during semantic segmentation.
In the embodiment of the present application, the acquired image may be obtained as follows: an autonomous vehicle collects external image data over a certain time period through a lidar and converts it into raw picture data. The raw image data comprises several two-dimensional color images taken at different moments within that period, and the obtained color image is one of them. The semantic segmentation method is explained in detail below taking one color image at a certain moment as an example.
An RGB matrix is obtained based on the image. All colors in an image are mixtures of red (R), green (G), and blue (B) in different proportions. A group of one red, one green, and one blue value is the smallest display unit, called a pixel, and many pixels arranged according to a certain rule form the corresponding image. One pixel therefore carries three luminance values, for red, green, and blue, also referred to as its RGB values. Generally, each of the three colors has 256 brightness levels, represented numerically as 0, 1, 2, ..., 255, and the collection of all rows and columns of pixel values in an image is called a matrix. Since each pixel has R, G, and B values, an image yields an R matrix, a G matrix, and a B matrix, and the set of these three matrices is the RGB matrix. The obtained RGB matrix may be a set containing the three separate R, G, B matrices, or the three may be combined by calculation into a single matrix used as the RGB matrix, which is not limited herein.
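For illustration only (this code is not part of the patent text), obtaining the RGB matrix described above might look as follows with OpenCV and NumPy; the file name and the choice to keep the three matrices both separately and combined are assumptions of the sketch:

    import cv2
    import numpy as np

    # Load the color image; OpenCV returns an H x W x 3 array in B, G, R channel order.
    image = cv2.imread("frame.png")     # "frame.png" is a hypothetical file name
    b, g, r = cv2.split(image)          # three H x W single-channel matrices

    # The RGB matrix may be kept as a set of three separate matrices ...
    rgb_set = {"R": r, "G": g, "B": b}
    # ... or combined by stacking into one H x W x 3 matrix.
    rgb_matrix = np.dstack([r, g, b])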
And processing the image by adopting a preset means to obtain a gray level image matrix. Referring to fig. 2, in an embodiment, the obtaining of the grayscale image matrix by the preset method includes:
fig. 2 is a second flow chart of the image semantic segmentation method according to the embodiment of the present application.
And S1021, converting the image into a gray scale image.
And S1022, performing binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
Obtaining the gray-scale map from the image may be implemented in various manners, for example by adjusting the RGB brightness, that is, setting the brightness of R, G, B in each pixel so that R = G = B, to obtain a gray image containing only shades from black to white; the various manners of obtaining a gray-scale map are common and will not be described in detail herein. On the basis that each pixel of the obtained gray-scale map has a gray value in the range 0 to 255, binarization restricts each pixel to one of only two values, 0 or 255, yielding a pixel set with only these two values, namely the binarized gray-scale image matrix. It should be emphasized that binarization is one of the most common and important operations in image analysis and processing, and there are many binarization methods. For example, one method is: scan every pixel value in the gray-scale map, each between 0 and 255; if a pixel value is less than 127, set it to 0, and if it is greater than or equal to 127, set it to 255, finally obtaining a new pixel set (gray-scale image matrix) consisting only of the values 0 and 255. Another method is: compute the average value K of the pixels in the gray image, scan every pixel value, and set it to 255 if it is greater than K, or to 0 if it is less than or equal to K. Either of 0 and 255 could denote black with the other denoting white; conventionally, 255 is white and 0 is black. Other binarization methods may also be used, which are not listed here.
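As a minimal sketch of the two simple binarization methods just described (the grayscale conversion via OpenCV is an assumed convenience; the fixed threshold 127 and the mean threshold K come from the text above):

    import cv2
    import numpy as np

    gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)  # hypothetical file

    def binarize_fixed(gray: np.ndarray) -> np.ndarray:
        # First method: values below 127 become 0, values of 127 or more become 255.
        return np.where(gray < 127, 0, 255).astype(np.uint8)

    def binarize_mean(gray: np.ndarray) -> np.ndarray:
        # Second method: values above the image mean K become 255, the rest 0.
        k = gray.mean()
        return np.where(gray > k, 255, 0).astype(np.uint8)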
In one embodiment, an adaptive threshold binarization algorithm may be used to binarize the gray-scale map. With the simpler binarization methods above, local pixel information can be lost in the process of converting the processed image into the gray-scale map, so the binarization result may not truly reflect the information of the source image. Moreover, illumination differs across regions of an image, i.e., the external environment influences the brightness presented by different pixels, which changes the resulting gray-scale map. To reduce such external interference with the gray-scale map, an optimal threshold can be determined locally, for example by measuring the uniformity of the gray distribution with the variance; an algorithm that binarizes the image using such locally determined optimal thresholds is an adaptive threshold binarization algorithm, and computing the optimal thresholds yields a gray-scale map with a better image effect. It should be emphasized that this is only one way of obtaining a highly accurate gray-scale map, and the method is not limited thereto.
Alternatively, referring to fig. 3, a specific implementation of binarizing the gray-scale map by using an adaptive threshold binarization algorithm may include:
fig. 3 is a schematic flow chart of a third process of the image semantic segmentation method according to the embodiment of the present application.
S1023, acquiring an m × n pixel region centered on the current pixel point;
S1024, computing, according to a Gaussian function, a weight for each pixel point in the m × n region from its distance to the current pixel point, and taking the weighted average as the binarization threshold of the current pixel point;
S1025, comparing the gray value of the current pixel point with the binarization threshold: if the gray value is greater than the binarization threshold, the binarized value of the current pixel point is set to 255, and if it is less than the binarization threshold, the binarized value is set to 0.
It is to be understood that m and n in the acquired m × n pixel region may be equal or different; in the present application the region may be, for example, an 11 × 11 pixel region or another size, which is not specifically limited herein.
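A minimal sketch of steps S1023 to S1025, assuming they correspond to OpenCV's Gaussian adaptive thresholding with the 11 × 11 region mentioned above (the offset constant, here 0, is an assumption, since the text does not specify one):

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name

    # For every pixel, the threshold is the Gaussian-weighted average of its
    # 11 x 11 neighborhood, with weights falling off with distance to the center;
    # pixels whose gray value exceeds the threshold become 255, the others 0.
    binary = cv2.adaptiveThreshold(
        gray,
        255,                              # value assigned above the threshold
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,   # Gaussian-weighted neighborhood average
        cv2.THRESH_BINARY,
        11,                               # block size: the m x n region, m = n = 11
        0,                                # offset subtracted from the average (assumed 0)
    )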
In an embodiment, before the obtained RGB matrix and gray-scale image matrix are combined according to the preset rule into the target matrix set, the gray-scale image matrix and the RGB matrix may each be normalized, so that the element values of both lie in [0, 1]; this facilitates comparing or combining the RGB matrix and the gray-scale image matrix.
It will be appreciated that when normalizing the element values of the RGB matrix, R, G, and B may exist as separate matrices or be combined together. Taking a single R matrix as an example, each column of the matrix is divided by the square root of the sum of squares of that column's elements (its L2 norm), so that the squares of the element values in each column sum to 1. For example, normalizing the column [1, 2, 3] gives [0.2673, 0.5345, 0.8018], the squares of which sum to 1. In this way a normalized RGB matrix is obtained in which each element value lies in [0, 1]. For the gray-scale image matrix, whose element values take only the two values 0 and 255, normalization leaves each element with only two possible values, 0 or 1, and a detailed description is omitted here.
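The following sketch reproduces the normalization just described: column-wise division by the L2 norm for the color matrices, and division by 255 for the binarized gray matrix (the latter is an assumption consistent with the 0/1 result stated above):

    import numpy as np

    def normalize_columns(m: np.ndarray) -> np.ndarray:
        # Divide each column by its L2 norm (the square root of the sum of
        # squares of that column), so each column's squared values sum to 1.
        m = m.astype(np.float64)
        norms = np.linalg.norm(m, axis=0)
        norms[norms == 0] = 1.0           # leave all-zero columns unchanged
        return m / norms

    def normalize_gray(binary: np.ndarray) -> np.ndarray:
        # Map the two values 0 and 255 of the binarized matrix to 0 and 1.
        return binary / 255.0

    print(normalize_columns(np.array([[1.0], [2.0], [3.0]])).ravel())
    # -> [0.2673 0.5345 0.8018], matching the example in the text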
The RGB matrix and the gray-scale image matrix are combined according to the preset rule to obtain the target matrix set. In this step, the preset rule may be to use the RGB matrix and gray-scale image matrix with element values in [0, 1] obtained by the above means; since the element values of both lie in [0, 1] after that processing, they can be combined together as one matrix set. The preset rule may also differ from the means described above, which is not specifically limited herein.
In the obtained target matrix set, the R, G, B matrices and the gray-scale image matrix may be present as four separate matrices, or the RGB matrix and the gray-scale image matrix may be combined into one matrix serving as the target matrix set, which is not specifically limited herein.
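As an illustrative sketch (the image size and the random stand-in matrices are assumptions), merging the four normalized H × W matrices into one height × width × 4 target set can be done by stacking along the last axis:

    import numpy as np

    h, w = 480, 640                       # example image size (an assumption)
    r_n = np.random.rand(h, w)            # stand-ins for the four normalized
    g_n = np.random.rand(h, w)            # H x W matrices produced by the
    b_n = np.random.rand(h, w)            # normalization step above
    gray_n = np.random.randint(0, 2, (h, w)).astype(np.float64)

    # Stacking along the last axis yields the height x width x 4 target matrix set.
    target = np.dstack([r_n, g_n, b_n, gray_n])
    assert target.shape == (h, w, 4)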
Semantic segmentation processing is then performed on the image; to obtain a segmented image with higher precision, a preset semantic segmentation model of a convolutional neural network is used. A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding units within their coverage; it performs well for large-scale image processing and includes convolutional layers and pooling layers. More specifically, a convolutional neural network commonly contains three types of structure: convolutional layers, pooling layers, and fully connected layers. Data is input first; the convolutional layers perform convolution on the input, and the data is then processed by the pooling and fully connected layers before being output.
When semantic segmentation is performed with a preset convolutional-neural-network model, it is found that the recognition accuracy for the edge positions of segmented objects, or for the edge pixels of a given category in the image, is insufficient, and missed or false recognition occurs easily, so the edges of segmented objects deviate from the actual objects. To improve edge accuracy in semantic segmentation, the input data is processed so that the edge recognition accuracy of the convolutional neural network is enhanced during operation. It should be emphasized that this scheme makes no separate improvement to the specific implementation of the convolutional neural network, such as the operation algorithm, the convolution stride, or the processing by the pooling and fully connected layers; the semantic segmentation of the convolutional neural network is implemented with existing means, which are not described in detail herein.
The function of a convolutional layer is to extract features from the input data. A convolutional layer contains several convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feed-forward neural network. A kernel regularly sweeps the input feature map, performs the convolution operation between the input and the kernel, and outputs a new feature map, which is fed into the next convolutional layer for further convolution; this repeats until convolution finishes.
It can be understood that in the convolutional layers data exists in three-dimensional form: a gray picture has only 1 feature map, i.e., 1 channel, while a color image containing R, G, B has 3 feature maps, i.e., 3 channels. The corresponding features are input into the first convolutional layer and convolved with the corresponding kernels; the new output features are then input into the next convolutional layer and convolved with its kernels, and the process repeats until it finishes.
In the convolution process, the more input features the first convolutional layer receives, the higher the segmentation precision of the image and the better the recognition of object edges. To improve recognition accuracy, the input matrix of the preset convolutional-neural-network semantic segmentation model is set to the target matrix set, which comprises the 4 normalized matrices of the gray-scale image and R, G, B; the resulting input size is height × width × 4, where height and width are the height and width of the picture, and 4 indicates that the 1 channel of the gray-scale image is added to the three RGB channels of the image. When the original image undergoes convolutional-neural-network semantic segmentation, the matrix fed to the first layer has size height × width × 3; with the target matrix set as input, the size is height × width × 4. Compared with the three-channel input, the four-channel input adds the gray-scale image, which consists only of the two sharply contrasting values of black and white, so edge features are emphasized in it. Therefore, when the gray-scale image matrix and the RGB matrix together form the input matrix for semantic segmentation, recognition of object edge features is correspondingly strengthened during convolution, which benefits the recognition accuracy of object edge positions.
Specifically, to ensure a better effect when performing semantic segmentation, a semantic segmentation model is first established. The semantic segmentation model is a pre-trained model covering all targets in the images; to segment an image, its data only needs to be input into the model, which performs the convolution calculations according to the established algorithm and outputs the segmented data.
It is understood that the whole convolutional-neural-network computation is contained in the semantic segmentation model and is not described in detail here. The model comprises an input module, used to input the image data to be segmented, and an output module, used to output the segmented image data once segmentation completes. Based on the above method, the input dimension of the first convolutional layer in the input module can be set to 4 in advance, i.e., covering the 4 dimensions (channels) of R, G, B and the gray-scale image; the input dimension may also be modified to 4 automatically during processing based on the semantic segmentation model, which is not specifically limited herein. The target matrix set obtained from the image in advance is input into the semantic segmentation model as the input matrix for computation; during the computation, the added input feature of the gray-scale image benefits the recognition of object edge features and improves edge recognition precision. Semantic segmentation of a convolutional neural network based on such a model is a common means and is not described in detail herein.
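The patent does not fix a particular network architecture, so the following is only a hedged PyTorch sketch of what setting the input dimension of the first convolutional layer to 4 could look like; the layer widths, class count, and the toy structure are assumptions:

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        """Toy segmentation network whose first convolutional layer takes 4 channels."""

        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(4, 64, kernel_size=3, padding=1),  # input dimension 4: R, G, B + gray
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
            self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)  # per-pixel class scores

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

    # The height x width x 4 target matrix set becomes a 1 x 4 x height x width tensor.
    x = torch.rand(1, 4, 480, 640)
    logits = TinySegNet(num_classes=19)(x)   # output shape: (1, 19, 480, 640)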
The goal is to improve the recognition accuracy of the edge positions of features of different categories during semantic segmentation. For example, in a color image containing a person and a vehicle, if the color difference between the boundary of the person's target feature and other nearby objects is faint or blurred, the person's feature is hard to obtain by segmentation and deviates considerably from the actual person. Therefore, when performing semantic segmentation, the input dimension of the first convolutional layer in the input module of the semantic segmentation model is set to 4 in advance. On this basis, the color image is further processed to obtain an input matrix matching the 4 dimensions: on top of the 3 dimensions of the original RGB matrix, the gray-scale image matrix obtained by the preset means and the RGB matrix undergo a common normalization to yield the target matrix set, and this 4-dimensional set is used as the input matrix, increasing the edge recognition precision at target edge positions during convolution. Because the gray-scale image has only the two colors black and white, the boundary between a target feature, such as the person, and the feature pixels beside it becomes more evident once the color image is converted to a gray-scale image and its matrix is used together with the RGB matrix as the input matrix. Therefore, concatenating the gray-scale image matrix with the RGB matrix as the input is more favorable for semantic segmentation and improves the recognition precision of target feature edges.
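Pulling the pieces together, an end-to-end sketch of steps S101 to S104 under the same assumptions (and, for brevity, normalizing all four channels by 255 instead of the column-norm scheme above) might read:

    import cv2
    import numpy as np
    import torch

    image = cv2.imread("frame.png")                           # hypothetical input frame
    b, g, r = cv2.split(image)                                # S101: the RGB matrices
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 0)  # S102: gray-scale image matrix

    # S103: normalize and merge into the height x width x 4 target matrix set.
    target = np.dstack([m.astype(np.float32) / 255.0 for m in (r, g, b, binary)])

    # S104: reshape to 1 x 4 x height x width and feed a segmentation model whose
    # first convolutional layer expects 4 input channels, e.g. the TinySegNet above.
    x = torch.from_numpy(target).permute(2, 0, 1).unsqueeze(0)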
The present application further provides an electronic device, see fig. 4, the electronic device 1000 comprising a memory 1010 and a processor 1020.
The processor 1020 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 1020 or other modules of the computer. The permanent storage may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the permanent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random-access memory, and may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or electronic device, server, etc.), causes the processor to perform some or all of the various steps of the above-described methods in accordance with the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image semantic segmentation method, comprising:
obtaining an RGB matrix based on the image;
processing the image by adopting a preset means to obtain a gray level image matrix;
merging the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set;
and performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is the target matrix set during semantic segmentation.
2. The method according to claim 1, wherein performing semantic segmentation processing on the image with the preset semantic segmentation model of a convolutional neural network, whose input matrix during semantic segmentation is the target matrix set, comprises:
establishing a semantic segmentation model, wherein the semantic segmentation model at least comprises an input module, and the input dimension of a first convolution layer in the input module is set to be 4;
and performing semantic segmentation processing on the image by taking the input matrix as the target matrix set based on the semantic segmentation model.
3. The method of claim 1, wherein before obtaining the set of target matrices, the method comprises:
and respectively carrying out normalization processing on the gray image matrix and the RGB matrix.
4. The method of claim 3, wherein each element value in the normalized grayscale image matrix is 0 or 1, and each element value in the RGB matrix is in [0, 1].
5. The method according to claim 3, wherein the processing the image by a preset means to obtain a gray-scale image matrix comprises:
converting the image into a grey-scale map;
and carrying out binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
6. The method according to claim 5, wherein the binarizing processing on the gray map comprises:
and carrying out binarization processing on the gray level image by adopting an adaptive threshold value binarization algorithm.
7. The method according to claim 6, wherein the binarizing the gray scale map by using an adaptive threshold binarization algorithm comprises:
acquiring an m×n pixel region with a current pixel point as a center;
respectively calculating the distance from each pixel point in the m×n pixel region to the current pixel point according to a Gaussian function and carrying out a weighted average to obtain a binarization threshold of the current pixel point;
and comparing the gray value of the current pixel point with the binarization threshold, if the gray value of the current pixel point is greater than the binarization threshold, setting the binarization value of the current pixel point to be 255, and if the gray value of the current pixel point is less than the binarization threshold, setting the binarization value of the current pixel point to be 0.
8. The method according to any one of claims 1 to 7, wherein at least one class label is labeled in a segmented image obtained by performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-8.
10. A storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-8.
CN202111274244.5A, priority date 2021-10-29, filing date 2021-10-29: Image semantic segmentation method, electronic device and storage medium. Status: Pending. Publication: CN113989307A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111274244.5A | 2021-10-29 | 2021-10-29 | Image semantic segmentation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111274244.5A | 2021-10-29 | 2021-10-29 | Image semantic segmentation method, electronic device and storage medium

Publications (1)

Publication Number | Publication Date
CN113989307A | 2022-01-28

Family

ID=79744604

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111274244.5A (pending, published as CN113989307A) | Image semantic segmentation method, electronic device and storage medium | 2021-10-29 | 2021-10-29

Country Status (1)

Country Link
CN (1) CN113989307A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724133A (en) * 2022-04-18 2022-07-08 北京百度网讯科技有限公司 Character detection and model training method, device, equipment and storage medium
CN114724133B (en) * 2022-04-18 2024-02-02 北京百度网讯科技有限公司 Text detection and model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US10282589B2 (en) Method and system for detection and classification of cells using convolutional neural networks
CN109344874B (en) Chromosome automatic analysis method and system based on deep learning
US8873856B1 (en) Determining a class associated with an image
CN108764358B (en) Terahertz image identification method, device and equipment and readable storage medium
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
US11200659B2 (en) Neural network training device, system and method
CN111445478A (en) Intracranial aneurysm region automatic detection system and detection method for CTA image
EP3584742A1 (en) System and method for traffic sign recognition
CN109145964B (en) Method and system for realizing image color clustering
Björklund et al. Automatic license plate recognition with convolutional neural networks trained on synthetic data
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN112464983A (en) Small sample learning method for apple tree leaf disease image classification
CN112488083B (en) Identification method, device and medium of traffic signal lamp based on key point extraction of hetmap
GB2580675A (en) Tyre sidewall imaging method
US20230044794A1 (en) Neural network training device, system and method
CN113989307A (en) Image semantic segmentation method, electronic device and storage medium
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
Wang et al. Local defect detection and print quality assessment
Gu et al. Thermal image colorization using Markov decision processes
Senthilkumar et al. Optimal segmentation with back-propagation neural network (BPNN) based citrus leaf disease diagnosis
Kurdthongmee et al. A Framework to Create a Deep Learning Detector from a Small Dataset: A Case of Parawood Pith Estimation
CN115187790A (en) Image contour extraction method based on reference region binarization result
CN114529906A (en) Method and system for detecting abnormity of digital instrument of power transmission equipment based on character recognition
KR20220101373A (en) Apparatus and method for detecting defects in pcb based on video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination