CN113989307A - Image semantic segmentation method, electronic device and storage medium - Google Patents

Image semantic segmentation method, electronic device and storage medium

Info

Publication number
CN113989307A
Authority
CN
China
Prior art keywords
image
matrix
semantic segmentation
gray
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111274244.5A
Other languages
Chinese (zh)
Inventor
罗壮 (Luo Zhuang)
张雪 (Zhang Xue)
张海强 (Zhang Haiqiang)
李成军 (Li Chengjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhidao Network Technology Beijing Co Ltd
Original Assignee
Zhidao Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhidao Network Technology Beijing Co Ltd filed Critical Zhidao Network Technology Beijing Co Ltd
Priority to CN202111274244.5A priority Critical patent/CN113989307A/en
Publication of CN113989307A publication Critical patent/CN113989307A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image semantic segmentation method, an electronic device, and a storage medium. The method comprises the following steps: obtaining an RGB matrix based on an image; processing the image by a preset means to obtain a gray-scale image matrix; merging the RGB matrix and the gray-scale image matrix according to a preset rule to obtain a target matrix set; and performing semantic segmentation processing on the image with a preset semantic segmentation model of a convolutional neural network, the input matrix of the model during semantic segmentation being the target matrix set. The image semantic segmentation method, electronic device, and storage medium enhance the features of target edges in the image and improve the segmentation precision at those edges.

Description

Image semantic segmentation method, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image semantic segmentation method, an electronic device, and a storage medium.
Background
In recent years, machine learning techniques centered on deep learning have attracted wide attention. For example, autonomous driving of automobiles has become increasingly feasible, but throughout the deep learning pipeline, algorithms are required to recognize and learn from images provided as raw data, and semantic segmentation techniques are applied in this process. Semantic segmentation is a typical computer vision problem: it takes raw data (e.g., flat images) as input and converts it into masks with highlighted regions of interest. Semantic segmentation with a convolutional neural network is widely used because of its high segmentation accuracy. However, when semantic segmentation is performed with a convolutional neural network, the pixel recognition accuracy at the edge positions of target features in the image is low, and missed or erroneous recognition occurs easily.
Disclosure of Invention
In order to solve or partially solve the problems in the related art, the application provides an image semantic segmentation method, an electronic device, and a storage medium that can enhance the features of target edges in an image and improve the segmentation precision of those edges.
The first aspect of the present application provides an image semantic segmentation method, including:
obtaining an RGB matrix based on the image;
processing the image by adopting a preset means to obtain a gray level image matrix;
merging the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set;
and performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is the target matrix set during semantic segmentation.
In one embodiment, performing the semantic segmentation processing on the image with the preset semantic segmentation model of a convolutional neural network, whose input matrix during semantic segmentation is the target matrix set, includes:
establishing a semantic segmentation model, wherein the semantic segmentation model at least comprises an input module, and the input dimension of a first convolution layer in the input module is set to be 4;
and performing semantic segmentation processing on the image by taking the input matrix as the target matrix set based on the semantic segmentation model.
In one embodiment, before obtaining the target matrix set, the method includes:
and respectively carrying out normalization processing on the gray image matrix and the RGB matrix.
In one embodiment, each element value in the gray-scale image matrix after normalization is 0 or 1, and each element value in the normalized RGB matrix lies in [0, 1].
In an embodiment, the processing the image by a preset means to obtain a grayscale image matrix includes:
converting the image into a grey-scale map;
and carrying out binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
In one embodiment, the binarizing processing on the grayscale map includes:
and carrying out binarization processing on the gray level image by adopting an adaptive threshold value binarization algorithm.
In one embodiment, the binarizing the gray map by using an adaptive threshold binarization algorithm includes:
acquiring an m×n pixel region centered on the current pixel point;
computing, according to a Gaussian function, a weight for each pixel point in the m×n pixel region from its distance to the current pixel point, and taking the weighted average as the binarization threshold of the current pixel point;
and comparing the gray value of the current pixel point with the binarization threshold, if the gray value of the current pixel point is greater than the binarization threshold, setting the binarization value of the current pixel point to be 255, and if the gray value of the current pixel point is less than the binarization threshold, setting the binarization value of the current pixel point to be 0.
In one embodiment, at least one category label is marked in the segmented image obtained by performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network.
A second aspect of the present application provides an electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A third aspect of the application provides a storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method as described above.
The technical scheme provided by the application can comprise the following beneficial effects:
the image semantic segmentation method, the electronic device and the storage medium provided by the embodiment of the application comprise the following steps: acquiring an RGB matrix based on an image, processing the image by adopting a preset means to obtain a gray level image matrix, and combining the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set; and performing semantic segmentation processing on the image by using a preset semantic segmentation model of the convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is a target matrix set during semantic segmentation. And performing common normalization processing on the gray image matrix and the RGB matrix processed by the preset means to obtain a target matrix set, and taking the 4-dimensional target matrix set as an input matrix. Moreover, when the image is converted into a gray image and the matrix of the gray image and the RGB matrix are used together as an input matrix, the boundary of the target feature and the feature pixels beside the target feature are more obvious when the target feature is acquired because the gray image has only two colors, namely black and white. Therefore, after the matrix of the gray-scale image is connected with the RGB matrix and is used as an input matrix for input, the method is more favorable for semantic segmentation so as to improve the identification precision of the target feature edge.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application, as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
FIG. 1 is a first flowchart of an image semantic segmentation method according to an embodiment of the present application;
FIG. 2 is a second flowchart of the image semantic segmentation method according to an embodiment of the present application;
FIG. 3 is a third flowchart of the image semantic segmentation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the related art, semantic segmentation is a common computer vision task, widely applied in fields such as automatic driving, human-computer interaction, and virtual reality. It is a high-level task that paves the way for complete scene understanding. The goal of semantic segmentation is to classify each pixel point in an image, thereby labeling the image regions occupied by objects of different classes. With the popularization of deep learning in recent years, many semantic segmentation problems are solved with deep architectures, most commonly convolutional neural networks, which offer higher precision and efficiency.
Optionally, in the present application, taking automatic driving as an example field for semantic segmentation, an autonomous vehicle is provided with a lidar; during driving, information outside the vehicle is collected by the lidar and converted into pictures serving as raw image data. The subsequent processing is not emphasized separately here.
In a segmented image obtained by semantic segmentation of the raw image data, each pixel point is marked with category information, and the pixels belonging to one category form the boundary of an object. For two adjacent objects whose boundaries are very close or shared, the features of the object edges are not obvious, so pixel points on and near the boundary are easily missed or misrecognized and marked with the category information of another object, which makes the pixel recognition accuracy at the corresponding edge positions low.
In order to solve the problems, the application provides an image semantic segmentation method which can enhance the characteristics of the target edge in an image and improve the segmentation precision of the image target edge.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a first process of an image semantic segmentation method according to an embodiment of the present application.
Referring to fig. 1, the method includes:
s101, obtaining an RGB matrix based on the image.
And S102, processing the image by adopting a preset means to obtain a gray level image matrix.
And S103, combining the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set.
And S104, performing semantic segmentation processing on the image by utilizing a preset semantic segmentation model of the convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is a target matrix set during semantic segmentation.
In the embodiment of the present application, the acquired image may be obtained as follows: an autonomous vehicle collects external image data over a certain time period through a lidar and converts it into raw picture data. The raw image data comprises several two-dimensional color images taken at different moments within that period, and the obtained color image is one of them. The semantic segmentation method is explained in detail below taking one color image at a certain moment as an example.
An RGB matrix is obtained based on the image. All colors in an image are mixtures of red (R), green (G), and blue (B) in different proportions. A group of one red, one green, and one blue value is the smallest display unit, called a pixel, and many pixels arranged according to a certain rule form the corresponding image. One pixel therefore carries three luminance values, for red, green, and blue, also referred to as its RGB values. Generally, each of the three colors has 256 brightness levels, represented numerically as 0, 1, 2, ..., 255, and the collection of all rows and columns of pixel values in an image is called a matrix. Since each pixel has R, G, and B values, an image yields an R matrix, a G matrix, and a B matrix, and the set of these three matrices is the RGB matrix. The obtained RGB matrix may be a set containing the three separate R, G, B matrices, or the three may be combined by calculation into a single matrix used as the RGB matrix, which is not limited herein.
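For illustration only (this code is not part of the patent text), obtaining the RGB matrix described above might look as follows with OpenCV and NumPy; the file name and the choice to keep the three matrices both separately and combined are assumptions of the sketch:

    import cv2
    import numpy as np

    # Load the color image; OpenCV returns an H x W x 3 array in B, G, R channel order.
    image = cv2.imread("frame.png")     # "frame.png" is a hypothetical file name
    b, g, r = cv2.split(image)          # three H x W single-channel matrices

    # The RGB matrix may be kept as a set of three separate matrices ...
    rgb_set = {"R": r, "G": g, "B": b}
    # ... or combined by stacking into one H x W x 3 matrix.
    rgb_matrix = np.dstack([r, g, b])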
And processing the image by adopting a preset means to obtain a gray level image matrix. Referring to fig. 2, in an embodiment, the obtaining of the grayscale image matrix by the preset method includes:
fig. 2 is a second flow chart of the image semantic segmentation method according to the embodiment of the present application.
And S1021, converting the image into a gray scale image.
And S1022, performing binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
Obtaining the gray-scale map from the image may be implemented in various manners, for example by adjusting the RGB brightness, that is, setting the brightness of R, G, B in each pixel so that R = G = B, to obtain a gray image containing only shades from black to white; the various manners of obtaining a gray-scale map are common and will not be described in detail herein. On the basis that each pixel of the obtained gray-scale map has a gray value in the range 0 to 255, binarization restricts each pixel to one of only two values, 0 or 255, yielding a pixel set with only these two values, namely the binarized gray-scale image matrix. It should be emphasized that binarization is one of the most common and important operations in image analysis and processing, and there are many binarization methods. For example, one method is: scan every pixel value in the gray-scale map, each between 0 and 255; if a pixel value is less than 127, set it to 0, and if it is greater than or equal to 127, set it to 255, finally obtaining a new pixel set (gray-scale image matrix) consisting only of the values 0 and 255. Another method is: compute the average value K of the pixels in the gray image, scan every pixel value, and set it to 255 if it is greater than K, or to 0 if it is less than or equal to K. Either of 0 and 255 could denote black with the other denoting white; conventionally, 255 is white and 0 is black. Other binarization methods may also be used, which are not listed here.
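As a minimal sketch of the two simple binarization methods just described (the grayscale conversion via OpenCV is an assumed convenience; the fixed threshold 127 and the mean threshold K come from the text above):

    import cv2
    import numpy as np

    gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)  # hypothetical file

    def binarize_fixed(gray: np.ndarray) -> np.ndarray:
        # First method: values below 127 become 0, values of 127 or more become 255.
        return np.where(gray < 127, 0, 255).astype(np.uint8)

    def binarize_mean(gray: np.ndarray) -> np.ndarray:
        # Second method: values above the image mean K become 255, the rest 0.
        k = gray.mean()
        return np.where(gray > k, 255, 0).astype(np.uint8)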
In one embodiment, an adaptive threshold binarization algorithm may be used to binarize the gray-scale map. With the simpler binarization methods above, local pixel information can be lost in the process of converting the processed image into the gray-scale map, so the binarization result may not truly reflect the information of the source image. Moreover, illumination differs across regions of an image, i.e., the external environment influences the brightness presented by different pixels, which changes the resulting gray-scale map. To reduce such external interference with the gray-scale map, an optimal threshold can be determined locally, for example by measuring the uniformity of the gray distribution with the variance; an algorithm that binarizes the image using such locally determined optimal thresholds is an adaptive threshold binarization algorithm, and computing the optimal thresholds yields a gray-scale map with a better image effect. It should be emphasized that this is only one way of obtaining a highly accurate gray-scale map, and the method is not limited thereto.
Alternatively, referring to fig. 3, a specific implementation of binarizing the gray-scale map by using an adaptive threshold binarization algorithm may include:
fig. 3 is a schematic flow chart of a third process of the image semantic segmentation method according to the embodiment of the present application.
S1023, acquiring an m × n pixel region centered on the current pixel point;
S1024, computing, according to a Gaussian function, a weight for each pixel point in the m × n region from its distance to the current pixel point, and taking the weighted average as the binarization threshold of the current pixel point;
S1025, comparing the gray value of the current pixel point with the binarization threshold: if the gray value is greater than the binarization threshold, the binarized value of the current pixel point is set to 255, and if it is less than the binarization threshold, the binarized value is set to 0.
It is to be understood that m and n in the acquired m × n pixel region may be equal or different; in the present application the region may be, for example, an 11 × 11 pixel region or another size, which is not specifically limited herein.
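A minimal sketch of steps S1023 to S1025, assuming they correspond to OpenCV's Gaussian adaptive thresholding with the 11 × 11 region mentioned above (the offset constant, here 0, is an assumption, since the text does not specify one):

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name

    # For every pixel, the threshold is the Gaussian-weighted average of its
    # 11 x 11 neighborhood, with weights falling off with distance to the center;
    # pixels whose gray value exceeds the threshold become 255, the others 0.
    binary = cv2.adaptiveThreshold(
        gray,
        255,                              # value assigned above the threshold
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,   # Gaussian-weighted neighborhood average
        cv2.THRESH_BINARY,
        11,                               # block size: the m x n region, m = n = 11
        0,                                # offset subtracted from the average (assumed 0)
    )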
In an embodiment, before the obtained RGB matrix and gray-scale image matrix are combined according to the preset rule into the target matrix set, the gray-scale image matrix and the RGB matrix may each be normalized, so that the element values of both lie in [0, 1]; this facilitates comparing or combining the RGB matrix and the gray-scale image matrix.
It will be appreciated that when normalizing the element values of the RGB matrix, R, G, and B may exist as separate matrices or be combined together. Taking a single R matrix as an example, each column of the matrix is divided by the square root of the sum of squares of that column's elements (its L2 norm), so that the squares of the element values in each column sum to 1. For example, normalizing the column [1, 2, 3] gives [0.2673, 0.5345, 0.8018], the squares of which sum to 1. In this way a normalized RGB matrix is obtained in which each element value lies in [0, 1]. For the gray-scale image matrix, whose element values take only the two values 0 and 255, normalization leaves each element with only two possible values, 0 or 1, and a detailed description is omitted here.
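The following sketch reproduces the normalization just described: column-wise division by the L2 norm for the color matrices, and division by 255 for the binarized gray matrix (the latter is an assumption consistent with the 0/1 result stated above):

    import numpy as np

    def normalize_columns(m: np.ndarray) -> np.ndarray:
        # Divide each column by its L2 norm (the square root of the sum of
        # squares of that column), so each column's squared values sum to 1.
        m = m.astype(np.float64)
        norms = np.linalg.norm(m, axis=0)
        norms[norms == 0] = 1.0           # leave all-zero columns unchanged
        return m / norms

    def normalize_gray(binary: np.ndarray) -> np.ndarray:
        # Map the two values 0 and 255 of the binarized matrix to 0 and 1.
        return binary / 255.0

    print(normalize_columns(np.array([[1.0], [2.0], [3.0]])).ravel())
    # -> [0.2673 0.5345 0.8018], matching the example in the text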
The RGB matrix and the gray-scale image matrix are combined according to the preset rule to obtain the target matrix set. In this step, the preset rule may be to use the RGB matrix and gray-scale image matrix with element values in [0, 1] obtained by the above means; since the element values of both lie in [0, 1] after that processing, they can be combined together as one matrix set. The preset rule may also differ from the means described above, which is not specifically limited herein.
In the obtained target matrix set, the R, G, B matrices and the gray-scale image matrix may be present as four separate matrices, or the RGB matrix and the gray-scale image matrix may be combined into one matrix serving as the target matrix set, which is not specifically limited herein.
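As an illustrative sketch (the image size and the random stand-in matrices are assumptions), merging the four normalized H × W matrices into one height × width × 4 target set can be done by stacking along the last axis:

    import numpy as np

    h, w = 480, 640                       # example image size (an assumption)
    r_n = np.random.rand(h, w)            # stand-ins for the four normalized
    g_n = np.random.rand(h, w)            # H x W matrices produced by the
    b_n = np.random.rand(h, w)            # normalization step above
    gray_n = np.random.randint(0, 2, (h, w)).astype(np.float64)

    # Stacking along the last axis yields the height x width x 4 target matrix set.
    target = np.dstack([r_n, g_n, b_n, gray_n])
    assert target.shape == (h, w, 4)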
Semantic segmentation processing is then performed on the image; to obtain a segmented image with higher precision, a preset semantic segmentation model of a convolutional neural network is used. A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding units within their coverage; it performs well for large-scale image processing and includes convolutional layers and pooling layers. More specifically, a convolutional neural network commonly contains three types of structure: convolutional layers, pooling layers, and fully connected layers. Data is input first; the convolutional layers perform convolution on the input, and the data is then processed by the pooling and fully connected layers before being output.
When semantic segmentation is performed with a preset convolutional-neural-network model, it is found that the recognition accuracy for the edge positions of segmented objects, or for the edge pixels of a given category in the image, is insufficient, and missed or false recognition occurs easily, so the edges of segmented objects deviate from the actual objects. To improve edge accuracy in semantic segmentation, the input data is processed so that the edge recognition accuracy of the convolutional neural network is enhanced during operation. It should be emphasized that this scheme makes no separate improvement to the specific implementation of the convolutional neural network, such as the operation algorithm, the convolution stride, or the processing by the pooling and fully connected layers; the semantic segmentation of the convolutional neural network is implemented with existing means, which are not described in detail herein.
The function of a convolutional layer is to extract features from the input data. A convolutional layer contains several convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feed-forward neural network. A kernel regularly sweeps the input feature map, performs the convolution operation between the input and the kernel, and outputs a new feature map, which is fed into the next convolutional layer for further convolution; this repeats until convolution finishes.
It can be understood that in the convolutional layers data exists in three-dimensional form: a gray picture has only 1 feature map, i.e., 1 channel, while a color image containing R, G, B has 3 feature maps, i.e., 3 channels. The corresponding features are input into the first convolutional layer and convolved with the corresponding kernels; the new output features are then input into the next convolutional layer and convolved with its kernels, and the process repeats until it finishes.
In the convolution process, the more input features the first convolutional layer receives, the higher the segmentation precision of the image and the better the recognition of object edges. To improve recognition accuracy, the input matrix of the preset convolutional-neural-network semantic segmentation model is set to the target matrix set, which comprises the 4 normalized matrices of the gray-scale image and R, G, B; the resulting input size is height × width × 4, where height and width are the height and width of the picture, and 4 indicates that the 1 channel of the gray-scale image is added to the three RGB channels of the image. When the original image undergoes convolutional-neural-network semantic segmentation, the matrix fed to the first layer has size height × width × 3; with the target matrix set as input, the size is height × width × 4. Compared with the three-channel input, the four-channel input adds the gray-scale image, which consists only of the two sharply contrasting values of black and white, so edge features are emphasized in it. Therefore, when the gray-scale image matrix and the RGB matrix together form the input matrix for semantic segmentation, recognition of object edge features is correspondingly strengthened during convolution, which benefits the recognition accuracy of object edge positions.
Specifically, to ensure a better effect when performing semantic segmentation, a semantic segmentation model is first established. The semantic segmentation model is a pre-trained model covering all targets in the images; to segment an image, its data only needs to be input into the model, which performs the convolution calculations according to the established algorithm and outputs the segmented data.
It is understood that the whole convolutional-neural-network computation is contained in the semantic segmentation model and is not described in detail here. The model comprises an input module, used to input the image data to be segmented, and an output module, used to output the segmented image data once segmentation completes. Based on the above method, the input dimension of the first convolutional layer in the input module can be set to 4 in advance, i.e., covering the 4 dimensions (channels) of R, G, B and the gray-scale image; the input dimension may also be modified to 4 automatically during processing based on the semantic segmentation model, which is not specifically limited herein. The target matrix set obtained from the image in advance is input into the semantic segmentation model as the input matrix for computation; during the computation, the added input feature of the gray-scale image benefits the recognition of object edge features and improves edge recognition precision. Semantic segmentation of a convolutional neural network based on such a model is a common means and is not described in detail herein.
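The patent does not fix a particular network architecture, so the following is only a hedged PyTorch sketch of what setting the input dimension of the first convolutional layer to 4 could look like; the layer widths, class count, and the toy structure are assumptions:

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        """Toy segmentation network whose first convolutional layer takes 4 channels."""

        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(4, 64, kernel_size=3, padding=1),  # input dimension 4: R, G, B + gray
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
            self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)  # per-pixel class scores

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

    # The height x width x 4 target matrix set becomes a 1 x 4 x height x width tensor.
    x = torch.rand(1, 4, 480, 640)
    logits = TinySegNet(num_classes=19)(x)   # output shape: (1, 19, 480, 640)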
The goal is to improve the recognition accuracy of the edge positions of features of different categories during semantic segmentation. For example, in a color image containing a person and a vehicle, if the color difference between the boundary of the person's target feature and other nearby objects is faint or blurred, the person's feature is hard to obtain by segmentation and deviates considerably from the actual person. Therefore, when performing semantic segmentation, the input dimension of the first convolutional layer in the input module of the semantic segmentation model is set to 4 in advance. On this basis, the color image is further processed to obtain an input matrix matching the 4 dimensions: on top of the 3 dimensions of the original RGB matrix, the gray-scale image matrix obtained by the preset means and the RGB matrix undergo a common normalization to yield the target matrix set, and this 4-dimensional set is used as the input matrix, increasing the edge recognition precision at target edge positions during convolution. Because the gray-scale image has only the two colors black and white, the boundary between a target feature, such as the person, and the feature pixels beside it becomes more evident once the color image is converted to a gray-scale image and its matrix is used together with the RGB matrix as the input matrix. Therefore, concatenating the gray-scale image matrix with the RGB matrix as the input is more favorable for semantic segmentation and improves the recognition precision of target feature edges.
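Pulling the pieces together, an end-to-end sketch of steps S101 to S104 under the same assumptions (and, for brevity, normalizing all four channels by 255 instead of the column-norm scheme above) might read:

    import cv2
    import numpy as np
    import torch

    image = cv2.imread("frame.png")                           # hypothetical input frame
    b, g, r = cv2.split(image)                                # S101: the RGB matrices
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 0)  # S102: gray-scale image matrix

    # S103: normalize and merge into the height x width x 4 target matrix set.
    target = np.dstack([m.astype(np.float32) / 255.0 for m in (r, g, b, binary)])

    # S104: reshape to 1 x 4 x height x width and feed a segmentation model whose
    # first convolutional layer expects 4 input channels, e.g. the TinySegNet above.
    x = torch.from_numpy(target).permute(2, 0, 1).unsqueeze(0)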
The present application further provides an electronic device, see fig. 4, the electronic device 1000 comprising a memory 1010 and a processor 1020.
The processor 1020 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 1020 or other modules of the computer. The permanent storage may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the permanent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random-access memory, and may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or electronic device, server, etc.), causes the processor to perform some or all of the various steps of the above-described methods in accordance with the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image semantic segmentation method, comprising:
obtaining an RGB matrix based on the image;
processing the image by adopting a preset means to obtain a gray level image matrix;
merging the RGB matrix and the gray level image matrix according to a preset rule to obtain a target matrix set;
and performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network, wherein an input matrix of the preset semantic segmentation model of the convolutional neural network is the target matrix set during semantic segmentation.
2. The method according to claim 1, wherein performing semantic segmentation processing on the image with the preset semantic segmentation model of a convolutional neural network, whose input matrix during semantic segmentation is the target matrix set, comprises:
establishing a semantic segmentation model, wherein the semantic segmentation model at least comprises an input module, and the input dimension of a first convolution layer in the input module is set to be 4;
and performing semantic segmentation processing on the image by taking the input matrix as the target matrix set based on the semantic segmentation model.
3. The method of claim 1, wherein before obtaining the set of target matrices, the method comprises:
and respectively carrying out normalization processing on the gray image matrix and the RGB matrix.
4. The method of claim 3, wherein each element value in the normalized grayscale image matrix is 0 or 1, and each element value in the RGB matrix is in [0, 1].
5. The method according to claim 3, wherein the processing the image by a preset means to obtain a gray-scale image matrix comprises:
converting the image into a grey-scale map;
and carrying out binarization processing on the gray level image to obtain a binarized gray level image matrix, wherein the element value in the binarized gray level image matrix is 0 or 255.
6. The method according to claim 5, wherein the binarizing processing on the gray map comprises:
and carrying out binarization processing on the gray level image by adopting an adaptive threshold value binarization algorithm.
7. The method according to claim 6, wherein the binarizing the gray scale map by using an adaptive threshold binarization algorithm comprises:
acquiring an m×n pixel region with a current pixel point as a center;
respectively calculating the distance from each pixel point in the m×n pixel region to the current pixel point according to a Gaussian function and carrying out a weighted average to obtain a binarization threshold of the current pixel point;
and comparing the gray value of the current pixel point with the binarization threshold, if the gray value of the current pixel point is greater than the binarization threshold, setting the binarization value of the current pixel point to be 255, and if the gray value of the current pixel point is less than the binarization threshold, setting the binarization value of the current pixel point to be 0.
8. The method according to any one of claims 1 to 7, wherein at least one class label is labeled in a segmented image obtained by performing semantic segmentation processing on the image by using a preset semantic segmentation model of a convolutional neural network.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-8.
10. A storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-8.
CN202111274244.5A, priority date 2021-10-29, filing date 2021-10-29: Image semantic segmentation method, electronic device and storage medium. Status: Pending. Publication: CN113989307A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111274244.5A | 2021-10-29 | 2021-10-29 | Image semantic segmentation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111274244.5A | 2021-10-29 | 2021-10-29 | Image semantic segmentation method, electronic device and storage medium

Publications (1)

Publication Number | Publication Date
CN113989307A | 2022-01-28

Family

ID=79744604

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111274244.5A (pending, published as CN113989307A) | Image semantic segmentation method, electronic device and storage medium | 2021-10-29 | 2021-10-29

Country Status (1)

Country Link
CN (1) CN113989307A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724133A (en) * 2022-04-18 2022-07-08 北京百度网讯科技有限公司 Character detection and model training method, device, equipment and storage medium
CN114724133B (en) * 2022-04-18 2024-02-02 北京百度网讯科技有限公司 Text detection and model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US10282589B2 (en) Method and system for detection and classification of cells using convolutional neural networks
CN109344874B (en) Chromosome automatic analysis method and system based on deep learning
US8873856B1 (en) Determining a class associated with an image
CN108764358B (en) Terahertz image identification method, device and equipment and readable storage medium
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
US11200659B2 (en) Neural network training device, system and method
CN111445478A (en) Intracranial aneurysm region automatic detection system and detection method for CTA image
EP3584742A1 (en) System and method for traffic sign recognition
CN109145964B (en) Method and system for realizing image color clustering
Björklund et al. Automatic license plate recognition with convolutional neural networks trained on synthetic data
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN112464983A (en) Small sample learning method for apple tree leaf disease image classification
CN112488083B (en) Identification method, device and medium of traffic signal lamp based on key point extraction of hetmap
GB2580675A (en) Tyre sidewall imaging method
US20230044794A1 (en) Neural network training device, system and method
CN113989307A (en) Image semantic segmentation method, electronic device and storage medium
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
Wang et al. Local defect detection and print quality assessment
Gu et al. Thermal image colorization using Markov decision processes
Senthilkumar et al. Optimal segmentation with back-propagation neural network (BPNN) based citrus leaf disease diagnosis
Kurdthongmee et al. A Framework to Create a Deep Learning Detector from a Small Dataset: A Case of Parawood Pith Estimation
CN115187790A (en) Image contour extraction method based on reference region binarization result
CN114529906A (en) Method and system for detecting abnormity of digital instrument of power transmission equipment based on character recognition
KR20220101373A (en) Apparatus and method for detecting defects in pcb based on video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination