CN116580199A - DeepLabV3+ based image segmentation method, device and storage medium - Google Patents

DeepLabV3+ based image segmentation method, device and storage medium

Info

Publication number
CN116580199A
Authority
CN
China
Prior art keywords
image
layer
convolution
detected
intermediate image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310556328.0A
Other languages
Chinese (zh)
Inventor
彭绍湖
吴树贤
冼咏炘
谭敏聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202310556328.0A priority Critical patent/CN116580199A/en
Publication of CN116580199A publication Critical patent/CN116580199A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the specification provides a DeepLabV3+ based image segmentation method, device and storage medium, wherein the method comprises the following steps: extracting image features of an image to be detected in an encoder based on a multi-scale feature fusion model, the multi-scale feature fusion model including at least one two-layer convolution structure and one three-layer convolution structure. The technical scheme provided by the application addresses the low accuracy with which traditional image segmentation algorithms identify the boundary of a segmented cancer cell region.

Description

DeepLabV3+ based image segmentation method, device and storage medium
Technical Field
This document relates to the field of computer-assisted medical image segmentation, and in particular to a DeepLabV3+ based image segmentation method, device and storage medium.
Background
Computed tomography (CT) is a clinical diagnosis method that has been widely used in recent years; owing to its high signal-to-noise ratio, high spatial resolution and fast imaging speed, it is widely applied to the diagnosis and evaluation of rectal tumors.
During the diagnosis and treatment of rectal cancer, accurately segmenting the region containing the rectal tumor from the CT image is an indispensable step: based on the rectal tumor region segmented with the aid of artificial intelligence and similar techniques, a doctor determines the tumor's size and shape, judges whether metastasis has occurred, and formulates the treatment scheme best suited to the patient.
However, rectal tumors differ in volume and morphology between patients, and in CT images they often show low contrast and blurred boundaries; traditional image segmentation algorithms generally struggle to extract and exploit the high-level semantic information in CT images, so the accuracy of rectal tumor segmentation is not high.
Disclosure of Invention
In view of the above analysis, the present application aims to provide a DeepLabV3+ based image segmentation method, device and storage medium that improve the accuracy of cancer cell region boundary identification.
In a first aspect, one or more embodiments of the present disclosure provide a DeepLabV3+ based image segmentation method, including:
extracting image features of an image to be detected in an encoder based on a multi-scale feature fusion model;
the multi-scale feature fusion model includes at least one two-layer convolution structure and one three-layer convolution structure.
Further, in the encoder, extracting image features of the image to be detected based on the multi-scale feature fusion model includes:
taking the image to be detected as input, and dividing the image to be detected into a preset first size through each two-layer convolution structure;
stitching the first size image into a first intermediate image of a first number of channels;
splicing the first intermediate image into a second intermediate image with a second channel number by using the three-layer convolution structure;
and extracting image features of the image to be detected according to the second intermediate image.
Further, the two-layer convolution structure includes: two convolution layers and a pooling layer;
the three-layer convolution structure includes: three convolution layers.
Further, the method further comprises:
inputting the second intermediate image into an atrous spatial pyramid pooling model with a cascade structure;
and extracting, by the atrous spatial pyramid pooling model with the cascade structure, image features of the image to be detected according to the second intermediate image.
Further, the atrous spatial pyramid pooling model with the cascade structure comprises: a standard convolution layer, three dilated convolution layers with different dilation rates and a global average pooling layer;
the three dilated convolutions correspond to dilation rates of 1, 3 and 5 respectively.
Further, the extracting, by the atrous spatial pyramid pooling model with the cascade structure according to the second intermediate image, image features of the image to be detected includes:
the standard convolution layer takes the second intermediate image as input;
each dilated convolution layer and the global average pooling layer respectively take the second intermediate image and the output result of the previous layer as input;
and adding the output results of the standard convolution layer, each dilated convolution layer and the global average pooling layer to obtain the image features of the image to be detected.
Further, the number of the two-layer convolution structures is 1-3, and the number of the three-layer convolution structures is 1-2.
In a second aspect, an embodiment of the present application provides a DeepLabV3+ based image segmentation apparatus, including: a feature extraction module;
the feature extraction module is arranged in the encoder and is used for extracting image features of an image to be detected based on a multi-scale feature fusion model;
the multi-scale feature fusion model includes at least one two-layer convolution structure and one three-layer convolution structure.
Further, the feature extraction module is configured to segment the image to be detected into a preset first size by using the image to be detected as an input through each of the two-layer convolution structures; stitching the first size image into a first intermediate image of a first number of channels; splicing the first intermediate image into a second intermediate image with a second channel number by using the three-layer convolution structure; and extracting image features of the image to be detected according to the second intermediate image.
In a third aspect, an embodiment of the present application provides a storage medium, including:
for storing computer-executable instructions which, when executed, implement the method of any of the first aspects.
Compared with the prior art, the application can at least realize the following technical effects:
the existing deep LabV3+ adopts semantic segmentation and cavity convolution to identify the area of the rectal tumor. However, the nature of semantic segmentation and hole convolution results in deep labv3+ being unable to effectively identify ambiguous portions in the rectal tumor region. Therefore, the multi-scale feature fusion model is adopted to replace the deep LabV3+ original model so as to extract the image features of the image to be detected. Because the multi-scale feature fusion model extracts features in a feature fusion mode, the image obtained by the multi-scale feature fusion model contains more features, and is helpful for distinguishing a fuzzy region. In addition, the image obtained by the multi-scale feature fusion model contains more features, so that the multi-scale feature fusion model is more suitable for being combined with cavity convolution compared with the existing semantic segmentation model, and the overall recognition accuracy of the model is improved.
Drawings
For a clearer description of one or more embodiments of the present description or of prior-art solutions, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some of the embodiments of this specification; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of non-boundary pixel points and boundary pixel points in an image provided in one or more embodiments of the present disclosure;
FIG. 2 is a flow chart of a DeepLabV3+ based image segmentation method provided in one or more embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of the cascaded atrous spatial pyramid pooling module provided in one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of a training model method provided by one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a trained multi-scale feature fusion model provided in one or more embodiments of the present disclosure.
Detailed Description
In order to enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one or more embodiments of the present disclosure without inventive faculty, are intended to be within the scope of the present disclosure.
The prior art uses the DeepLabV3+ model to recognize cancer cell regions in images; however, the boundaries of cancer cell regions are often blurred. Boundary blurring arises when the distance between adjacent pixels in the image is too large. In this scenario, semantic segmentation easily causes feature loss: pixels belonging to the boundary are not assigned to the same image after segmentation, which lowers boundary recognition accuracy. Meanwhile, the atrous convolution model in DeepLabV3+ enlarges the receptive field of the feature image; when pixels have already been lost at the boundary, enlarging the receptive field further reduces the recognition accuracy of the boundary.
Aiming at this scenario and these technical problems, the embodiment of the present application provides a DeepLabV3+ based image segmentation method: image features of an image to be detected are extracted in the encoder based on a multi-scale feature fusion model, where the multi-scale feature fusion model includes at least one two-layer convolution structure and at least one three-layer convolution structure. In other words, the original semantic segmentation model in the DeepLabV3+ encoder is replaced by a multi-scale feature fusion model containing at least one two-layer convolution structure and one three-layer convolution structure.
In embodiments of the present application, the more convolutional layers there are, the larger the receptive field, and the larger the receptive field, the lower the recognition accuracy. To solve the boundary blurring problem, the multi-scale structure preserves and exploits as many features as possible. The two-layer convolution structure focuses on a small receptive field, so that as many features as possible are represented clearly and the loss of cancer cell boundary pixels is prevented. On top of these clearly represented features, the three-layer convolution structure enlarges the receptive field so that the boundary and its surrounding pixels are recognized together, improving image segmentation accuracy.
For example, as shown in fig. 1, consider a point A near the boundary of a cancer cell region whose features are clearly recognized, while one section of the boundary is blurred (the broken line indicates the blurred boundary). Exploiting the fact that a small receptive field yields high recognition accuracy, the application first clearly identifies the pixels around point A. The receptive field is then enlarged so that the blurred boundary and the pixels around point A fall within the same feature map. In that feature map, apart from point A and the pixels around it, only the boundary pixels remain. Because the pixels around point A are accurate, the positions of the boundary pixels can be accurately identified from them.
Based on the foregoing scenario, the embodiment of the present application provides a DeepLabV3+ based image segmentation method, as shown in fig. 2, including the following specific steps:
step 1, taking an image to be detected as input, and dividing the image to be detected into a preset first size through each two-layer convolution structure.
In an embodiment of the present application, a two-layer convolution structure includes: two convolutional layers and one pooling layer. Each convolution layer comprises: a 3 x 3 convolution kernel, a normalization layer and an activation function. The first dimension may be, but is not limited to: 64×64, 128×128, or 256×256.
Step 2, stitching the first-size images into a first intermediate image with a first number of channels.
In the embodiment of the application, the characteristic images obtained by each convolution layer are spliced to increase the channel number of the first intermediate image. Wherein the first number of channels may be 64-256.
Step 3, splicing the first intermediate image into a second intermediate image with a second number of channels using a three-layer convolution structure.
In an embodiment of the present application, a three-layer convolution structure includes: three convolution layers. Each convolution layer comprises: a 3 x 3 convolution kernel, a normalization layer and an activation function. Wherein the second channel number may be 512-3072.
Preferably, in order to further increase the number of features in the feature image, another three-layer convolution structure is inserted between the last two-layer convolution structure and the three-layer convolution structure. This additional three-layer convolution structure comprises three convolutional layers and one pooling layer, and it is where the first intermediate image is divided into a second size. The second size may be, but is not limited to: 32×32 or 16×16. The three-layer convolution structure then splices the feature images obtained by each of its convolution layers to increase the number of channels of the second intermediate image.
Preferably, in order to ensure the identification effect, the number of the two-layer convolution structures is 1-3, and the total number of the three-layer convolution structures is 1-2.
Step 4, extracting image features of the image to be detected according to the second intermediate image.
In this way, the application relies on multi-scale feature fusion to continuously increase the number of channels of the feature image, thereby increasing the features contained in it and avoiding the problem of lost boundary pixels.
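As a concrete illustration, the following PyTorch sketch shows one possible reading of the two-layer and three-layer convolution structures described above, matching the 256×256×3 → 16×16×3072 channel progression given in the training section below. The class names, the channel widths and the exact way the per-layer outputs are concatenated before pooling are assumptions of this sketch, not details fixed by the application.

```python
import torch
import torch.nn as nn


class TwoLayerBlock(nn.Module):
    """Two-layer convolution structure: two 3x3 conv layers (each followed by
    BN and ReLU); the outputs of the two conv layers are concatenated along
    the channel axis and then downsampled by a 2x2 max pooling layer."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        return self.pool(torch.cat([f1, f2], dim=1))  # channel count doubles


class ThreeLayerBlock(nn.Module):
    """Three-layer convolution structure: three 3x3 conv layers whose outputs
    are concatenated; pooling is optional (the extra three-layer block in the
    middle of the backbone pools, the final one does not)."""

    def __init__(self, in_ch, out_ch, pool=False):
        super().__init__()
        def block(ci):
            return nn.Sequential(
                nn.Conv2d(ci, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv1, self.conv2, self.conv3 = block(in_ch), block(out_ch), block(out_ch)
        self.pool = nn.MaxPool2d(2) if pool else nn.Identity()

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        return self.pool(torch.cat([f1, f2, f3], dim=1))  # channel count triples


# Channel progression 32 -> 64 -> 128, then two three-layer blocks,
# matching the 256x256x3 -> 16x16x3072 walk-through in the training section.
backbone = nn.Sequential(
    TwoLayerBlock(3, 32), TwoLayerBlock(64, 64), TwoLayerBlock(128, 128),
    ThreeLayerBlock(256, 256, pool=True), ThreeLayerBlock(768, 1024))
features = backbone(torch.randn(1, 3, 256, 256))
print(features.shape)  # torch.Size([1, 3072, 16, 16])
```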
To further complement the DeepLabV3+ based image segmentation method, the atrous spatial pyramid pooling model is also improved: the original atrous spatial pyramid pooling model is replaced by an atrous spatial pyramid pooling model with a cascade structure.
After the improvement, the atrous spatial pyramid pooling model with the cascade structure comprises: a standard convolution layer, three dilated convolution layers with different dilation rates and a global average pooling layer; the three dilated convolutions correspond to dilation rates of 1, 3 and 5, respectively.
The workflow of the atrous spatial pyramid pooling model with the cascade structure is shown in fig. 3:
the standard convolution layer takes the second intermediate image as input;
each dilated convolution layer and the global average pooling layer respectively take the second intermediate image and the output result of the previous layer as input;
and the output results of the standard convolution layer, each dilated convolution layer and the global average pooling layer are added to obtain the image features of the image to be detected.
Wherein dia is the dilation rate, Cov1×1 denotes the standard convolution layer, Cov3×3 denotes a dilated convolution layer, image pooling denotes the global average pooling layer, and concat denotes the operation that combines the branch outputs.
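The following PyTorch sketch illustrates one way the cascaded structure of fig. 3 could be wired up. The application describes each dilated branch as taking both the second intermediate image and the previous branch's output as input, and describes the final combination both as adding and as stacking on the channel dimension; this sketch reads the former as channel concatenation of the two inputs and the latter as concatenation of the five branches followed by a 1×1 channel adjustment, all of which are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CascadedASPP(nn.Module):
    """Cascaded ASPP sketch: a 1x1 standard conv branch, three 3x3 dilated conv
    branches with dilation rates 1, 3 and 5 (each later branch also sees the
    previous branch's output), and a global-average-pooling branch; the five
    branch outputs are stacked along the channel dimension."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        def conv(ci, k, d=1):
            return nn.Sequential(
                nn.Conv2d(ci, out_ch, k, padding=0 if k == 1 else d, dilation=d),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branch1 = conv(in_ch, 1)                 # Cov1x1, standard convolution
        self.branch2 = conv(in_ch, 3, d=1)            # Cov3x3, dia = 1
        self.branch3 = conv(in_ch + out_ch, 3, d=3)   # Cov3x3, dia = 3, cascaded
        self.branch4 = conv(in_ch + out_ch, 3, d=5)   # Cov3x3, dia = 5, cascaded
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))
        # 1x1 projection to adjust the channel number of the stacked output
        # (an assumption here; the decoder step mentions such an adjustment).
        self.project = nn.Sequential(nn.Conv2d(5 * out_ch, out_ch, 1),
                                     nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        f1 = self.branch1(x)
        f2 = self.branch2(x)
        # "Each dilated conv takes the second intermediate image and the output
        # of the previous layer as input" is read here as channel concatenation.
        f3 = self.branch3(torch.cat([x, f2], dim=1))
        f4 = self.branch4(torch.cat([x, f3], dim=1))
        f5 = F.interpolate(self.gap(x), size=x.shape[2:],
                           mode='bilinear', align_corners=False)
        return self.project(torch.cat([f1, f2, f3, f4, f5], dim=1))
```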
In the prior art, the dilation rates of the dilated convolutions are 1, 6 and 10. The relationship between the dilation rate and the receptive field is shown in formula 1:
F=(d-1)×(k-1)+k
where k is the convolution kernel size of the dilated convolution, d is the dilation rate (number of holes), and F is the size of the receptive field.
Although a large dilation rate guarantees a large receptive field, it also reduces the number of features extracted by each dilated convolution, which ultimately lowers image recognition accuracy. If the dilation rate is reduced, the receptive field shrinks accordingly, ultimately causing feature loss.
For the atrous spatial pyramid pooling model with the cascade structure, the relationship between the dilation rate and the receptive field is shown in formula 2:
F=2×(d-1)×(k-1)+k
Therefore, in the embodiment of the application, the three dilated convolutions use dilation rates of 1, 3 and 5, which balances the dilation rate against the receptive field and improves the accuracy of image identification. In addition, because each layer also takes the output of the layer above it as input, the loss of inconspicuous features such as blurred boundaries is effectively prevented, further improving the accuracy of image recognition.
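A quick check of the two formulas, assuming a 3×3 kernel (k = 3) as used in this application:

```python
def receptive_field(k, d):
    """Formula 1: receptive field of a single dilated convolution."""
    return (d - 1) * (k - 1) + k


def receptive_field_cascaded(k, d):
    """Formula 2: receptive field within the cascaded structure."""
    return 2 * (d - 1) * (k - 1) + k


for d in (1, 3, 5):
    print(d, receptive_field(3, d), receptive_field_cascaded(3, d))
# prints: "1 3 3", "3 7 11", "5 11 19" -- the cascaded rates (1, 3, 5) reach a
# receptive field of 19, close to the 21 that the prior-art rate 10 gives under
# formula 1, while keeping the dilation rate itself small.
```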
In order to illustrate the feasibility of the scheme of the present application, the embodiment of the present application provides a training process of the image segmentation model of the present application, as shown in fig. 4:
step S1, constructing a data set.
The data set consists of two parts: one part is human body CT images, which can be acquired with professional medical imaging equipment; the other part is mask maps of the rectal tumor region, which are delineated by experienced doctors and generated by image binarization.
Since CT images are stored in DICOM format, they must be converted into a standard picture format suitable for a deep learning network model. The conversion proceeds as follows: first, the optimal window width and window level of each case's CT sequence are obtained with the RadiAnt DICOM Viewer software; next, the HU values of the CT images are read with the SimpleITK library and stored in an array, the optimal window width and window level are applied to the array, the contrast of the organ of interest is improved with gray-level histogram equalization, and the pixel values in the array are normalized to the range 0 to 1; finally, the image array is saved as a grayscale picture in PNG format.
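A minimal sketch of this conversion for a single slice, assuming the per-case window width and level have already been determined (histogram equalization is omitted for brevity); the function name and argument layout are illustrative only:

```python
import numpy as np
import SimpleITK as sitk
from PIL import Image


def ct_slice_to_png(dcm_path, png_path, window_width, window_level):
    """Read one CT slice, apply the given window width/level, normalize the
    pixel values to [0, 1], and save the result as an 8-bit grayscale PNG."""
    image = sitk.ReadImage(dcm_path)
    hu = sitk.GetArrayFromImage(image).astype(np.float32).squeeze()  # HU values
    low = window_level - window_width / 2.0
    high = window_level + window_width / 2.0
    hu = np.clip(hu, low, high)
    normalized = (hu - low) / (high - low)          # values in [0, 1]
    Image.fromarray((normalized * 255).astype(np.uint8)).save(png_path)
```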
Rectal tumors occur only in the rectum, so the region actually processed does not need to be the full picture. From the mask images it can be observed that the rectal tumors are distributed within a certain region of the whole image, and the pixel coordinate range of the tumor-containing region is computed with an image processing algorithm. A range of 200-456 in length and 128-384 in width was found to cover all tumor regions in the data set. The data set is therefore cropped to 256×256 pictures, which filters out irrelevant features and speeds up network training. The cropped CT images and mask images are then combined into folders, manually or automatically by a program, and divided into a training set and a test set at a ratio of 8:2.
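A minimal sketch of the cropping and the 8:2 split, assuming the counted pixel range maps to array rows 200-456 and columns 128-384; the helper names are illustrative:

```python
import random


def crop_tumor_region(image_array):
    """Crop the fixed 256x256 window (rows 200-456, columns 128-384) that was
    found to cover all tumor regions in the data set; which axis corresponds
    to 'length' and which to 'width' is an assumption of this sketch."""
    return image_array[200:456, 128:384]


def split_cases(case_ids, ratio=0.8, seed=0):
    """Shuffle the case IDs and split them 8:2 into training and test sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * ratio)
    return ids[:cut], ids[cut:]
```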
Step S2, training the encoder.
The processed training set pictures with a resolution of 256×256 are input into the multi-scale feature fusion model for feature extraction, so that the feature map size decreases gradually while the feature dimension increases gradually, finally producing two effective feature layers: a shallow feature layer and a deep feature layer.
The training process of the multi-scale feature fusion model is shown in fig. 5, where Input denotes the input picture, Cov denotes a convolution layer with a 3×3 kernel, BN denotes a normalization layer, ReLU denotes the activation function, Pool denotes a pooling layer, Output denotes the output feature map, and concat denotes feature concatenation.
The multi-scale feature fusion model is formed by stacking convolution layers with 3×3 kernels and different numbers of channels. When a picture passes through the model, it is first convolved twice by 3×3 convolution layers, extracting the feature map and raising its channel size to 32, so the feature map after convolution becomes 256×256×32. The feature maps of the two convolution layers are then concatenated, giving a feature map of 256×256×64. The image size is then halved by a max pooling layer with 2×2 pooling units, giving 128×128×64. Next, two more 3×3 convolutions extract the feature map and raise its channel size to 64; concatenating the two convolution layers gives a feature map of 128×128×128, and 2×2 max pooling halves it to 64×64×128.
Then two more 3×3 convolutions extract the feature map and raise its channel size to 128; concatenating the two convolution layers gives a feature map of 64×64×256, and 2×2 max pooling halves it to 32×32×256.
Next, three 3×3 convolutions extract the feature map; after concatenation the feature map becomes 32×32×768, and 2×2 max pooling halves it to 16×16×768.
Finally, three more 3×3 convolutions extract the feature map and raise its channel size to 1024, so the concatenated feature map becomes 16×16×3072.
The 128×128 feature map obtained after convolution is taken as the shallow feature layer, and the 16×16×1024 feature map obtained after the whole backbone feature extraction network is taken as the deep feature layer. The deep feature layer is passed to the fast atrous spatial pyramid pooling module (ASPPF), where it is processed by a standard convolution, three dilated convolutions with different dilation rates, and global average pooling. Specifically, the ASPPF consists of five branches: a 1×1 standard convolution layer, three 3×3 atrous convolutions with dilation rates of 1, 3 and 5, and a global average pooling layer. The output feature map of the backbone feature extraction network, i.e. the deep feature layer, is fed into the ASPPF. It is first convolved with the 1×1 standard convolution layer of the first branch, so that the output feature map keeps the original receptive field; the convolved result is then convolved in turn by the 3×3 atrous convolutions with dilation rates of 1, 3 and 5, so that each output feature map relates the local features of the previous layer to a wider field of view and small target features are not lost during information transmission.
The output feature map is then processed by the fifth branch, the global average pooling layer, to obtain global features; finally the feature maps of the five branches are stacked along the channel dimension to obtain an output feature map with multi-scale context information, strengthening the network model's ability to recognize the same object at different sizes.
Step S3, training the decoder.
The output feature map from step S2, after its channel number has been adjusted by a 1×1 convolution, is upsampled 4× by bilinear interpolation and denoted FA. The shallow feature layer from step S2 is then compressed by a 1×1 convolution, and the resulting feature map is denoted FB. FA and FB are fused, which increases the diversity of tumor features. The fused result is refined by a 3×3 convolution, then upsampled 4× by bilinear interpolation and activated by a Sigmoid function, yielding a prediction with the same resolution as the input picture. The network model also outputs a training weight file.
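A sketch of this decoder in PyTorch, assuming the ASPPF output has already had its channel number adjusted by the 1×1 convolution; the compressed channel width (48) and refinement width (256) are assumptions of the sketch rather than values given in the application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Decoder(nn.Module):
    """Decoder sketch: the ASPPF output is upsampled by bilinear interpolation
    and taken as FA; the shallow feature layer is compressed by a 1x1 conv into
    FB; FA and FB are fused, refined by a 3x3 conv, upsampled again and
    activated by a Sigmoid."""

    def __init__(self, deep_ch, shallow_ch, compress_ch=48, refine_ch=256):
        super().__init__()
        self.compress = nn.Sequential(nn.Conv2d(shallow_ch, compress_ch, 1),
                                      nn.BatchNorm2d(compress_ch), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(
            nn.Conv2d(deep_ch + compress_ch, refine_ch, 3, padding=1),
            nn.BatchNorm2d(refine_ch), nn.ReLU(inplace=True),
            nn.Conv2d(refine_ch, 1, 1))

    def forward(self, deep, shallow, out_size=(256, 256)):
        # 4x bilinear upsampling in the application; here FA is resized to match
        # the shallow feature layer so the two maps can be concatenated.
        fa = F.interpolate(deep, size=shallow.shape[2:], mode='bilinear',
                           align_corners=False)
        fb = self.compress(shallow)
        logits = self.refine(torch.cat([fa, fb], dim=1))   # fuse FA and FB
        logits = F.interpolate(logits, size=out_size, mode='bilinear',
                               align_corners=False)
        return torch.sigmoid(logits)   # prediction at the input resolution
```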
Step S4, validating the model.
The training weight file output in step S3 is loaded into the network model, the pictures and labels in the test set are read, and flipping and translation data enhancement together with normalization are applied to them. The processed test set pictures with a resolution of 256×256 are input into the network model for testing, and the test results are finally evaluated according to the metrics.
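The four metrics in Table 1 can be computed per image from binary masks roughly as follows; reading "average accuracy" as precision and "average similarity" as the Dice coefficient is an assumption of this sketch:

```python
import numpy as np


def segmentation_metrics(pred_mask, true_mask, eps=1e-7):
    """Per-image metrics on binary masks: IoU, precision, recall and Dice
    similarity (one reasonable reading of the four columns in Table 1)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    return iou, precision, recall, dice
```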
The trained model and the existing models were evaluated on the test set; the results are shown in Table 1.
TABLE 1
Model                          Mean IoU   Mean accuracy   Mean recall   Mean similarity
UNet                           92.47      95.19           90.50         95.82
UNet++                         92.51      96.08           92.32         95.83
DeepLabV3+                     90.68      93.54           87.22         94.74
Network of this application    93.56      97.23           94.64         96.49
Compared with the prior art, the proposed model achieves a better mean IoU, mean accuracy, mean recall and mean similarity, so the technical scheme provided by this application improves image recognition accuracy.
The embodiment of the application provides a DeepLabV3+ based image segmentation device, which comprises: a feature extraction module;
the feature extraction module is arranged in the encoder and is used for extracting image features of an image to be detected based on a multi-scale feature fusion model;
the multi-scale feature fusion model includes at least one two-layer convolution structure and one three-layer convolution structure.
In the embodiment of the application, the feature extraction module is used for taking the image to be detected as input, and dividing the image to be detected into a preset first size through each two-layer convolution structure; stitching the first size image into a first intermediate image of a first number of channels; splicing the first intermediate image into a second intermediate image with a second channel number by using the three-layer convolution structure; and extracting image features of the image to be detected according to the second intermediate image.
An embodiment of the present application provides a storage medium including:
for storing computer-executable instructions which, when executed, implement the image segmentation method described above.
the foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement in technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language), with VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog currently being the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can easily be obtained simply by lightly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may thus be regarded as a hardware component, and the means included within it for performing the various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each unit may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present specification.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is by way of example only and is not intended to limit the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present document are intended to be included within the scope of the claims of the present document.

Claims (10)

1. A DeepLabV3+ based image segmentation method, characterized by comprising the following steps:
extracting image features of an image to be detected in an encoder based on a multi-scale feature fusion model;
the multi-scale feature fusion model includes at least one two-layer convolution structure and one three-layer convolution structure.
2. The method of claim 1, wherein,
in the encoder, based on a multi-scale feature fusion model, the image features of the image to be detected are extracted, including:
taking the image to be detected as input, and dividing the image to be detected into a preset first size through each two-layer convolution structure;
stitching the first size image into a first intermediate image of a first number of channels;
splicing the first intermediate image into a second intermediate image with a second channel number by using the three-layer convolution structure;
and extracting image features of the image to be detected according to the second intermediate image.
3. The method of claim 2, wherein,
the two-layer convolution structure comprises: two convolution layers and a pooling layer;
the three-layer convolution structure includes: three convolution layers.
4. The method according to claim 2, wherein the method further comprises:
inputting the second intermediate image into an atrous spatial pyramid pooling model with a cascade structure;
and extracting, by the atrous spatial pyramid pooling model with the cascade structure, image features of the image to be detected according to the second intermediate image.
5. The method of claim 4, wherein,
the atrous spatial pyramid pooling model with the cascade structure comprises: a standard convolution layer, three dilated convolution layers with different dilation rates and a global average pooling layer;
the three dilated convolutions correspond to dilation rates of 1, 3 and 5 respectively.
6. The method of claim 5, wherein,
the extracting, by the atrous spatial pyramid pooling model with the cascade structure according to the second intermediate image, image features of the image to be detected includes:
the standard convolution layer takes the second intermediate image as input;
each dilated convolution layer and the global average pooling layer respectively take the second intermediate image and the output result of the previous layer as input;
and adding the output results of the standard convolution layer, each dilated convolution layer and the global average pooling layer to obtain the image features of the image to be detected.
7. The method of claim 1, wherein,
the number of the two-layer convolution structures is 1-3, and the number of the three-layer convolution structures is 1-2.
8. A DeepLabV3+ based image segmentation device, characterized by comprising: a feature extraction module;
the feature extraction module is arranged in the encoder and is used for extracting image features of an image to be detected based on the multi-scale feature fusion model;
the multi-scale feature fusion model includes at least one two-layer convolution structure and one three-layer convolution structure.
9. The apparatus of claim 8, wherein,
the feature extraction module is used for taking the image to be detected as input, and dividing the image to be detected into a preset first size through each two-layer convolution structure; stitching the first size image into a first intermediate image of a first number of channels; splicing the first intermediate image into a second intermediate image with a second channel number by using the three-layer convolution structure; and extracting image features of the image to be detected according to the second intermediate image.
10. A storage medium, comprising:
for storing computer-executable instructions which, when executed, implement the method of any of claims 1-7.
CN202310556328.0A 2023-05-16 2023-05-16 DeepLabV3+ based image segmentation method, device and storage medium Pending CN116580199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310556328.0A CN116580199A (en) 2023-05-16 2023-05-16 DeepLabV3+ based image segmentation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310556328.0A CN116580199A (en) 2023-05-16 2023-05-16 DeepLabV3+ based image segmentation method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116580199A true CN116580199A (en) 2023-08-11

Family

ID=87542830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310556328.0A Pending CN116580199A (en) 2023-05-16 2023-05-16 DeepLabV3+ based image segmentation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116580199A (en)

Similar Documents

Publication Publication Date Title
CN111476793B (en) Dynamic enhanced magnetic resonance imaging processing method, system, storage medium and terminal
CN114419020B (en) Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
CN109816650B (en) Target area identification method and system based on two-dimensional DSA image
CN116342888B (en) Method and device for training segmentation model based on sparse labeling
CN111105425A (en) Symmetry axis/symmetry plane extraction method and system based on craniocerebral image data
CN117333529B (en) Template matching-based vascular ultrasonic intima automatic measurement method and system
CN116030247B (en) Medical image sample generation method and device, storage medium and electronic equipment
CN116342984B (en) Model training method, image processing method and image processing device
CN115830633B (en) Pedestrian re-recognition method and system based on multi-task learning residual neural network
CN116524295A (en) Image processing method, device, equipment and readable storage medium
CN109472803B (en) Intracranial artery blood vessel segmentation method and system
CN116805393A (en) Hyperspectral image classification method and system based on 3DUnet spectrum-space information fusion
CN116580199A (en) DeepLabV3+ based image segmentation method, device and storage medium
CN116258679A (en) Information recommendation method and device, storage medium and electronic equipment
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
CN115082405A (en) Training method, detection method, device and equipment of intracranial focus detection model
CN116152933A (en) Training method, device, equipment and storage medium of anomaly detection model
CN112734726B (en) Angiography typing method, angiography typing device and angiography typing equipment
CN115049927A (en) SegNet-based SAR image bridge detection method and device and storage medium
CN109685843B (en) Method and system for measuring core infarct volume based on skull CT image
CN116229218B (en) Model training and image registration method and device
CN116152246B (en) Image recognition method, device, equipment and storage medium
CN116363390B (en) Infrared dim target detection method and device, storage medium and electronic equipment
CN116363070A (en) Analysis method and device for vascular abnormality, storage medium and electronic equipment
CN117252831A (en) Focus transfer prediction method, device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination