WO2022105125A1

WO2022105125A1 - Image segmentation method and apparatus, computer device, and storage medium

Info

Publication number: WO2022105125A1
Application number: PCT/CN2021/090817
Authority: WO
Inventors: 汪淼
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-11-17
Filing date: 2021-04-29
Publication date: 2022-05-27
Also published as: CN112396613B; CN112396613A

Abstract

An image segmentation method and apparatus, a computer device, and a storage medium, relating to the field of artificial intelligence. The image segmentation method comprises: obtaining a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block (S201); obtaining a preset dilated convolutional neural network, the dilated convolutional neural network comprising a first-layer network and a second-layer network, performing encoding processing on the multi-dimensional image block on the basis of an encoder in the first-layer network to obtain an encoding result, and performing decoding processing on the encoding result on the basis of a decoder in the first-layer network to obtain a binary segmentation result graph of the target image (S202); and performing classification, identification, and multi-layer convolution calculation on the binary segmentation result graph on the basis of the second-layer network to obtain a semantic segmentation result graph of the target image (S203). The semantic segmentation result graph can be stored in a blockchain. The method achieves accurate segmentation for an image.

Description

Image segmentation method, device, computer equipment and storage medium

This application claims the priority of the Chinese patent application filed on November 17, 2020 with the application number 202011288874.3 and the invention title is "image segmentation method, device, computer equipment and storage medium", the entire contents of which are incorporated by reference in this application.

technical field

The present application relates to the technical field of artificial intelligence, and in particular, to an image segmentation method, apparatus, computer equipment and storage medium.

Background technique

Synthetic aperture radar (SAR), as an imaging radar with high range resolution and azimuth resolution, has a wide range of applications in military and civilian fields. Detecting the target of interest in the SAR image and segmenting it from the background according to the contour of the target can lay the foundation for subsequent understanding, analysis and planning.

At present, the common segmentation methods include the maximum inter-class variance method, the edge detection algorithm based on local hybrid filtering, and the bias correction fuzzy c-means algorithm. A hot research field in recent years is the segmentation method based on deep learning, which learns image features through deep neural networks, and this highly abstract feature is more conducive to image segmentation. The inventor realized that this method realizes the classification of pixels through an end-to-end deep neural network, but the disadvantage of this method is that the linear difference is used, so that the detailed structure information is lost during segmentation, resulting in blurred boundaries. Among them, although the pooling layer has the effect of expanding the receptive field, it causes the loss of position information, and the position information often needs to be preserved in the process of semantic segmentation processing. As a result, the problem of inaccurate information extraction during image segmentation is finally caused.

SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to provide an image segmentation method, device, computer equipment and storage medium, so as to solve the technical problem of insufficient information extraction during image segmentation.

In order to solve the above technical problems, the embodiments of the present application provide an image segmentation method, which adopts the following technical solutions:

acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

Acquire a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and encodes the multi-dimensional image block based on an encoder in the first-layer network Obtaining an encoding result, and decoding the encoding result based on the decoder of the first-layer network to obtain a binary segmentation result map of the target image;

Multi-layer convolution calculation is performed on the binary segmentation result graph based on the second-layer network to obtain a semantic segmentation result graph of the target image.

In order to solve the above technical problems, the embodiments of the present application also provide an image segmentation device, which adopts the following technical solutions:

a decomposition module, used for acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

The processing module is configured to obtain a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and the multi-dimensional The image block is subjected to encoding processing to obtain an encoding result, and a decoder based on the first layer network performs decoding processing on the encoding result to obtain a binary segmentation result map of the target image;

The computing module is configured to perform multi-layer convolution calculation on the binary segmentation result graph based on the second-layer network to obtain the semantic segmentation result graph of the target image.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory and a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor executes The computer-readable instructions also implement the following steps:

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processing The device also performs the following steps:

The above image segmentation method obtains a target image and performs two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block. The decomposed multi-dimensional image block can improve the accuracy of image processing; then, obtain a preset atrous convolutional neural network, Among them, the atrous convolutional neural network includes a first-layer network and a second-layer network. The encoder in the first-layer network encodes the multi-dimensional image block to obtain an encoding result, and the decoder based on the first-layer network encodes the encoding result. The decoding process obtains the binary segmentation result map of the target image, and the multi-dimensional image block is processed by the preset atrous convolutional neural network, which can increase the receptive field within the controllable range of network parameters, so that each feature image contains The increasing amount of information is helpful for the extraction of global information of the image and avoids the loss of image information; finally, the multi-layer convolution calculation is performed on the binary segmentation result graph based on the second-layer network to obtain the semantic segmentation of the target image. As a result, more image information can be obtained during image segmentation, the accuracy of image signal description during local feature information extraction is improved, and within the controllable range of network parameters, the experience is greatly increased. It increases the amount of information contained in each feature, further makes the segmentation of image information more accurate, and the obtained image information is more complete.

Description of drawings

In order to illustrate the solutions in the present application more clearly, the following will briefly introduce the accompanying drawings used in the description of the embodiments of the present application. For those of ordinary skill, other drawings can also be obtained from these drawings without any creative effort.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

2 is a flowchart of an embodiment of an image segmentation method according to the present application;

3 is a schematic structural diagram of an embodiment of an image segmentation apparatus according to the present application;

FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Reference numerals: image segmentation device 300 , decomposition module 301 , processing module 302 , and calculation module 303 .

Detailed ways

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field of this application; the terms used herein in the specification of the application are for the purpose of describing specific embodiments only It is not intended to limit the application; the terms "comprising" and "having" and any variations thereof in the description and claims of this application and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.

In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.

As shown in FIG. 1 , the system architecture 100 may include

terminal devices

101 , 102 , and 103 , a network 104 and a server 105 . The network 104 is a medium used to provide a communication link between the

terminal devices

101 , 102 , 103 and the server 105 . The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user can use the

terminal devices

101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the

terminal devices

101 , 102 and 103 , such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.

The

terminal devices

101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, dynamic Picture Experts Compression Standard Audio Layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, Moving Picture Experts Compression Standard Audio Layer 4) Players, Laptops and Desktops, etc.

The server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the

terminal devices

101 , 102 , and 103 .

It should be noted that the image segmentation method provided by the embodiments of the present application is generally performed by a server/terminal device, and accordingly, an image segmentation apparatus is generally set in the server/terminal device.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Continuing to refer to FIG. 2 , a flowchart of one embodiment of the method for image segmentation according to the present application is shown. The described image segmentation method includes the following steps:

Step S201, acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

In this embodiment, a target image is acquired, and the target image is an image including target segmentation information. When the target image is obtained, the image is subjected to two-layer wavelet decomposition. Specifically, the wavelet is usually a signal whose local feature is not 0 in a limited interval. The first layer of wavelet decomposition is to decompose the image into low-frequency information and high-frequency information. The high-frequency information is the part of the image with strong changes in intensity. , such as the image outline; low-frequency information is the part of the image where the intensity of the image changes gently, such as the large color block in the image. On the basis of the first-level decomposition, the low-frequency information is decomposed into low-frequency information and high-frequency information, which is the second-level decomposition of the wavelet. Through MATLAB, the target image can be decomposed by two-layer wavelet to obtain multi-dimensional image blocks.

Step S202, obtaining a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, based on the encoder in the first-layer network for the multi-dimensional image block. Perform encoding processing to obtain an encoding result, and perform decoding processing on the encoding result based on the decoder of the first-layer network to obtain a binary segmentation result map of the target image;

In this embodiment, when a multi-dimensional image block is obtained, a preset atrous convolutional neural network is obtained, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and the first-layer network includes an encoder and a The decoder includes three first convolutional layers, three first atrous convolutional layers and two pooling layers, and encodes multi-dimensional image blocks according to the encoder; the decoder includes two upsampling layers, two The second convolution layer and the two second holes are convolutional layers, and the encoder decodes the encoding result output by the encoder based on the decoder, and finally obtains a binary segmentation result graph; the second layer network includes multiple convolution layers. According to the first layer network, the binary segmentation result map corresponding to the target image can be obtained, and according to the second layer network, the multi-layer convolution calculation can be performed on the obtained binary segmentation result map to obtain the semantic segmentation map corresponding to the target image.

Step S203, performing multi-layer convolution calculation on the binary segmentation result graph based on the second-layer network to obtain the semantic segmentation result graph of the target image.

In this embodiment, when the binary segmentation result map is obtained, multi-layer convolution calculation is performed on the binary segmentation map according to the second layer network to obtain the semantic segmentation map of the target image. Specifically, the second layer network includes a third convolution layer, a third hole convolution layer and a fourth convolution layer. When the binary segmentation result graph is obtained, the first convolution result of the first layer network is obtained, wherein, The first convolution result is obtained from the first sub-hole convolution result obtained by the first first hole convolution calculation in the encoder of the first-layer network, and is obtained by re-convolution calculation. Multiply the first convolution result and the binary segmentation result graph to obtain a multiplication result. Input the multiplication result to the third convolutional layer. According to the order of the third convolutional layer, the third convolutional convolutional layer and the fourth convolutional layer, the output result of the previous layer is used as the input of the next layer. Calculated by This obtains the final semantic segmentation result graph, which is the final segmentation result graph of the target image.

It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned semantic segmentation result graph information, the above-mentioned semantic segmentation result graph information may also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain, essentially a decentralized database, is a series of data blocks associated with cryptographic methods. Each data block contains a batch of network transaction information to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

This embodiment realizes that more image information can be obtained during image segmentation, improves the accuracy of image signal description when extracting local feature information, and greatly increases the receptive field within the controllable range of network parameters. The amount of information contained in each feature is further improved, which further makes the segmentation of image information more accurate, and the obtained image information is more complete.

In some embodiments of the present application, the encoder includes a first convolution layer, a first dilated convolution layer, and a pooling layer, and the encoder in the first layer network encodes the multi-dimensional image block The encoded result obtained by processing includes:

Passing the multi-dimensional image block through the first convolution layer, the first hole convolution layer and the pooling layer in sequence to obtain a pooling result;

The encoding result corresponding to the multi-dimensional image block is obtained by performing down-fitting on the pooling result through a preset down-fitting layer.

In this embodiment, the encoder in the first-layer network includes a first convolutional layer, a first atrous convolutional layer, and a pooling layer. When the multi-dimensional image block is obtained, the multi-dimensional image block is convolved and activated based on the first convolution layer to obtain the first sub-convolution result; then the first sub-convolution result is convoluted by the first hole convolution layer. Product and activation, the first sub-hole convolution result is obtained, and finally the first sub-hole convolution result is processed by the pooling layer to obtain the sub-pooling result. Among them, the first convolution layer, the first hole convolution layer and the pooling layer are all multi-dimensional processors, such as three-dimensional convolution (conv 3*3*3), three-dimensional hole convolution (3-dilated conv 3*3* 3) and three-dimensional pooling (max pool 2*2*1). Before the first sub-convolution result and the first sub-hole convolution result are calculated by the first convolution layer and the first hole convolution layer, the results are directly calculated according to the first convolution layer and the first hole convolution layer respectively. The convolution result is processed by a relu activation function, and finally the first sub-convolution result and the first sub-hole convolution result are obtained respectively. When the sub-pooling result is obtained, the sub-pooling result is used as the input to encode the second first convolutional layer. The output of the previous layer is used as the input of the latter layer, and the final pooling result is calculated. When the pooling result is obtained, the pooling result is de-fitted by a preset de-fitting layer to obtain the encoding result corresponding to the multi-dimensional image block. Among them, the fitting drop layer includes the first convolution layer, the first hole convolution layer and the sub fitting drop layer (dropout 0.5). When the pooling result is obtained, the pooling result is used as the first convolution layer in the fitting drop layer. For the input of a convolutional layer, in the order of the first convolutional layer, the first atrous convolutional layer and the sub-reduction fitting layer, the output result of the previous layer is used as the input of the next layer, and the encoding corresponding to the multi-dimensional image block is calculated. result.

In this embodiment, the multi-dimensional image block is encoded by the encoder, which further improves the accuracy of the image processing, and the receptive field is increased through the hole convolution, thereby increasing the amount of information included in the output image.

In some embodiments of the present application, the above-mentioned decoder includes an upsampling layer, a second convolutional layer, and a second atrous convolutional layer, and the above-mentioned decoder based on the first-layer network performs decoding processing on the encoding result Obtaining the binary segmentation result graph of the target image includes:

When the encoding result is obtained, the encoding result is calculated according to the upsampling layer, the second convolution layer and the second hole convolution layer to obtain a hole convolution result;

The hole convolution result is calculated by a preset activation function, and a binary segmentation result map of the target image is obtained.

In this embodiment, the decoder includes an upsampling layer, a second convolutional layer, and a second atrous convolutional layer. When the encoding result is obtained, the upsampling layer in the decoder is used for computation to obtain the first upsampling result; Splicing to obtain the first splicing result; the first splicing result is used as the input of the second convolution layer, and the output result of the previous layer is used as the latter layer according to the order of the second convolution layer and the second hole convolution layer. The input of , the second sub-hole convolution result is obtained by calculation.

After that, the second sub-hole convolution result is processed through the second up-sampling layer to obtain the second up-sampling result, and the second up-sampling result is calculated with the first first hole convolution in the encoder. The obtained results are spliced to obtain a second splicing result; the second splicing result is passed through the second second convolutional layer in the decoder, and is again in the order of the second convolutional layer and the second hole convolutional layer. The output of the previous layer is used as the input of the latter layer to obtain the final atrous convolution result. Finally, when the hole convolution result is obtained, before the hole convolution result is calculated by the preset activation function, it is necessary to perform a convolution operation on the hole convolution result (such as conv 1*1*9), Then, the convolution calculation result of the hole convolution result is calculated by a preset activation function (such as a sigmoid function), that is, a binary segmentation result map of the objective function is obtained.

In particular, in this embodiment, the upsampling layer, the second convolutional layer and the second atrous convolutional layer are also multi-dimensional processors, the upsampling layer can be calculated by up-conv 2*2*1, the second convolutional layer and The second atrous convolutional layer is the same as the first convolutional layer and the first atrous convolutional layer. Before the second convolutional layer and the second atrous convolutional layer are calculated to obtain the second subconvolutional result and the second atroused convolutional result, the results obtained from the second convolutional layer and the second atroused convolutional layer are also directly obtained. The convolution result is processed by a relu activation function, so as to obtain the final second sub-convolution result and the second sub-hole convolution result respectively.

In this embodiment, the decoder processes the coding result to obtain a binary segmentation result map, which realizes efficient segmentation of pictures, and improves the amount of information included in the binary segmentation result map and the accuracy of picture segmentation.

In some embodiments of the present application, performing multi-layer convolution calculation on the binary segmentation result graph based on the second-layer network to obtain the semantic segmentation result graph of the target image includes:

Obtain the first convolution result of the first layer of network, and perform mask constraint on the binary segmentation result graph according to the first convolution result to obtain a mask result;

Multi-layer convolution calculation is performed on the mask result based on the second-layer network to obtain a semantic segmentation result map of the target image.

In this embodiment, the first convolution result is that when the first hole convolution result is obtained through the first hole convolution calculation of the encoder of the first layer network, the first hole convolution result is passed through once The result calculated by the activation function of convolution conv 1*1*9 and relu. A mask constraint is performed on the binary segmentation map according to the first convolution result. Specifically, the mask constraint is to multiply the first convolution result and the obtained binary segmentation result map to obtain an image of the region of interest, and the image of interest is the mask result. When the mask result is obtained, the mask result is calculated according to the output order of the third convolution layer, the third hole convolution layer and the fourth convolution layer to obtain a semantic segmentation result map of the target image. Among them, the third convolution layer and the third hole convolution layer use the same convolution and activation calculation methods as the first convolution layer and the first hole convolution layer, and the fourth convolution layer adopts conv 1*1* 1 and the calculation method of the activation function of relu.

In this embodiment, the information of the obtained semantic segmentation result map is more complete through mask constraints, and the accuracy of image segmentation is further improved.

In some embodiments of the present application, before obtaining the preset atrous convolutional neural network, the method further includes:

Selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images;

Obtaining a basic training network, training the basic training network according to the training image, and obtaining a trained basic training network;

The trained basic training network is tested according to the test image, and when the recognition success rate of the trained basic training network on the test image is greater than or equal to a preset success rate, the trained basic training network is determined. The training network is the atrous convolutional neural network.

In this embodiment, before the multi-dimensional image blocks are processed by the preset atrous convolutional neural network, the basic training network needs to be trained to obtain the atrous convolutional neural network. Specifically, the basic training network is a model with the same structure as the atrous convolutional neural network but with different parameters. Pre-selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images; acquiring a basic training network, inputting the training images into the basic training network, according to The training image and the standard segmentation map corresponding to the training image are adjusted, and the parameters of the basic training network are adjusted to obtain the trained basic training network. Then, test the trained basic training network according to the test image, and determine when the similarity between the recognition result of the trained basic training network for the test image and the standard segmentation map corresponding to the test image is greater than or equal to a preset threshold. The trained basic training network successfully recognizes the test image; when the trained basic training network recognizes the test image with a success rate greater than or equal to a preset success rate, the trained basic training network is determined to be a preset hole Convolutional Neural Networks.

In this embodiment, the basic training network is trained in advance, so that when the target image is obtained, the image segmentation can be quickly performed according to the trained network, which improves the efficiency and accuracy of image segmentation.

In some embodiments of the present application, the above-mentioned training of the basic training network according to the training image, the obtained basic training network after training includes:

Decomposing the training image into training image blocks, inputting the training image blocks into the basic training network to obtain training segmentation images;

A standard segmentation image of the training image is acquired, and the basic training network is trained according to the training segmentation image and the standard segmentation image to obtain a trained basic training network.

In this embodiment, when training images are obtained, two-layer wavelet decomposition is performed on each training image to obtain a corresponding training image block, and the training image block is input into the basic training network, and the output corresponding to the training image is obtained. Train to segment images. A standard segmented image of the training image is acquired, where the standard segmented image is a preset segmented image associated with the training image. The loss function of the basic training network can be calculated according to the standard segmentation image and the training segmentation image. When the loss function converges, the basic training network is the trained basic training network.

In this embodiment, the basic training network is trained by training image blocks, so that the trained network can accurately segment the image, avoid the error of image segmentation, and further improve the accuracy of image segmentation.

In some embodiments of the present application, the basic training network is trained according to the training segmentation image and the standard segmentation image, and the trained basic training network includes:

Obtain the first pixel number of the training segmented image, and the second pixel number of the standard segmented image;

The loss function of the basic training network is calculated according to the first pixel number and the second pixel number, and when the loss function converges, the basic training network is determined to be the trained basic training network.

In this embodiment, the loss function of the basic training network can be calculated according to the first pixel number of the training segmented image and the second pixel number of the standard segmented image. The specific calculation formula of the loss function is as follows:

loss=1-2|L ₁ ∩L ₂ |/(|L ₁ |+|L ₂ |)

Wherein, L ₁ represents the second pixel number of the standard segmentation image, and L ₂ represents the first pixel number of the training segment image. When the loss function converges, the obtained basic training network is the trained basic training network.

In this embodiment, the trained basic training network is constrained by the loss function, which reduces the training time and improves the efficiency of model training.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the computer-readable instructions can be stored in a computer-readable storage medium. , when the computer-readable instructions are executed, the processes of the above-mentioned method embodiments may be included. Wherein, the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.

It should be understood that although the various steps in the flowchart of the accompanying drawings are sequentially shown in the order indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order and may be performed in other orders. Moreover, at least a part of the steps in the flowchart of the accompanying drawings may include multiple sub-steps or multiple stages, and these sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and the execution sequence is also It does not have to be performed sequentially, but may be performed alternately or alternately with other steps or at least a portion of sub-steps or stages of other steps.

With further reference to FIG. 3 , as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of an image segmentation apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2 . The apparatus may specifically Used in various electronic devices.

As shown in FIG. 3 , the image segmentation apparatus 300 in this embodiment includes: a decomposition module 301 , a processing module 302 , and a calculation module 303 . in:

A decomposition module 301 is used to acquire a target image, and perform two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

The processing module 302 is configured to obtain a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and the encoder in the first-layer network determines the The multi-dimensional image block is subjected to encoding processing to obtain an encoding result, and a decoder based on the first layer network performs decoding processing on the encoding result to obtain a binary segmentation result map of the target image;

Wherein, the processing module 302 includes:

a first processing unit, configured to sequentially pass the multi-dimensional image block through the first convolutional layer, the first atrous convolutional layer and the pooling layer to obtain a pooling result;

A down-fitting unit, configured to perform down-fitting on the pooling result through a preset down-fitting layer to obtain an encoding result corresponding to the multi-dimensional image block.

The second processing unit is configured to, when the encoding result is obtained, calculate the encoding result according to the upsampling layer, the second convolution layer and the second hole convolution layer to obtain the hole volume product result;

The third processing unit is configured to calculate the hole convolution result by using a preset activation function to obtain a binary segmentation result map of the target image.

The calculation module 303 is configured to perform multi-layer convolution calculation on the binary segmentation result graph based on the second-layer network to obtain a semantic segmentation result graph of the target image.

Wherein, the computing module 303 includes:

a first constraining unit, configured to obtain a first convolution result of the first layer network, and perform mask constraint on the binary segmentation result graph according to the first convolution result to obtain a mask result;

The second constraint unit is configured to perform multi-layer convolution calculation on the mask result based on the second layer network to obtain a semantic segmentation result map of the target image.

The image segmentation device proposed in this embodiment further includes:

an acquisition module, used for selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images;

a training module for acquiring a basic training network, and training the basic training network according to the training image to obtain a trained basic training network;

The test module is configured to test the trained basic training network according to the test image, and determine the The basic training network after the training is the atrous convolutional neural network.

Wherein, the training module includes:

a decomposition unit, configured to decompose the training image into training image blocks, and input the training image blocks into the basic training network to obtain training segmentation images;

A training unit, configured to acquire a standard segmented image of the training image, and train the basic training network according to the training segmented image and the standard segmented image to obtain a trained basic training network.

Wherein, the training unit further includes:

an acquisition subunit for acquiring the first pixel number of the training segmentation image and the second pixel number of the standard segmentation image;

A confirmation subunit, configured to calculate the loss function of the basic training network according to the number of the first pixels and the number of the second pixels, and when the loss function converges, determine that the basic training network is a trained network Basic training network.

The image segmentation device proposed in this embodiment realizes that more image information can be obtained during image segmentation, improves the accuracy of image signal description during the extraction of local feature information, and greatly improves the accuracy of image signal description within the controllable range of network parameters. The receptive field is increased, and the amount of information contained in each feature is improved, which further makes the segmentation of image information more accurate, and the obtained image information is more complete.

To solve the above technical problems, the embodiments of the present application also provide computer equipment. For details, please refer to FIG. 4 , which is a block diagram of a basic structure of a computer device according to this embodiment.

The computer device 6 includes a memory 61 , a processor 62 , and a network interface 63 that communicate with each other through a system bus. It should be pointed out that only the computer device 6 with components 61-63 is shown in the figure, but it should be understood that it is not required to implement all of the shown components, and more or less components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions, and its hardware includes but is not limited to microprocessors, special-purpose Integrated circuit (Application Specific Integrated Circuit, ASIC), programmable gate array (Field-Programmable Gate Array, FPGA), digital processor (Digital Signal Processor, DSP), embedded equipment, etc.

The computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment. The computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.

The memory 61 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6 , such as a hard disk or a memory of the computer device 6 . In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Of course, the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed on the computer device 6, such as computer-readable instructions for an image segmentation method. In addition, the memory 61 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 62 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. This processor 62 is typically used to control the overall operation of the computer device 6 . In this embodiment, the processor 62 is configured to execute computer-readable instructions stored in the memory 61 or process data, for example, computer-readable instructions for executing the image segmentation method.

The network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.

The computer device proposed in this embodiment realizes that more image information can be obtained during image segmentation, improves the accuracy of image signal description during local feature information extraction, and greatly increases the accuracy of image signal description within the controllable range of network parameters. The receptive field is improved, and the amount of information contained in each feature is increased, which further makes the segmentation of image information more accurate, and the obtained image information is more complete.

The present application also provides another embodiment, that is, to provide a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to perform the steps of the image segmentation method as described above.

The computer-readable storage medium proposed in this embodiment realizes that more image information can be obtained during image segmentation, improves the accuracy of image signal description during local feature information extraction, and within the range of controllable network parameters, It greatly increases the receptive field, improves the amount of information contained in each feature, further makes the segmentation of image information more accurate, and the obtained image information is more complete.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course hardware can also be used, but in many cases the former is better implementation. Based on this understanding, the technical solutions of the present application can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products are stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.

Obviously, the above-described embodiments are only a part of the embodiments of the present application, rather than all of the embodiments. The accompanying drawings show the preferred embodiments of the present application, but do not limit the patent scope of the present application. This application may be embodied in many different forms, rather these embodiments are provided so that a thorough and complete understanding of the disclosure of this application is provided. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or perform equivalent replacements for some of the technical features. . Any equivalent structures made by using the contents of the description and drawings of this application, which are directly or indirectly used in other related technical fields, are all within the scope of protection of the patent of this application.

Claims

An image segmentation method, comprising the following steps:

acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

Acquire a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and encodes the multi-dimensional image block based on an encoder in the first-layer network Obtaining an encoding result, and decoding the encoding result based on the decoder of the first-layer network to obtain a binary segmentation result map of the target image;

Multi-layer convolution calculation is performed on the binary segmentation result graph based on the second-layer network to obtain a semantic segmentation result graph of the target image.
The image segmentation method according to claim 1, wherein the encoder comprises a first convolutional layer, a first atrous convolutional layer and a pooling layer, and the encoder in the first layer network performs a The steps of performing encoding processing on the multi-dimensional image block to obtain an encoding result specifically include:

Passing the multi-dimensional image block through the first convolution layer, the first hole convolution layer and the pooling layer in sequence to obtain a pooling result;

The encoding result corresponding to the multi-dimensional image block is obtained by performing down-fitting on the pooling result through a preset down-fitting layer.
The image segmentation method according to claim 1, wherein the decoder comprises an upsampling layer, a second convolutional layer and a second atrous convolutional layer, and the decoder based on the first layer network has a The step of performing decoding processing on the encoding result to obtain the binary segmentation result map of the target image specifically includes:

When the encoding result is obtained, the encoding result is calculated according to the upsampling layer, the second convolution layer and the second hole convolution layer to obtain a hole convolution result;

The hole convolution result is calculated by a preset activation function, and a binary segmentation result map of the target image is obtained.
The image segmentation method according to claim 1, wherein the step of performing multi-layer convolution calculation on the binary segmentation result graph based on the second layer network to obtain the semantic segmentation result graph of the target image is specifically include:

Obtain the first convolution result of the first layer of network, and perform mask constraint on the binary segmentation result graph according to the first convolution result to obtain a mask result;

Multi-layer convolution calculation is performed on the mask result based on the second-layer network to obtain a semantic segmentation result map of the target image.
The image segmentation method according to claim 1, wherein before the step of acquiring the preset atrous convolutional neural network, it further comprises:

Selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images;

Obtaining a basic training network, training the basic training network according to the training image, and obtaining a trained basic training network;

The trained basic training network is tested according to the test image, and when the recognition success rate of the trained basic training network on the test image is greater than or equal to a preset success rate, the trained basic training network is determined. The training network is the atrous convolutional neural network.
The image segmentation method according to claim 5, wherein the step of training the basic training network according to the training image to obtain the trained basic training network specifically comprises:

Decomposing the training image into training image blocks, inputting the training image blocks into the basic training network to obtain training segmentation images;

A standard segmentation image of the training image is acquired, and the basic training network is trained according to the training segmentation image and the standard segmentation image to obtain a trained basic training network.
The image segmentation method according to claim 6, wherein the step of training the basic training network according to the training segmentation image and the standard segmentation image, and obtaining the trained basic training network specifically comprises:

Obtain the first pixel number of the training segmented image, and the second pixel number of the standard segmented image;

The loss function of the basic training network is calculated according to the first pixel number and the second pixel number, and when the loss function converges, the basic training network is determined to be the trained basic training network.
An image segmentation device, comprising:

a decomposition module, used for acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

The processing module is configured to obtain a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and the multi-dimensional The image block is subjected to encoding processing to obtain an encoding result, and a decoder based on the first layer network performs decoding processing on the encoding result to obtain a binary segmentation result map of the target image;

The computing module is configured to perform multi-layer convolution calculation on the binary segmentation result graph based on the second-layer network to obtain the semantic segmentation result graph of the target image.
A computer device includes a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor also implements the following steps when executing the computer-readable instructions:

acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

Acquire a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and encodes the multi-dimensional image block based on an encoder in the first-layer network Obtaining an encoding result, and decoding the encoding result based on the decoder of the first-layer network to obtain a binary segmentation result map of the target image;

Multi-layer convolution calculation is performed on the binary segmentation result graph based on the second-layer network to obtain a semantic segmentation result graph of the target image.
9. The computer device of claim 9, wherein the encoder includes a first convolutional layer, a first dilated convolutional layer, and a pooling layer, the encoder in the first layer network based on the The steps of encoding the multi-dimensional image block to obtain the encoding result specifically include:

Passing the multi-dimensional image block through the first convolution layer, the first hole convolution layer and the pooling layer in sequence to obtain a pooling result;

The encoding result corresponding to the multi-dimensional image block is obtained by performing down-fitting on the pooling result through a preset down-fitting layer.
9. The computer device of claim 9, wherein the decoder comprises an upsampling layer, a second convolutional layer, and a second atrous convolutional layer, the decoder based on the first layer network The step of decoding the encoding result to obtain the binary segmentation result map of the target image specifically includes:

When the encoding result is obtained, the encoding result is calculated according to the upsampling layer, the second convolution layer and the second hole convolution layer to obtain a hole convolution result;

The hole convolution result is calculated by a preset activation function, and a binary segmentation result map of the target image is obtained.
The computer device according to claim 9, wherein the step of performing multi-layer convolution calculation on the binary segmentation result graph based on the second layer network to obtain the semantic segmentation result graph of the target image specifically comprises the following steps: :

Obtain the first convolution result of the first layer of network, and perform mask constraint on the binary segmentation result graph according to the first convolution result to obtain a mask result;

Multi-layer convolution calculation is performed on the mask result based on the second-layer network to obtain a semantic segmentation result map of the target image.
The computer device according to claim 9, wherein before the step of acquiring the preset atrous convolutional neural network, it further comprises:

Selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images;

Obtaining a basic training network, training the basic training network according to the training image, and obtaining a trained basic training network;

The trained basic training network is tested according to the test image, and when the recognition success rate of the trained basic training network on the test image is greater than or equal to a preset success rate, the trained basic training network is determined. The training network is the atrous convolutional neural network.
The computer device according to claim 13, wherein the step of training the basic training network according to the training image to obtain the trained basic training network specifically comprises:

Decomposing the training image into training image blocks, inputting the training image blocks into the basic training network to obtain training segmentation images;

A standard segmentation image of the training image is acquired, and the basic training network is trained according to the training segmentation image and the standard segmentation image to obtain a trained basic training network.
The computer device according to claim 14, wherein the step of training the basic training network according to the training segmentation image and the standard segmentation image, and obtaining the trained basic training network specifically comprises:

Obtain the first pixel number of the training segmented image, and the second pixel number of the standard segmented image;

The loss function of the basic training network is calculated according to the first pixel number and the second pixel number, and when the loss function converges, the basic training network is determined to be the trained basic training network.
A computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor further performs the following steps:

acquiring a target image, and performing two-layer wavelet decomposition on the target image to obtain a multi-dimensional image block;

Acquire a preset atrous convolutional neural network, wherein the atrous convolutional neural network includes a first-layer network and a second-layer network, and encodes the multi-dimensional image block based on an encoder in the first-layer network Obtaining an encoding result, and decoding the encoding result based on the decoder of the first-layer network to obtain a binary segmentation result map of the target image;

Multi-layer convolution calculation is performed on the binary segmentation result graph based on the second-layer network to obtain a semantic segmentation result graph of the target image.
17. The computer-readable storage medium of claim 16, wherein the encoder comprises a first convolutional layer, a first atrous convolutional layer, and a pooling layer, the encoder based on the first layer network The step of encoding the multi-dimensional image block to obtain an encoding result specifically includes:

Passing the multi-dimensional image block through the first convolution layer, the first hole convolution layer and the pooling layer in sequence to obtain a pooling result;

The encoding result corresponding to the multi-dimensional image block is obtained by performing down-fitting on the pooling result through a preset down-fitting layer.
17. The computer-readable storage medium of claim 16, wherein the decoder comprises an upsampling layer, a second convolutional layer, and a second atrous convolutional layer, the decoder based on the first layer network The step of decoding the encoding result to obtain the binary segmentation result map of the target image specifically includes:

When the encoding result is obtained, the encoding result is calculated according to the upsampling layer, the second convolution layer and the second hole convolution layer to obtain a hole convolution result;

The hole convolution result is calculated by a preset activation function, and a binary segmentation result map of the target image is obtained.
The computer-readable storage medium according to claim 16, wherein the multi-layer convolution calculation is performed on the binary segmentation result graph based on the second layer network to obtain the semantic segmentation result graph of the target image. The steps include:

Obtain the first convolution result of the first layer of network, and perform mask constraint on the binary segmentation result graph according to the first convolution result to obtain a mask result;

Multi-layer convolution calculation is performed on the mask result based on the second-layer network to obtain a semantic segmentation result map of the target image.
The computer-readable storage medium according to claim 16, wherein before the step of acquiring the preset atrous convolutional neural network, it further comprises:

Selecting a preset number of images in the preset image library as training images, and using the remaining images in the preset image library as test images;

Obtaining a basic training network, training the basic training network according to the training image, and obtaining a trained basic training network;

The trained basic training network is tested according to the test image, and when the recognition success rate of the trained basic training network on the test image is greater than or equal to a preset success rate, the trained basic training network is determined. The training network is the atrous convolutional neural network.