CN109816659B - Image segmentation method, device and system

Image segmentation method, device and system

Info

Publication number: CN109816659B
Application number: CN201910084083.XA
Authority: CN (China)
Prior art keywords: auxiliary, feature map, coding, group, network
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109816659A
Inventors: 熊鹏飞, 李瀚超
Assignee (current and original): Beijing Kuangshi Technology Co Ltd
Application filed by Beijing Kuangshi Technology Co Ltd; priority to CN201910084083.XA
Publication of application CN109816659A; application granted; publication of grant CN109816659B

Abstract

The invention provides an image segmentation method, apparatus and system, relating to the technical field of image processing. The method comprises the following steps: acquiring a target image to be segmented; inputting the target image into a main coding network and performing an encoding operation on it through the main coding network to obtain a first feature map; enlarging the first feature map by a preset factor to obtain an enlarged first feature map; inputting the enlarged first feature map into an auxiliary coding network and performing an encoding operation on it through the auxiliary coding network to obtain a second feature map; and inputting the first feature map and the second feature map into a decoding network, fusing the two through the decoding network to obtain a first fused feature map, and decoding the first fused feature map to obtain a segmentation result of the target image. The invention can effectively improve the accuracy of the image segmentation result.

Description

Image segmentation method, device and system
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image segmentation method, apparatus, and system.
Background
Image segmentation is a core technology of computer vision. With the popularization of deep learning, it plays an important role in applications such as autonomous driving, robot navigation and image recognition. The main purpose of image segmentation is to determine the class of each pixel in an image and thereby segment each object in the image at the pixel level.
A conventional image segmentation method generally enlarges the high-dimensional features obtained by down-sampling an image back to the original image size, and then obtains the segmentation result directly from the enlarged features. Such a method has poor classification expression capability, easily ignores the detail features of the objects in the image, and yields segmentation results of low accuracy.
Disclosure of Invention
In view of the above, the present invention provides an image segmentation method, an image segmentation apparatus and an image segmentation system, so as to improve the accuracy of an image segmentation result.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides an image segmentation method, including: acquiring a target image to be segmented; inputting the target image into a main coding network, and performing an encoding operation on the target image through the main coding network to obtain a first feature map; enlarging the first feature map by a preset factor to obtain an enlarged first feature map; inputting the enlarged first feature map into an auxiliary coding network, and performing an encoding operation on the enlarged first feature map through the auxiliary coding network to obtain a second feature map; and inputting the first feature map and the second feature map into a decoding network, fusing the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and performing a decoding operation on the first fused feature map to obtain a segmentation result of the target image. The overall flow is sketched below.
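For illustration only, the following minimal sketch shows this overall flow in PyTorch-style Python. The module names (main_encoder, aux_encoder, decoder) and the scale factor are hypothetical stand-ins for the main coding network, auxiliary coding network and decoding network described above, not the patent's actual implementation.

```python
# A minimal sketch of the claimed flow, assuming PyTorch; all module names are
# hypothetical stand-ins for the networks described in the text.
import torch.nn.functional as F

def segment(image, main_encoder, aux_encoder, decoder, scale=2):
    f1 = main_encoder(image)                  # first feature map
    f1_up = F.interpolate(f1, scale_factor=scale,
                          mode='bilinear', align_corners=False)  # enlarge by a preset factor
    f2 = aux_encoder(f1_up)                   # second feature map
    return decoder(f1, f2)                    # fuse, then decode to the segmentation result
```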
Further, the main coding network comprises an image scaling sub-network, a main down-sampling sub-network and a main feature association sub-network which are connected in sequence. The step of performing an encoding operation on the target image through the main coding network to obtain a first feature map includes: scaling, by the image scaling sub-network, the target image to a specified size; performing a down-sampling operation on the scaled target image through the main down-sampling sub-network to obtain a main down-sampled feature map; and performing a fully connected operation on the main down-sampled feature map through the main feature association sub-network, and fusing the main down-sampled feature map with the fully connected main down-sampled feature map to obtain the first feature map.
Further, the image scaling sub-network comprises at least one convolutional layer. The main down-sampling sub-network comprises one or more main convolution groups; the main convolution groups are connected in sequence, each main convolution group reduces the feature map input to it to a specified feature dimension, different main convolution groups correspond to different specified feature dimensions, and each main convolution group comprises a plurality of convolutional layers.
The main feature association sub-network comprises a main fully connected layer, a main convolutional layer and a main dot-multiplication operation layer which are connected in sequence. The main fully connected layer and the main convolutional layer perform the fully connected operation on the main down-sampled feature map, and the main dot-multiplication operation layer fuses the main down-sampled feature map with the fully connected main down-sampled feature map to obtain the first feature map.
Further, the auxiliary coding network comprises M auxiliary coding groups, where M is a preset natural number not less than 1. The step of performing an encoding operation on the enlarged first feature map through the auxiliary coding network to obtain a second feature map includes:
if m is equal to 1, performing an encoding operation through the 1st auxiliary coding group based on the enlarged first feature map to obtain a 1st auxiliary coding feature map, and enlarging the 1st auxiliary coding feature map by the preset factor to obtain an enlarged 1st auxiliary coding feature map; if m is greater than 1 and less than M, performing an encoding operation through the m-th auxiliary coding group based on the enlarged (m-1)-th auxiliary coding feature map to obtain an m-th auxiliary coding feature map, and enlarging the m-th auxiliary coding feature map by the preset factor to obtain an enlarged m-th auxiliary coding feature map, where m takes the values 2 to M-1 in sequence; if m is equal to M, performing an encoding operation through the M-th auxiliary coding group based on the enlarged (M-1)-th auxiliary coding feature map to obtain an M-th auxiliary coding feature map; and determining the m-th auxiliary coding feature maps, with m taking the values 1 to M in sequence, as the second feature map.
Further, each auxiliary coding group comprises an auxiliary down-sampling sub-network and an auxiliary feature association sub-network which are connected in sequence.
If m is equal to 1, the step of performing an encoding operation through the 1st auxiliary coding group based on the enlarged first feature map to obtain the 1st auxiliary coding feature map includes: performing a down-sampling operation based on the enlarged first feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain a 1st auxiliary down-sampled feature map; and performing a fully connected operation on the 1st auxiliary down-sampled feature map through the auxiliary feature association sub-network of the 1st auxiliary coding group, and fusing the 1st auxiliary down-sampled feature map with the fully connected 1st auxiliary down-sampled feature map to obtain the 1st auxiliary coding feature map.
If m is greater than 1, the step of performing an encoding operation through the m-th auxiliary coding group based on the enlarged (m-1)-th auxiliary coding feature map to obtain the m-th auxiliary coding feature map includes: performing a down-sampling operation based on the enlarged (m-1)-th auxiliary coding feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain an m-th auxiliary down-sampled feature map; and performing a fully connected operation on the m-th auxiliary down-sampled feature map through the auxiliary feature association sub-network of the m-th auxiliary coding group, and fusing the m-th auxiliary down-sampled feature map with the fully connected m-th auxiliary down-sampled feature map to obtain the m-th auxiliary coding feature map.
Further, the main down-sampling sub-network of the main coding network also outputs a main intermediate feature map, and the auxiliary down-sampling sub-network of each auxiliary coding group also outputs an auxiliary intermediate feature map.
If m is equal to 1, the step of performing a down-sampling operation based on the enlarged first feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map includes: splicing the main intermediate feature map with the enlarged first feature map to obtain a 1st spliced feature map; and performing a down-sampling operation on the 1st spliced feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map.
If m is greater than 1, the step of performing a down-sampling operation based on the enlarged (m-1)-th auxiliary coding feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain the m-th auxiliary down-sampled feature map includes: splicing the auxiliary intermediate feature map output by the (m-1)-th auxiliary coding group with the enlarged (m-1)-th auxiliary coding feature map to obtain an m-th spliced feature map; and performing a down-sampling operation on the m-th spliced feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain the m-th auxiliary down-sampled feature map, where m takes the values 2 to M in sequence.
Further, each auxiliary down-sampling sub-network comprises one or more auxiliary convolution groups; the auxiliary convolution groups are connected in sequence, each auxiliary convolution group reduces the feature map input to it to a specified feature dimension, different auxiliary convolution groups correspond to different specified feature dimensions, and each auxiliary convolution group comprises a plurality of convolutional layers.
The auxiliary feature association sub-network comprises an auxiliary fully connected layer, an auxiliary convolutional layer and an auxiliary dot-multiplication operation layer which are connected in sequence. The auxiliary fully connected layer and the auxiliary convolutional layer perform the fully connected operation on the auxiliary down-sampled feature map, and the auxiliary dot-multiplication operation layer fuses the auxiliary down-sampled feature map with the fully connected auxiliary down-sampled feature map to obtain the second feature map.
Further, the number of main convolution groups in the main down-sampling sub-network and the number of auxiliary convolution groups in each auxiliary coding group are both N, where N is a natural number not less than 1. The step of performing a down-sampling operation on the 1st spliced feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map includes:
if n is equal to 1, splicing the output feature map of the 1st main convolution group in the main coding network with the enlarged first feature map to obtain a (1·1)-th sub-spliced feature map, performing a down-sampling operation on the (1·1)-th sub-spliced feature map through the 1st auxiliary convolution group in the 1st auxiliary coding group, and determining the output feature map of the 1st auxiliary convolution group in the 1st auxiliary coding group as the 1st auxiliary intermediate feature map of the 1st auxiliary coding group;
if n is greater than 1, splicing the output feature map of the (n-1)-th auxiliary convolution group in the 1st auxiliary coding group with the output feature map of the n-th main convolution group in the main coding network to obtain a (1·n)-th sub-spliced feature map, performing a down-sampling operation on the (1·n)-th sub-spliced feature map through the n-th auxiliary convolution group in the 1st auxiliary coding group, and determining the output feature map of the n-th auxiliary convolution group in the 1st auxiliary coding group as the n-th auxiliary intermediate feature map of the 1st auxiliary coding group, where n takes the values 2 to N in sequence;
and determining the N-th auxiliary intermediate feature map of the 1st auxiliary coding group as the 1st auxiliary down-sampled feature map.
Further, the step of performing a down-sampling operation on the m-th spliced feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain the m-th auxiliary down-sampled feature map includes:
if n is equal to 1, splicing the output feature map of the 1st auxiliary convolution group in the (m-1)-th auxiliary coding group with the enlarged (m-1)-th auxiliary coding feature map to obtain an (m·1)-th sub-spliced feature map, performing a down-sampling operation on the (m·1)-th sub-spliced feature map through the 1st auxiliary convolution group in the m-th auxiliary coding group, and determining the output feature map of the 1st auxiliary convolution group in the m-th auxiliary coding group as the 1st auxiliary intermediate feature map of the m-th auxiliary coding group;
if n is greater than 1, splicing the output feature map of the n-th auxiliary convolution group in the (m-1)-th auxiliary coding group with the output feature map of the (n-1)-th auxiliary convolution group in the m-th auxiliary coding group to obtain an (m·n)-th sub-spliced feature map, performing a down-sampling operation on the (m·n)-th sub-spliced feature map through the n-th auxiliary convolution group in the m-th auxiliary coding group, and determining the output feature map of the n-th auxiliary convolution group in the m-th auxiliary coding group as the n-th auxiliary intermediate feature map of the m-th auxiliary coding group;
and determining the N-th auxiliary intermediate feature map of the m-th auxiliary coding group as the m-th auxiliary down-sampled feature map.
Further, the decoding network comprises a fusion sub-network and a decoding sub-network. The fusion sub-network enlarges the first feature map and the second feature map to a specified size and fuses the enlarged first feature map with the enlarged second feature map to obtain the first fused feature map; the decoding sub-network decodes the first fused feature map to obtain the segmentation result of the target image.
Further, the fusion sub-network comprises a plurality of up-sampling layers and a bitwise addition operation layer. The input of each up-sampling layer is the first feature map or the second feature map, and different up-sampling layers have different inputs; each up-sampling layer enlarges the feature map input to it to the specified size to obtain the enlarged first feature map or an enlarged second feature map. The bitwise addition operation layer performs a bitwise addition operation on the enlarged first feature map and the enlarged second feature map to obtain the first fused feature map.
Further, the step of fusing the first feature map and the second feature map through the decoding network to obtain the first fused feature map includes: inputting the output feature map of the first main convolution group and the output feature map of the first auxiliary convolution group in each auxiliary coding group into the decoding network; and fusing, through the decoding network, the output feature map of the first main convolution group, the output feature maps of the first auxiliary convolution groups in the auxiliary coding groups, the first feature map and the second feature map to obtain the first fused feature map.
In a second aspect, an embodiment of the present invention provides an image segmentation apparatus, including: a target image acquisition module, configured to acquire a target image to be segmented; a main coding module, configured to input the target image into a main coding network and perform an encoding operation on the target image through the main coding network to obtain a first feature map; a size enlarging module, configured to enlarge the first feature map by a preset factor to obtain an enlarged first feature map; an auxiliary coding module, configured to input the enlarged first feature map into an auxiliary coding network and perform an encoding operation on the enlarged first feature map through the auxiliary coding network to obtain a second feature map; and a decoding module, configured to input the first feature map and the second feature map into a decoding network, fuse the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and decode the first fused feature map to obtain a segmentation result of the target image.
In a third aspect, an embodiment of the present invention provides an image segmentation system, where the system includes: the device comprises an image acquisition device, a processor and a storage device; the image acquisition device is used for acquiring a target image; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method according to any one of the above first aspects.
Embodiments of the present invention provide an image segmentation method, apparatus and system built on a structure of a main coding network, an auxiliary coding network and a decoding network. First, a target image is encoded once through the main coding network to obtain a first feature map; the first feature map is enlarged and input into the auxiliary coding network, which performs a second encoding operation on the enlarged first feature map to obtain a second feature map; the first feature map and the second feature map are then input into the decoding network, which fuses them into a first fused feature map and decodes the first fused feature map to obtain the segmentation result of the target image. Encoding the target image multiple times through multiple coding networks (the main coding network and the auxiliary coding network) effectively extracts the macroscopic information of each object in the image and improves the classification expression capability; fusing the feature maps obtained by the multiple encodings (the first feature map and the second feature map) before decoding helps restore the detail features of each object in the image, so the accuracy of image segmentation can be effectively improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flowchart of an image segmentation method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a first image segmentation model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a second image segmentation model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a third image segmentation model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a fourth image segmentation model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a fifth image segmentation model according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a sixth image segmentation model according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an auxiliary feature association sub-network according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a seventh image segmentation model according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an eighth image segmentation model according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a ninth image segmentation model according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a tenth image segmentation model according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a main feature association sub-network according to an embodiment of the present invention;
FIG. 15 is a schematic diagram illustrating comparison of effects provided by the embodiment of the present invention;
FIG. 16 is a schematic diagram illustrating another comparison of the effects provided by the embodiment of the present invention;
fig. 17 is a block diagram of an image segmentation apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Two important factors affecting the image segmentation result are the classification expression capability and the spatial expression capability of the neural network. Existing image segmentation techniques mainly design a neural network comprising an encoding layer and a decoding layer: an image is input into the encoding layer and encoded into a group of high-dimensional features, which correspond to feature maps of the original input image after multiple rounds of down-sampling; the decoding layer then restores the feature map corresponding to the high-dimensional features to the size of the original image and outputs it. Such a method has poor classification expression capability, easily ignores the detail features of the objects in the image, and yields segmentation results of low accuracy. A deeper network structure can achieve better classification capability, but an overly deep network often reduces the resolution of the image features so much that spatial description capability is lost.
In view of the poor reliability and low accuracy of existing image segmentation methods, embodiments of the present invention provide an image segmentation method, apparatus and system that can be applied to image segmentation tasks in fields such as autonomous driving, robot navigation and image recognition. Embodiments of the present invention are described in detail below.
Embodiment one:
first, an example electronic device 100 for implementing an image segmentation method, apparatus and system according to embodiments of the present invention is described with reference to fig. 1.
As shown in fig. 1, an electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA) and a programmable logic array (PLA), and may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, or a combination of several of these; it may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client-side functionality and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image capture device 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
Exemplarily, an exemplary electronic device for implementing the image segmentation method, apparatus and system according to the embodiments of the present invention may be implemented as an intelligent terminal such as a smartphone, a tablet computer, a snapshot machine, and the like.
Embodiment two:
referring to a flowchart of an image segmentation method shown in fig. 2, the method may be executed by the electronic device provided in the foregoing embodiment, and specifically includes the following steps:
step S202, acquiring a target image to be segmented; wherein the target image comprises a target object to be segmented. For example, the target image may include an animal such as a bird or a cat to be recognized, or the target image may include a vehicle, a pedestrian, a house, or the like to be recognized.
Step S204, inputting the target image into the main coding network, and performing an encoding operation on the target image through the main coding network to obtain a first feature map. The encoding operation mainly includes a dimension-reduction (also called down-sampling) operation; its purpose is to extract the macroscopic feature structure of the target image by reducing its dimensionality, so as to divide the areas where different objects are located in the target image.
Step S206, enlarging the first feature map by a preset factor to obtain an enlarged first feature map. The enlargement factor can be set flexibly as required; it mainly depends on the original size of the target image, the size of the target image after dimension reduction, and the size of the image that the auxiliary coding network can process.
Step S208, inputting the enlarged first feature map into an auxiliary coding network, and performing an encoding operation on the enlarged first feature map through the auxiliary coding network to obtain a second feature map. Unlike conventional image segmentation approaches, this embodiment further provides an auxiliary coding network that performs an additional encoding operation based on the enlarged first feature map, extracting further feature information from the image; this richer information extraction helps improve the classification expression capability. The approach realizes feature multiplexing at the network level, allowing the network to obtain deep feature classification capability.
Step S210, inputting the first feature map and the second feature map into a decoding network, fusing the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and performing a decoding operation on the first fused feature map to obtain a segmentation result of the target image. The decoding operation mainly includes a dimension-raising (also called up-sampling) operation; its purpose is to restore the microscopic feature structure of the target image that was lost during encoding.
According to the image segmentation method provided by the embodiment of the present invention, encoding the target image multiple times through multiple coding networks (the main coding network and the auxiliary coding network) effectively extracts the macroscopic information of each object in the image and improves the classification expression capability; fusing the feature maps obtained by the multiple encodings (the first feature map and the second feature map) before decoding helps restore the detail features of each object in the image, so the accuracy of image segmentation can be effectively improved.
In a specific implementation, an image segmentation model implementing the above method may be constructed in advance. Referring to fig. 3, a schematic structural diagram of a first image segmentation model, the model includes a main coding network, an auxiliary coding network and a decoding network. The main coding network performs a first encoding operation on the target image and outputs a first feature map; the first feature map is enlarged (denoted by the symbol "×" in fig. 3) and used as the input of the auxiliary coding network, which performs a second encoding operation on the enlarged first feature map, realizing sub-pixel-level encoding. Rather than directly enlarging the high-dimensional features obtained by encoding and taking them as the model output, this first image segmentation model adopts network feature multiplexing: the enlarged features are fed into the next coding network (the enlarged first feature map is input into the auxiliary coding network), so that deeper feature classification capability can be obtained.
For ease of understanding, the main coding network is first described in greater detail. Fig. 4 shows a schematic structural diagram of a second image segmentation model, which illustrates the structure of the main coding network: an image scaling sub-network, a main down-sampling sub-network and a main feature association sub-network connected in sequence.
On this basis, a specific implementation of performing the encoding operation on the target image through the main coding network in step S204 to obtain the first feature map may be as follows: scaling the target image to a specified size through the image scaling sub-network; performing a down-sampling operation on the scaled target image through the main down-sampling sub-network to obtain a main down-sampled feature map; and performing a fully connected operation on the main down-sampled feature map through the main feature association sub-network, and fusing the main down-sampled feature map with the fully connected main down-sampled feature map to obtain the first feature map. A sketch of this forward pass is given below.
Considering that the target image to be segmented is usually large in practical applications, the method provided by the embodiment of the present invention first shrinks the target image through the image scaling sub-network of the main coding network, which reduces the computation of the subsequent down-sampling, fully connected and fusion operations and helps increase the speed of image segmentation. Performing a fully connected operation on the main down-sampled feature map through the main feature association sub-network and then fusing the result with the main down-sampled feature map enhances the correlation among the features in the image.
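As an illustration only, the following sketch arranges the three sub-networks in sequence; the module internals (and the choice of PyTorch) are assumptions, since the patent does not fix concrete layer configurations here.

```python
import torch.nn as nn

class MainEncoder(nn.Module):
    """Hypothetical sketch of the main coding network: image scaling sub-network,
    main down-sampling sub-network and main feature association sub-network
    connected in sequence."""
    def __init__(self, scaler, down_subnet, assoc_subnet):
        super().__init__()
        self.scaler = scaler          # at least one convolutional layer
        self.down = down_subnet       # the main convolution groups in sequence
        self.assoc = assoc_subnet     # fully connected layer + conv + dot-multiplication

    def forward(self, image):
        x = self.scaler(image)        # scale the target image to the specified size
        x = self.down(x)              # main down-sampled feature map
        return self.assoc(x)          # first feature map
```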
For ease of understanding, this embodiment provides a specific implementation of the auxiliary coding network, which may include M auxiliary coding groups, where M is a preset natural number not less than 1. A natural number variable m denotes any one of the M auxiliary coding groups, with 1 ≤ m ≤ M. In step S208, performing the encoding operation on the enlarged first feature map through the auxiliary coding network to obtain the second feature map may be implemented as follows (see the sketch after this list):
if m is equal to 1, performing an encoding operation through the 1st auxiliary coding group based on the enlarged first feature map to obtain a 1st auxiliary coding feature map, and enlarging the 1st auxiliary coding feature map by the preset factor to obtain an enlarged 1st auxiliary coding feature map;
if m is greater than 1 and less than M, performing an encoding operation through the m-th auxiliary coding group based on the enlarged (m-1)-th auxiliary coding feature map to obtain an m-th auxiliary coding feature map, and enlarging the m-th auxiliary coding feature map by the preset factor to obtain an enlarged m-th auxiliary coding feature map, where m takes the values 2 to M-1 in sequence;
if m is equal to M, performing an encoding operation through the M-th auxiliary coding group based on the enlarged (M-1)-th auxiliary coding feature map to obtain an M-th auxiliary coding feature map;
and determining the m-th auxiliary coding feature maps, with m taking the values 1 to M in sequence, as the second feature map.
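The sketch below runs the M auxiliary coding groups in sequence, collecting every auxiliary coding feature map as part of the second feature map. It is a minimal illustration assuming PyTorch; `groups` is a hypothetical list of callables, one per auxiliary coding group.

```python
import torch.nn.functional as F

def aux_encode(f1_up, groups, scale=2):
    # f1_up: enlarged first feature map; groups: the M auxiliary coding groups.
    second_feature_maps = []
    x = f1_up
    for m, group in enumerate(groups, start=1):
        fm = group(x)                         # m-th auxiliary coding feature map
        second_feature_maps.append(fm)        # every map belongs to the second feature map
        if m < len(groups):                   # the M-th map is not enlarged again
            x = F.interpolate(fm, scale_factor=scale,
                              mode='bilinear', align_corners=False)
    return second_feature_maps
```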
Fig. 5 shows a schematic structural diagram of a third image segmentation model, which adds the specific structure of the auxiliary coding network to fig. 4: the auxiliary coding network comprises a first auxiliary coding group and a second auxiliary coding group. In this configuration m takes the values 1 and 2: the first auxiliary coding group performs an encoding operation based on the enlarged first feature map to obtain the 1st auxiliary coding feature map, and the second auxiliary coding group performs an encoding operation based on the enlarged 1st auxiliary coding feature map to obtain the 2nd auxiliary coding feature map. Both the 1st and 2nd auxiliary coding feature maps are determined as the second feature map; that is, in fig. 5 the second feature map includes the 1st auxiliary coding feature map output by the first auxiliary coding group and the 2nd auxiliary coding feature map output by the second auxiliary coding group.
In a specific embodiment, each auxiliary coding group includes an auxiliary down-sampling sub-network and an auxiliary feature association sub-network connected in sequence. Fig. 6, a schematic structural diagram of a fourth image segmentation model, adds to fig. 5 the first auxiliary down-sampling sub-network and first auxiliary feature association sub-network of the first auxiliary coding group, and the second auxiliary down-sampling sub-network and second auxiliary feature association sub-network of the second auxiliary coding group. By the value of m, the different auxiliary coding groups obtain their auxiliary coding feature maps as follows:
if m is equal to 1, the step of performing encoding operation based on the amplified first feature map by the 1 st auxiliary encoding group to obtain the 1 st auxiliary encoding feature map includes: firstly, carrying out downsampling operation on the basis of the amplified first feature map through an auxiliary downsampling subnetwork of a 1 st auxiliary coding group to obtain a 1 st auxiliary downsampling feature map; and then performing full connection operation on the 1 st auxiliary down-sampling feature map through an auxiliary feature association sub-network of the 1 st auxiliary coding group, and fusing the 1 st auxiliary down-sampling feature map and the 1 st auxiliary down-sampling feature map subjected to the full connection operation to obtain the 1 st auxiliary coding feature map.
If m is greater than 1, the step of obtaining the mth auxiliary coding feature map by performing coding operation on the mth auxiliary coding group based on the amplified (m-1) th auxiliary coding feature map includes: firstly, performing downsampling operation on the basis of the amplified (m-1) th auxiliary coding feature map through an auxiliary downsampling subnetwork of the mth auxiliary coding group to obtain an mth auxiliary downsampling feature map; and then, carrying out full connection operation on the mth auxiliary down-sampling feature map through an auxiliary feature association sub-network of the mth auxiliary coding group, and fusing the mth auxiliary down-sampling feature map and the mth auxiliary down-sampling feature map subjected to full connection operation to obtain the mth auxiliary coding feature map.
In fig. 6, the input of the first auxiliary downsampling subnetwork is the amplified first feature map, and the output of the first auxiliary downsampling subnetwork is the 1 st auxiliary downsampling feature map; the input of the first auxiliary feature correlation sub-network is the 1 st auxiliary downsampled feature map, and the output of the first auxiliary feature correlation sub-network is the 1 st auxiliary coding feature map. The input of the second auxiliary down-sampling sub-network is the amplified 1 st auxiliary coding feature map, and the output of the second auxiliary down-sampling sub-network is the 2 nd auxiliary down-sampling feature map; the input of the second auxiliary feature correlation sub-network is the 2 nd auxiliary downsampled feature map, and the output of the second auxiliary feature correlation sub-network is the 2 nd auxiliary coding feature map.
To further improve the classification expression capability of image segmentation while preventing a deeper network structure from degrading the feature resolution, embodiments of the present invention further provide, on top of the network feature multiplexing above and without changing the structure of the image segmentation model, a hierarchical feature multiplexing method. Several image segmentation models (the fifth to the tenth) applying hierarchical feature multiplexing are described below:
as shown in fig. 7, an embodiment of the present invention provides a fifth structure of an image segmentation model, which further illustrates a plurality of stitching layers on the basis of fig. 6, and the structure of the fifth structure is shown in fig. 7
Figure BDA0001960702480000151
To represent the splice layer used to map the different levels of features. The main down-sampling sub-network of the main coding network also outputs a main intermediate characteristic diagram; the auxiliary down-sampling sub-network of the auxiliary encoding group also outputs an auxiliary intermediate feature map. The splicing layer is used for splicing the main intermediate map feature map output by the main downsampling subnetwork and the amplified first feature map, or splicing the auxiliary intermediate feature map output by the first auxiliary downsampling subnetwork and the amplified 1 st auxiliary coding feature map to form a corresponding splicing feature map, and features of different levels are spliced (also called as series connection) together for reuse, so that the complexity of the model can be increased without increasing extra calculation amount, the classification expression capability and the space description capability of the image segmentation model can be improved, and the segmentation speed and the segmentation accuracy are well balanced.
On this basis, if m is equal to 1, the step of performing a down-sampling operation based on the enlarged first feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map includes:
splicing the main intermediate feature map with the enlarged first feature map to obtain a 1st spliced feature map; and performing a down-sampling operation on the 1st spliced feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map.
If m is greater than 1, the step of performing a down-sampling operation based on the enlarged (m-1)-th auxiliary coding feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain the m-th auxiliary down-sampled feature map includes: splicing the auxiliary intermediate feature map output by the (m-1)-th auxiliary coding group with the enlarged (m-1)-th auxiliary coding feature map to obtain an m-th spliced feature map; and performing a down-sampling operation on the m-th spliced feature map through the auxiliary down-sampling sub-network of the m-th auxiliary coding group to obtain the m-th auxiliary down-sampled feature map, where m takes the values 2 to M in sequence. A sketch of this splice-then-down-sample step follows.
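Purely as an illustration of the splice-then-down-sample step (assuming PyTorch and channel-wise splicing, which is the most plausible reading of "splicing" here), with hypothetical names:

```python
import torch

def spliced_downsample(intermediate_map, enlarged_map, down_subnet):
    # intermediate_map: main intermediate feature map (m == 1) or the (m-1)-th
    #   group's auxiliary intermediate feature map (m > 1)
    # enlarged_map: enlarged first feature map (m == 1) or the enlarged
    #   (m-1)-th auxiliary coding feature map (m > 1)
    spliced = torch.cat([intermediate_map, enlarged_map], dim=1)  # splice along channels
    return down_subnet(spliced)   # m-th auxiliary down-sampled feature map
```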
Further, an embodiment of the present invention provides a specific implementation of hierarchical feature multiplexing. The image scaling sub-network comprises at least one convolutional layer. The main down-sampling sub-network comprises one or more main convolution groups; the main convolution groups are connected in sequence, each main convolution group reduces the feature map input to it to a specified feature dimension, different main convolution groups correspond to different specified feature dimensions, and each main convolution group comprises a plurality of convolutional layers. The main feature association sub-network comprises a main fully connected layer and a main convolutional layer which are connected in sequence, together with a main dot-multiplication operation layer; the main fully connected layer and the main convolutional layer perform the fully connected operation on the main down-sampled feature map, and the main dot-multiplication operation layer fuses the main down-sampled feature map with the fully connected main down-sampled feature map to obtain the first feature map.
The auxiliary down-sampling sub-network comprises one or more auxiliary convolution groups; the auxiliary convolution groups are connected in sequence, each auxiliary convolution group reduces the feature map input to it to a specified feature dimension, different auxiliary convolution groups correspond to different specified feature dimensions, and each auxiliary convolution group comprises a plurality of convolutional layers. The auxiliary feature association sub-network comprises an auxiliary fully connected layer and an auxiliary convolutional layer which are connected in sequence, together with an auxiliary dot-multiplication operation layer; the auxiliary fully connected layer and the auxiliary convolutional layer perform the fully connected operation on the auxiliary down-sampled feature map, and the auxiliary dot-multiplication operation layer fuses the auxiliary down-sampled feature map with the fully connected auxiliary down-sampled feature map to obtain the second feature map.
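The feature association sub-network (fully connected layer, convolutional layer, then dot-multiplication with the original down-sampled map) could look roughly like the following sketch. The layer sizes, the flattening over spatial positions, and the 1×1 convolution are assumptions for illustration only; the patent fixes the layer order but not these details.

```python
import torch.nn as nn

class FeatureAssociation(nn.Module):
    """Hypothetical sketch: an FC layer and a convolution produce a globally
    informed map that is fused with the input down-sampled feature map by
    element-wise (dot) multiplication."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.fc = nn.Linear(height * width, height * width)  # fully connected operation
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                  # x: (batch, channels, height, width)
        b, c, h, w = x.shape
        y = self.fc(x.flatten(2))          # FC applied over spatial positions
        y = self.conv(y.view(b, c, h, w))  # convolution after the FC operation
        return x * y                       # dot-multiplication fuses the two maps
```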
For ease of understanding, an embodiment of the present invention provides a schematic structural diagram of a sixth image segmentation model as shown in fig. 8, which details, on the basis of fig. 7, the main convolution groups in the main down-sampling sub-network (three in fig. 8: a first main convolution group, a second main convolution group and a third main convolution group), the auxiliary convolution groups in the first auxiliary down-sampling sub-network (three in fig. 8: auxiliary convolution group 1-1, auxiliary convolution group 1-2 and auxiliary convolution group 1-3), and the auxiliary convolution groups in the second auxiliary down-sampling sub-network (three in fig. 8: auxiliary convolution group 2-1, auxiliary convolution group 2-2 and auxiliary convolution group 2-3). In addition, taking the auxiliary feature association sub-network as an example, fig. 9 shows its structure in detail.
Based on the sixth image segmentation model shown in fig. 8, in an alternative embodiment the number of main convolution groups in the main down-sampling sub-network and the number of auxiliary convolution groups in each auxiliary coding group are both N, where N is a natural number not less than 1. Hierarchical feature multiplexing is then applied as follows.
For ease of understanding, a natural number variable n denotes any one of the N main convolution groups (or auxiliary convolution groups), with 1 ≤ n ≤ N.
First, the step of performing a down-sampling operation on the 1st spliced feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampled feature map may be implemented as follows:
if n is equal to 1, splicing the output feature map of the 1st main convolution group in the main coding network with the enlarged first feature map to obtain a (1·1)-th sub-spliced feature map, performing a down-sampling operation on the (1·1)-th sub-spliced feature map through the 1st auxiliary convolution group in the 1st auxiliary coding group, and determining the output feature map of the 1st auxiliary convolution group in the 1st auxiliary coding group as the 1st auxiliary intermediate feature map of the 1st auxiliary coding group;
if n is greater than 1, splicing the output feature map of the (n-1)-th auxiliary convolution group in the 1st auxiliary coding group with the output feature map of the n-th main convolution group in the main coding network to obtain a (1·n)-th sub-spliced feature map, performing a down-sampling operation on the (1·n)-th sub-spliced feature map through the n-th auxiliary convolution group in the 1st auxiliary coding group, and determining the output feature map of the n-th auxiliary convolution group in the 1st auxiliary coding group as the n-th auxiliary intermediate feature map of the 1st auxiliary coding group, where n takes the values 2 to N in sequence;
and determining the N-th auxiliary intermediate feature map of the 1st auxiliary coding group as the 1st auxiliary down-sampled feature map.
In a specific implementation, referring to the sixth image segmentation model shown in fig. 8, if N is 3 then 1 ≤ n ≤ 3. The input of auxiliary convolution group 1-1 (the 1st auxiliary convolution group in the 1st auxiliary coding group) is the (1·1)-th sub-spliced feature map obtained by splicing the output feature map of the first main convolution group with the enlarged first feature map, and its output is the 1st auxiliary intermediate feature map of the 1st auxiliary coding group. The input of auxiliary convolution group 1-2 (the 2nd auxiliary convolution group in the 1st auxiliary coding group) is the (1·2)-th sub-spliced feature map obtained by splicing the output feature map of the second main convolution group with the output feature map of auxiliary convolution group 1-1 (the 1st auxiliary intermediate feature map of the 1st auxiliary coding group), and its output is the 2nd auxiliary intermediate feature map of the 1st auxiliary coding group. The input of auxiliary convolution group 1-3 (the 3rd auxiliary convolution group in the 1st auxiliary coding group) is the (1·3)-th sub-spliced feature map obtained by splicing the output feature map of the third main convolution group with the output feature map of auxiliary convolution group 1-2 (the 2nd auxiliary intermediate feature map of the 1st auxiliary coding group), and its output is the 3rd auxiliary intermediate feature map of the 1st auxiliary coding group.
(ii) the sub-downsampling subnetwork of the mth sub-coding group downsamples the mth splicing feature map to obtain the mth sub-downsampling feature map, with reference to the following embodiments:
if n is equal to 1, splicing the output characteristic graph of the 1 st auxiliary convolution group in the m-1 th auxiliary coding group with the amplified m-1 th auxiliary coding characteristic graph to obtain an m & lt 1 & gt sub-splicing characteristic graph, performing down-sampling operation on the m & lt 1 & gt sub-splicing characteristic graph through the 1 st auxiliary convolution group in the m & lt 1 & gt auxiliary coding group, and determining the output characteristic graph of the 1 st auxiliary convolution group in the m & lt 1 & gt auxiliary coding group as the 1 st auxiliary middle characteristic graph of the m & gt auxiliary coding group;
if n is larger than 1, splicing the output characteristic graph of the nth auxiliary convolution group in the m-1 th auxiliary coding group and the output characteristic graph of the n-1 th auxiliary convolution group in the m-1 th auxiliary coding group to obtain an m & n sub-splicing characteristic graph; performing downsampling operation on the m & n sub-splicing feature maps through the nth auxiliary convolution group in the mth auxiliary coding group, and determining the output feature map of the nth auxiliary convolution group in the mth auxiliary coding group as the nth auxiliary intermediate feature map of the mth auxiliary coding group;
and determining the Nth auxiliary intermediate output graph of the mth auxiliary encoding group as an mth auxiliary downsampling feature graph.
In the sixth image segmentation model shown in fig. 8, the input of the auxiliary convolution group 2-1 (i.e. the 1st auxiliary convolution group in the 2nd auxiliary coding group) is the 2·1 sub-splicing feature map obtained by splicing the output feature map of the auxiliary convolution group 1-1 (i.e. the 1st auxiliary convolution group in the 1st auxiliary coding group) with the amplified 1st auxiliary coding feature map, and the output of the auxiliary convolution group 2-1 is the 1st auxiliary intermediate feature map of the 2nd auxiliary coding group. The input of the auxiliary convolution group 2-2 (i.e. the 2nd auxiliary convolution group in the 2nd auxiliary coding group) is the 2·2 sub-splicing feature map obtained by splicing the output feature map of the auxiliary convolution group 1-2 (i.e. the 2nd auxiliary convolution group in the 1st auxiliary coding group) with the output feature map of the auxiliary convolution group 2-1 (i.e. the 1st auxiliary intermediate feature map of the 2nd auxiliary coding group), and the output of the auxiliary convolution group 2-2 is the 2nd auxiliary intermediate feature map of the 2nd auxiliary coding group. The input of the auxiliary convolution group 2-3 (i.e. the 3rd auxiliary convolution group in the 2nd auxiliary coding group) is the 2·3 sub-splicing feature map obtained by splicing the output feature map of the auxiliary convolution group 1-3 (i.e. the 3rd auxiliary convolution group in the 1st auxiliary coding group) with the output feature map of the auxiliary convolution group 2-2 (i.e. the 2nd auxiliary intermediate feature map of the 2nd auxiliary coding group), and the output of the auxiliary convolution group 2-3 is the 3rd auxiliary intermediate feature map of the 2nd auxiliary coding group.
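The wiring described above is a single recurrence applied at every stage: each auxiliary convolution group consumes the channel-wise concatenation of the same-stage output from the previous coding group and the previous-stage output of its own group. The following sketch illustrates one such sub-splicing step in PyTorch; the layer structure and channel arguments are illustrative assumptions, and both inputs are assumed to already share a spatial size, as the architecture arranges.

```python
# Minimal sketch of one auxiliary convolution group with sub-splicing.
# Channel counts, the conv/BN/ReLU structure and the stride are assumptions
# of this sketch, not the patent's exact configuration.
import torch
import torch.nn as nn

class AuxConvGroup(nn.Module):
    def __init__(self, same_stage_ch, prev_stage_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(same_stage_ch + prev_stage_ch, out_ch,
                      kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, same_stage_map, prev_stage_map):
        # m.n sub-splicing feature map: concatenate along the channel axis.
        # For n = 1, prev_stage_map is the amplified (m-1)th auxiliary
        # coding feature map instead of an intermediate map of group m.
        x = torch.cat([same_stage_map, prev_stage_map], dim=1)
        return self.block(x)  # nth auxiliary intermediate feature map
```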
For ease of understanding, the embodiment of the present invention describes the decoding network in detail. Referring to the schematic structural diagram of the seventh image segmentation model shown in fig. 10, the decoding network includes a fusion sub-network and a decoding sub-network; the fusion sub-network is used for amplifying the first feature map and the second feature map to a specified size and fusing the amplified first feature map and the amplified second feature map to obtain the first fused feature map; the decoding sub-network is used for decoding the first fused feature map to obtain the segmentation result of the target image.
As shown in fig. 10, the fusion sub-network includes a plurality of upsampling layers (a first, a second and a third upsampling layer) and a bitwise addition operation layer. The input of each upsampling layer is either the first feature map or the second feature map, and the inputs of different upsampling layers are different; here the second feature map comprises the output feature map of the first auxiliary feature association sub-network and the output feature map of the second auxiliary feature association sub-network. As shown in fig. 10, the input of the first upsampling layer is the output feature map of the second auxiliary feature association sub-network, the input of the second upsampling layer is the output feature map of the first auxiliary feature association sub-network, and the input of the third upsampling layer is the output feature map of the main feature association sub-network, that is, the first feature map. Each upsampling layer amplifies the feature map input to it to the specified size; in a specific implementation, the upsampling layer may perform the upsampling operation by deconvolution, bilinear interpolation or the like, thereby obtaining the amplified first feature map or the amplified second feature map. The bitwise addition operation layer then performs a bitwise (element-wise) addition on the amplified first feature map and the amplified second feature map to obtain the first fused feature map.
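As a rough sketch of this fusion step (assuming bilinear interpolation for the upsampling layers, though the text equally allows deconvolution, and assuming all inputs share one channel count):

```python
# Minimal sketch of the fusion sub-network: upsample each input to the
# specified size, then add bitwise (element-wise).
import torch
import torch.nn.functional as F

def fuse_feature_maps(first_map, second_maps, size):
    maps = [first_map] + list(second_maps)
    upsampled = [F.interpolate(m, size=size, mode='bilinear',
                               align_corners=False) for m in maps]
    # Bitwise addition of the amplified maps yields the first fused feature map.
    return torch.stack(upsampled, dim=0).sum(dim=0)
```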
To further increase the capacity of the decoding network and thereby improve the spatial expressiveness of the features behind the image segmentation result, an embodiment of the present invention further provides a specific implementation of fusing the first feature map and the second feature map through the decoding network in step S210 to obtain the first fused feature map, as follows:
inputting the output feature map of the first main convolution group and the output feature map of the first auxiliary convolution group in each auxiliary coding group into the decoding network; and fusing the output feature map of the first main convolution group, the output feature map of the first auxiliary convolution group in each auxiliary coding group, the first feature map and the second feature map through the decoding network to obtain the first fused feature map.
Referring to fig. 11, an embodiment of the present invention provides a schematic structural diagram of an eighth image segmentation model. As illustrated, the output feature map of the first main convolution group, the output feature map of the auxiliary convolution group 1-1 (i.e. the first auxiliary convolution group in the 1st auxiliary coding group), the output feature map of the auxiliary convolution group 2-1 (i.e. the first auxiliary convolution group in the 2nd auxiliary coding group), the output feature map of the main feature association sub-network (i.e. the first feature map), and the output feature maps of the first and second auxiliary feature association sub-networks (i.e. the second feature map) are all input into the decoding network for fusion to obtain the first fused feature map.
Specifically, referring to the schematic structural diagram of the ninth image segmentation model shown in fig. 12, the fusion sub-network in the decoding network includes 6 upsampling layers: a first upsampling layer corresponding to the output feature map of the second auxiliary feature association sub-network, a second upsampling layer corresponding to the output feature map of the first auxiliary feature association sub-network, a third upsampling layer corresponding to the output feature map of the main feature association sub-network, a fourth upsampling layer corresponding to the output feature map of the auxiliary convolution group 2-1, a fifth upsampling layer corresponding to the output feature map of the auxiliary convolution group 1-1, and a sixth upsampling layer corresponding to the output feature map of the first main convolution group. The bitwise addition operation layer in fig. 12 performs a bitwise addition on the amplified output feature map of the first main convolution group, the amplified output feature map of the first auxiliary convolution group in each auxiliary coding group, the amplified first feature map and the amplified second feature map, respectively output by the 6 upsampling layers of the fusion sub-network, to obtain the first fused feature map.
Compared with existing complex models, the image segmentation model corresponding to the image segmentation method provided by this embodiment is more compact, and the constructed coding and decoding networks are structurally simpler, so segmentation performance is not degraded when the received target image to be segmented is large. By adopting network-level and/or stage-level feature reuse during the encoding operation, the model's capacity is enhanced without introducing extra computation, while high speed is maintained. Moreover, the embodiment of the present invention splices the intermediate features generated during encoding and encodes the spliced features again, which more effectively improves the feature description performance, namely the classification expression capability and the spatial description capability of image segmentation. In addition, fusing the multiple amplified feature maps of the coding networks through the decoding network further strengthens the decoding capability and improves the performance of the image segmentation model. On this basis, both the speed and the accuracy of image segmentation are greatly improved.
Example three:
Based on the image segmentation method provided by the second embodiment, the embodiment of the present invention is described in detail below by taking the segmentation of a target image of size 1024 × 1024 as an example. Referring first to fig. 13, an embodiment of the present invention provides a schematic structural diagram of a tenth image segmentation model, which details, on the basis of fig. 12, the structure of each part of the image segmentation model and the dimensions of its output feature maps.
Specifically, the main coding network (i.e., the backbone box shown in fig. 13) includes conv1 (i.e., the aforementioned image scaling sub-network), enc2 (i.e., the aforementioned first main convolution group), enc3 (i.e., the aforementioned second main convolution group), enc4 (i.e., the aforementioned third main convolution group), and fc attention (i.e., the aforementioned main feature association sub-network). In practical applications, enc2, enc3 and enc4 can be implemented with Xception modules. An Xception module, also called a depthwise separable convolution module, factorizes a standard convolution into a depthwise convolution followed by a 1 × 1 pointwise convolution, which reduces the feature dimensionality and the amount of computation. In one implementation, the embodiment of the present invention provides two sets of convolution-kernel parameters (XceptionA and XceptionB, respectively) for conv1, enc2, enc3 and enc4, as shown in table 1 below:
TABLE 1
[Table 1 is provided as an image in the original document and is not reproduced here; it lists the XceptionA and XceptionB convolution-kernel parameters for conv1, enc2, enc3 and enc4.]
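As a hedged illustration of the depthwise separable convolutions used in enc2 to enc4, the sketch below factorizes a standard convolution into a depthwise 3 × 3 convolution and a pointwise 1 × 1 convolution; the channel counts and the BN/ReLU arrangement are assumptions of the sketch rather than the Table 1 parameters.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)  # per-channel filtering
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # channel mixing
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Quick shape check mirroring the enc2 step of the walk-through below:
x = torch.randn(1, 64, 512, 512)
print(SeparableConv2d(64, 48, stride=2)(x).shape)  # torch.Size([1, 48, 256, 256])
```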
In actual application, a target image of size 1024 × 1024 is first input into the main coding network and reduced by conv1 to a 512 × 512 × 64 feature map. The feature map output by conv1 is input into enc2 of the main coding network and, after downsampling and dimensionality reduction, a 256 × 256 × 48 feature map is obtained; the feature map output by enc2 is input into enc3, yielding a 128 × 128 × 96 feature map after downsampling and dimensionality reduction; and the feature map output by enc3 is input into enc4, yielding a 64 × 64 × 192 feature map after downsampling and dimensionality reduction. Finally, the feature map output by enc4 is input into the fc attention of the main coding network, and after the fully-connected operation and the fusion operation, the first feature map (64 × 64 × 192) of the main coding network is obtained.
Referring to fig. 14, an embodiment of the present invention further provides a schematic structural diagram of the main feature association sub-network. On the basis of the image segmentation model structure shown in fig. 13, the main feature association sub-network includes an fc layer (i.e., the aforementioned main fully-connected layer), a conv layer (i.e., the aforementioned main convolution layer), and a main point-multiplication operation layer (shown as the multiplication symbol in fig. 14). The dimension of the fc layer is 1000, and the parameters of the conv layer are 1 × 1 × 192. After the feature map output by enc4 in the main coding network undergoes the fully-connected operation through the fc layer and the conv layer of the fc attention, the main point-multiplication operation layer fuses the feature map output by enc4 with the result of the fully-connected operation to obtain the first feature map. Further, as shown in fig. 13, the first feature map is then amplified by the preset multiple (4 times).
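A minimal sketch of this fc attention sub-network follows. The fc dimension (1000), the 1 × 1 × 192 convolution and the point multiplication follow the text; the global average pooling used to feed the fc layer, and the bilinear mode of the subsequent 4× enlargement, are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCAttention(nn.Module):
    def __init__(self, channels=192, fc_dim=1000):
        super().__init__()
        self.fc = nn.Linear(channels, fc_dim)       # main fully-connected layer
        self.conv = nn.Conv2d(fc_dim, channels, 1)  # main conv layer, 1 x 1 x 192

    def forward(self, x):                            # x: (B, 192, 64, 64) from enc4
        v = F.adaptive_avg_pool2d(x, 1).flatten(1)   # (B, 192), assumed pooling
        v = self.fc(v).unsqueeze(-1).unsqueeze(-1)   # (B, 1000, 1, 1)
        att = self.conv(v)                           # (B, 192, 1, 1)
        return x * att                               # main point-multiplication layer

# First feature map, then enlarged by the preset multiple (4x):
# out = F.interpolate(FCAttention()(x), scale_factor=4, mode='bilinear',
#                     align_corners=False)
```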
Further, as shown in fig. 13, the auxiliary coding network includes two auxiliary coding groups; each auxiliary coding group includes an auxiliary down-sampling sub-network and an auxiliary feature association sub-network, and each auxiliary down-sampling sub-network includes three auxiliary convolution groups. In practical applications, the structural parameters of the auxiliary convolution groups and the auxiliary feature association sub-networks may be kept consistent with those of the main convolution groups and the main feature association sub-network in the main coding network; the specific application process can be implemented with reference to the image segmentation method of the second embodiment and is not repeated here.
Further, as shown in fig. 13, the decoding network (i.e., the decoder block in fig. 13) includes 6 upsampling layers (conv × 1(a), conv × 2, conv × 3, conv × 4, conv × 8, and conv × 16) for enlarging the different-sized feature maps input into the decoding network to a uniform size. Specifically, conv × 1(a) amplifies the feature map output by enc2 in the main coding network by 1 time; conv × 2 amplifies the feature map output by enc2 in the 1st auxiliary coding group by 2 times; conv × 3 amplifies the feature map output by enc2 in the 2nd auxiliary coding group by 3 times; conv × 4 amplifies the feature map output by fc attention in the main coding network by 4 times; conv × 8 amplifies the feature map output by fc attention in the 1st auxiliary coding group by 8 times; and conv × 16 amplifies the feature map output by fc attention in the 2nd auxiliary coding group by 16 times. In practical applications, the different-sized feature maps could be directly enlarged to a uniform size by the corresponding amplification factors, but the embodiment of the present invention preferably uses upsampling layers, which enhances the generalization capability of the decoding network and improves the segmentation performance of the image segmentation model.
In addition, the decoding network in fig. 13 further includes two bitwise addition operation layers: one fuses the maps output by conv × 1(a), conv × 2 and conv × 3 to obtain a first intermediate fusion feature map; the other fuses the maps output by conv × 4, conv × 8 and conv × 16 to obtain a second intermediate fusion feature map, and the first and second intermediate fusion feature maps are then fused to obtain the first fused feature map. With this structure, the embodiment of the present invention divides the 6 feature maps to be fused into two parts, fuses the two parts in parallel, and then fuses the two partial results, which effectively shortens the computation time of the decoding process and helps increase the image segmentation speed. Furthermore, the decoding network shown in fig. 13 is also provided with an intermediate processing convolution layer (i.e., conv × 1(b) in fig. 13) arranged between the two bitwise addition layers: the intermediate fusion feature map output by the first (left) bitwise addition layer in fig. 13 is processed by conv × 1(b) and then fused with the intermediate fusion feature map output by the second bitwise addition layer. This further enhances the generalization capability of the decoding network without extra computation, improving the segmentation performance of the image segmentation model and the quality of the segmentation result.
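A hedged sketch of this two-part decoder fusion is given below; plain bilinear upsampling stands in for the conv × k layers, and all inputs are assumed to have been projected to a shared channel count beforehand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderFusion(nn.Module):
    def __init__(self, ch=48):
        super().__init__()
        self.mid_conv = nn.Conv2d(ch, ch, 1)  # conv x 1(b) between the two additions

    def forward(self, shallow_maps, attention_maps, size):
        def up(m):
            return F.interpolate(m, size=size, mode='bilinear', align_corners=False)
        first = sum(up(m) for m in shallow_maps)     # first intermediate fusion map
        second = sum(up(m) for m in attention_maps)  # second intermediate fusion map
        # conv x 1(b) processes the first partial sum before the final addition.
        return self.mid_conv(first) + second         # first fused feature map
```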
To demonstrate the effect of the embodiments of the present invention, parameter selection and testing were carried out on a public database (Cityscapes). Through parameter tuning, the embodiment of the present invention adopts three levels of feature reuse (i.e., the main coding network, the first auxiliary coding group, and the second auxiliary coding group). Compared with the baseline method (DeepLab), the embodiment of the present invention achieves a 200× speed-up together with a 15% performance improvement; compared with the best real-time segmentation method (BiSeNet1), it achieves a 3.5× speed-up together with a 3% performance improvement. A detailed comparison with other segmentation models is given in table 2 below:
TABLE 2
[Table 2 is provided as an image in the original document and is not reproduced here; it compares the speed (frames per second) and accuracy (mean intersection-over-union) of DFANetA and DFANetB with other segmentation models.]
DFANetA and DFANetB in table 2 are the image segmentation models provided by the embodiment of the present invention; specifically, DFANetA is the model designed based on the XceptionA structural parameters provided in the embodiment of the present invention, and DFANetB is the model designed based on the XceptionB structural parameters provided in the embodiment of the present invention.
In addition, the embodiment of the present invention also provides two effect comparison diagrams. The first is shown in fig. 15, which plots, for each model in table 2, the number of frames processed per second against the accuracy (mean intersection-over-union, %). As can be seen from fig. 15, the image segmentation model provided by the embodiment of the present invention effectively maintains a balance between segmentation speed and segmentation accuracy, and can segment images both quickly and accurately.
Referring to fig. 16, another effect comparison diagram provided by the embodiment of the present invention shows the process of segmenting three different images with the image segmentation model (comprising the main coding network, the first auxiliary coding group, and the second auxiliary coding group) provided by the embodiment of the present invention. Each column of images in fig. 16 shows, in turn, the original target image, the amplified feature map output by the main coding network, the amplified feature map output by the first auxiliary coding group, the amplified feature map output by the second auxiliary coding group, and the manually annotated segmentation result. It can be seen that as the number of network stages increases, the classification expression capability and the feature-space description capability of the output segmentation result grow stronger, finally approaching the manually annotated result. Therefore, the image segmentation model provided by the embodiment of the present invention can improve the accuracy of image segmentation.
Example four:
As to the image segmentation method provided in the second embodiment, an embodiment of the present invention provides an image segmentation apparatus. Referring to the structural block diagram of the image segmentation apparatus shown in fig. 17, the apparatus includes the following modules:
a target image obtaining module 1702, configured to obtain a target image to be segmented;
a main encoding module 1704, configured to input the target image to a main encoding network, and perform an encoding operation on the target image through the main encoding network to obtain a first feature map;
a size enlarging module 1706, configured to enlarge the size of the first feature map by a preset multiple to obtain an enlarged first feature map;
an auxiliary encoding module 1708, configured to input the amplified first feature map to an auxiliary encoding network, and perform an encoding operation on the amplified first feature map through the auxiliary encoding network to obtain a second feature map;
the decoding module 1710 is configured to input the first feature map and the second feature map into a decoding network, fuse the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and perform a decoding operation on the first fused feature map to obtain a segmentation result of the target image.
According to the image segmentation apparatus provided by the embodiment of the present invention, the target image is encoded multiple times by multiple coding networks (the main coding network and the auxiliary coding network), which effectively extracts the macroscopic information of each object in the image and improves the classification expression capability; after the feature maps obtained by the multiple encodings (the first feature map and the second feature map) are fused, the fused feature map is decoded to obtain the segmentation result, which helps to effectively restore the detail features of each object in the image, so the accuracy of image segmentation can be effectively improved.
In one embodiment, the primary encoding network comprises an image scaling sub-network, a primary down-sampling sub-network and a primary feature association sub-network connected in sequence; the primary encoding module 1704 is further configured to scale the size of the target image to a designated size through the image scaling sub-network; performing down-sampling operation on the target image zoomed to a specified size through a main down-sampling sub-network to obtain a main down-sampling feature map; and performing full connection operation on the main down-sampling feature map through a main feature association sub-network, and fusing the main down-sampling feature map and the main down-sampling feature map subjected to the full connection operation to obtain a first feature map.
In one embodiment, an image scaling subnetwork comprises at least one convolutional layer; the primary down-sampling sub-network comprises one or more primary convolution groups; the method comprises the following steps that a plurality of main convolution groups are connected in sequence, each main convolution group is used for reducing a feature map input to the main convolution group to an appointed feature dimension, and the appointed feature dimensions corresponding to different main convolution groups are different; and each main convolution group comprises a plurality of convolution layers; the main characteristic association sub-network comprises a main full-connection layer and a main convolution layer which are sequentially connected, and further comprises a main point multiplication operation layer; the main fully-connected layer and the main convolution layer in the main feature association sub-network are used for fully connecting the main downsampling feature map, and the main point multiplication operation layer is used for fusing the main downsampling feature map and the main downsampling feature map subjected to the fully-connecting operation to obtain a first feature map.
In one embodiment, the auxiliary coding network includes M auxiliary coding groups, where M is a preset natural number not less than 1. The auxiliary encoding module 1708 is further configured to: if m is equal to 1, perform the encoding operation based on the amplified first feature map through the 1st auxiliary coding group to obtain the 1st auxiliary coding feature map, and amplify the 1st auxiliary coding feature map by the preset multiple to obtain the amplified 1st auxiliary coding feature map; if m is greater than 1, perform the encoding operation through the mth auxiliary coding group based on the amplified (m-1)th auxiliary coding feature map to obtain the mth auxiliary coding feature map, and amplify the mth auxiliary coding feature map by the preset multiple to obtain the amplified mth auxiliary coding feature map, where the value of m is taken from 2 to M-1 in sequence; if m is equal to M, perform the encoding operation through the Mth auxiliary coding group based on the amplified (M-1)th auxiliary coding feature map to obtain the Mth auxiliary coding feature map; and determine each mth auxiliary coding feature map as the second feature map, where the value of m is taken from 1 to M in sequence.
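The m-loop this module implements can be sketched as follows; `groups` is a hypothetical list of callables, one per auxiliary coding group, and bilinear interpolation stands in for the preset-multiple enlargement.

```python
import torch.nn.functional as F

def auxiliary_encode(enlarged_first_map, groups, scale=4):
    """Run the M auxiliary coding groups; all stage outputs form the second feature map."""
    coding_maps = []
    x = enlarged_first_map
    for m, group in enumerate(groups, start=1):
        x = group(x)                       # mth auxiliary coding feature map
        coding_maps.append(x)
        if m < len(groups):                # no enlargement after the Mth group
            x = F.interpolate(x, scale_factor=scale, mode='bilinear',
                              align_corners=False)
    return coding_maps
```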
In one embodiment, each auxiliary coding group comprises an auxiliary down-sampling sub-network and an auxiliary feature association sub-network connected in sequence. If m is equal to 1, the auxiliary encoding module 1708 is further configured to perform the downsampling operation based on the amplified first feature map through the auxiliary downsampling sub-network of the 1st auxiliary coding group, so as to obtain the 1st auxiliary downsampling feature map; perform the fully-connected operation on the 1st auxiliary downsampling feature map through the auxiliary feature association sub-network of the 1st auxiliary coding group, and fuse the 1st auxiliary downsampling feature map with the 1st auxiliary downsampling feature map subjected to the fully-connected operation to obtain the 1st auxiliary coding feature map. If m is greater than 1, the auxiliary encoding module 1708 is further configured to perform the downsampling operation based on the amplified (m-1)th auxiliary coding feature map through the auxiliary downsampling sub-network of the mth auxiliary coding group, so as to obtain the mth auxiliary downsampling feature map; and perform the fully-connected operation on the mth auxiliary downsampling feature map through the auxiliary feature association sub-network of the mth auxiliary coding group, and fuse the mth auxiliary downsampling feature map with the mth auxiliary downsampling feature map subjected to the fully-connected operation to obtain the mth auxiliary coding feature map.
In one embodiment, the main down-sampling sub-network of the main coding network also outputs a main intermediate feature map, and the auxiliary down-sampling sub-network of each auxiliary coding group also outputs an auxiliary intermediate feature map. If m is equal to 1, the auxiliary encoding module 1708 is further configured to splice the main intermediate feature map with the amplified first feature map to obtain the 1st spliced feature map, and perform the down-sampling operation on the 1st spliced feature map through the auxiliary down-sampling sub-network of the 1st auxiliary coding group to obtain the 1st auxiliary down-sampling feature map. If m is greater than 1, the auxiliary encoding module 1708 is further configured to splice the auxiliary intermediate feature map output by the (m-1)th auxiliary coding group with the amplified (m-1)th auxiliary coding feature map to obtain the mth spliced feature map, and perform the down-sampling operation on the mth spliced feature map through the auxiliary down-sampling sub-network of the mth auxiliary coding group to obtain the mth auxiliary down-sampling feature map; wherein the value of m is taken from 2 to M in sequence.
In one embodiment, the secondary downsampling subnetwork comprises one or more secondary convolution groups; the auxiliary convolution groups are connected in sequence, each auxiliary convolution group is used for reducing the characteristic diagram input to the auxiliary convolution group to an appointed characteristic dimension, and the appointed characteristic dimensions corresponding to different auxiliary convolution groups are different; and each auxiliary convolution group comprises a plurality of convolution layers; the auxiliary characteristic correlation sub-network comprises an auxiliary full connection layer and an auxiliary convolution layer which are sequentially connected, and also comprises an auxiliary point multiplication operation layer; and the auxiliary fully-connected layer and the auxiliary convolution layer in the auxiliary feature correlation sub-network are used for performing fully-connected operation on the auxiliary downsampling feature map, and the auxiliary point multiplication operation layer is used for fusing the auxiliary downsampling feature map and the auxiliary downsampling feature map subjected to fully-connected operation to obtain a second feature map.
In one embodiment, the number of main convolution groups included in the main downsampling sub-network and the number of auxiliary convolution groups included in each auxiliary coding group are both N, where N is a natural number not less than 1. The auxiliary encoding module 1708 is further configured to: if n is equal to 1, splice the output feature map of the 1st main convolution group in the main coding network with the amplified first feature map to obtain the 1st sub-spliced feature map, perform the downsampling operation on the 1st sub-spliced feature map through the 1st auxiliary convolution group in the 1st auxiliary coding group, and determine the output feature map of the 1st auxiliary convolution group in the 1st auxiliary coding group as the 1st auxiliary intermediate feature map of the 1st auxiliary coding group; if n is greater than 1, splice the output feature map of the (n-1)th auxiliary convolution group in the 1st auxiliary coding group with the output feature map of the nth main convolution group in the main coding network to obtain the 1·n sub-spliced feature map, perform the downsampling operation on the 1·n sub-spliced feature map through the nth auxiliary convolution group in the 1st auxiliary coding group, and determine the output feature map of the nth auxiliary convolution group in the 1st auxiliary coding group as the nth auxiliary intermediate feature map of the 1st auxiliary coding group, where the value of n is taken from 2 to N in sequence; and determine the Nth auxiliary intermediate feature map of the 1st auxiliary coding group as the 1st auxiliary downsampling feature map.
In an embodiment, the auxiliary encoding module 1708 is further configured to: if n is equal to 1, splice the output feature map of the 1st auxiliary convolution group in the (m-1)th auxiliary coding group with the amplified (m-1)th auxiliary coding feature map to obtain the m·1 sub-spliced feature map, perform the downsampling operation on the m·1 sub-spliced feature map through the 1st auxiliary convolution group in the mth auxiliary coding group, and determine the output feature map of the 1st auxiliary convolution group in the mth auxiliary coding group as the 1st auxiliary intermediate feature map of the mth auxiliary coding group; if n is greater than 1, splice the output feature map of the nth auxiliary convolution group in the (m-1)th auxiliary coding group with the output feature map of the (n-1)th auxiliary convolution group in the mth auxiliary coding group to obtain the m·n sub-spliced feature map, perform the downsampling operation on the m·n sub-spliced feature map through the nth auxiliary convolution group in the mth auxiliary coding group, and determine the output feature map of the nth auxiliary convolution group in the mth auxiliary coding group as the nth auxiliary intermediate feature map of the mth auxiliary coding group; and determine the Nth auxiliary intermediate feature map of the mth auxiliary coding group as the mth auxiliary downsampling feature map.
In one embodiment, the decoding network includes a convergence sub-network and a decoding sub-network; the fusion sub-network is used for amplifying the first characteristic diagram and the second characteristic diagram to a specified size and fusing the amplified first characteristic diagram and the amplified second characteristic diagram to obtain a first fusion characteristic diagram; the decoding sub-network is used for decoding the first fusion feature map to obtain a segmentation result of the target image.
In one embodiment, the fused sub-network comprises a plurality of upsampling layers and a bitwise addition operation layer; the input of the up-sampling layer is the first characteristic diagram or the second characteristic diagram; the inputs of different upsampling layers are different; each up-sampling layer is used for amplifying the characteristic diagram input to the up-sampling layer to a specified size to obtain an amplified first characteristic diagram or an amplified second characteristic diagram; and the bitwise addition operation layer is used for performing bitwise addition operation on the amplified first characteristic diagram and the amplified second characteristic diagram to obtain a first fused characteristic diagram.
In an embodiment, the decoding module 1710 is further configured to input the output feature map of the first primary convolution group and the output feature map of the first secondary convolution group in each secondary encoding group into a decoding network; and fusing the output characteristic diagram of the first main convolution group, the output characteristic diagram of the first auxiliary convolution group in each auxiliary encoding group, the first characteristic diagram and the second characteristic diagram through a decoding network to obtain a first fused characteristic diagram.
Example five:
Corresponding to the method and the apparatus provided by the foregoing embodiments, an embodiment of the present invention further provides an image segmentation system, which includes an image acquisition device, a processor and a storage device; the image acquisition device is used for acquiring a target image; the storage device stores a computer program which, when executed by the processor, performs the image segmentation method provided in the second embodiment.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiments, and is not described herein again.
Further, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the method provided in any one of the above embodiments two.
The computer program product of the image segmentation method, apparatus and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments, and for specific implementation reference may be made to the method embodiments, which are not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (15)

1. An image segmentation method, comprising:
acquiring a target image to be segmented;
inputting the target image into a main coding network, and coding the target image through the main coding network to obtain a first characteristic diagram;
amplifying the size of the first characteristic diagram by preset times to obtain an amplified first characteristic diagram;
inputting the amplified first characteristic diagram into an auxiliary coding network, and carrying out coding operation on the amplified first characteristic diagram through the auxiliary coding network to obtain a second characteristic diagram;
inputting the first feature map and the second feature map into a decoding network, fusing the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and performing decoding operation on the first fused feature map to obtain a segmentation result of the target image.
2. The method of claim 1, wherein the primary encoding network comprises a sequentially connected image scaling sub-network, primary down-sampling sub-network, and primary feature association sub-network;
the step of obtaining a first feature map by performing an encoding operation on the target image through the main encoding network includes:
scaling, by the image scaling sub-network, a size of the target image to a specified size;
performing down-sampling operation on the target image zoomed to a specified size through the main down-sampling sub-network to obtain a main down-sampling feature map;
and performing full connection operation on the main down-sampling feature map through the main feature association sub-network, and fusing the main down-sampling feature map and the main down-sampling feature map subjected to full connection operation to obtain a first feature map.
3. The method of claim 2, wherein the image scaling subnetwork comprises at least one convolutional layer;
the primary down-sampling sub-network comprises one or more primary convolution groups; the plurality of main convolution groups are connected in sequence, each main convolution group is used for reducing the characteristic diagram input to the main convolution group to a specified characteristic dimension, and the specified characteristic dimensions corresponding to different main convolution groups are different; and each of the primary convolution groups comprises a plurality of convolution layers;
the main characteristic association sub-network comprises a main full-connection layer, a main convolution layer and a main point multiplication operation layer which are sequentially connected; and the main point multiplication operation layer is used for fusing the main downsampling feature map and the main downsampling feature map subjected to full connection operation to obtain a first feature map.
4. The method of claim 2, wherein the secondary coding network comprises M secondary coding groups; wherein M is a preset natural number not less than 1;
the step of obtaining a second feature map by performing an encoding operation on the amplified first feature map through the auxiliary encoding network includes:
if m is equal to 1, performing coding operation on the 1 st auxiliary coding group based on the amplified first feature map to obtain a 1 st auxiliary coding feature map, and amplifying the 1 st auxiliary coding feature map by the preset multiple to obtain an amplified 1 st auxiliary coding feature map;
if m is larger than 1, performing coding operation on the mth auxiliary coding group based on the amplified (m-1)th auxiliary coding feature map to obtain an mth auxiliary coding feature map, and amplifying the mth auxiliary coding feature map by the preset times to obtain an amplified mth auxiliary coding feature map; wherein the value of m is taken from 2 to M-1 in sequence;
if M is equal to M, performing coding operation on the M-th auxiliary coding group based on the amplified M-1-th auxiliary coding feature map to obtain an M-th auxiliary coding feature map;
determining the m-th auxiliary coding feature maps as second feature maps; wherein the value of m is taken from 1 to M in order.
5. The method of claim 4, wherein each of the secondary encoding groups comprises a secondary down-sampling sub-network and a secondary feature association sub-network connected in series;
if m is equal to 1, the step of performing coding operation based on the amplified first feature map through the 1 st auxiliary coding group to obtain the 1 st auxiliary coding feature map includes:
performing downsampling operation based on the amplified first feature map through an auxiliary downsampling subnetwork of the 1 st auxiliary coding group to obtain a 1 st auxiliary downsampling feature map;
performing full connection operation on the 1 st auxiliary down-sampling feature map through an auxiliary feature association sub-network of the 1 st auxiliary coding group, and fusing the 1 st auxiliary down-sampling feature map with the 1 st auxiliary down-sampling feature map subjected to full connection operation to obtain a 1 st auxiliary coding feature map;
if m is greater than 1, the step of obtaining the mth auxiliary coding feature map by performing coding operation on the mth auxiliary coding group based on the amplified (m-1) th auxiliary coding feature map includes:
performing downsampling operation based on the amplified (m-1)th auxiliary coding feature map through an auxiliary downsampling subnetwork of the mth auxiliary coding group to obtain an mth auxiliary downsampling feature map;
and performing full connection operation on the mth auxiliary down-sampling feature map through an auxiliary feature association sub-network of the mth auxiliary coding group, and fusing the mth auxiliary down-sampling feature map with the mth auxiliary down-sampling feature map subjected to full connection operation to obtain the mth auxiliary coding feature map.
6. The method of claim 5, wherein the primary down-sampling sub-network of the primary coding network further outputs a primary intermediate feature map; the auxiliary down-sampling sub-network of the auxiliary coding group also outputs an auxiliary intermediate feature map;
if m is equal to 1, the step of performing downsampling operation based on the amplified first feature map through the auxiliary downsampling subnetwork of the 1 st auxiliary encoding group to obtain a 1 st auxiliary downsampling feature map includes:
splicing the main intermediate characteristic diagram and the amplified first characteristic diagram to obtain a 1 st spliced characteristic diagram;
performing downsampling operation on the 1 st splicing feature map through an auxiliary downsampling subnetwork of the 1 st auxiliary coding group to obtain a 1 st auxiliary downsampling feature map;
if m is larger than 1, the step of performing downsampling operation based on the amplified (m-1) th auxiliary coding feature map through the auxiliary downsampling subnetwork of the mth auxiliary coding group to obtain the mth auxiliary downsampling feature map comprises the following steps:
splicing the auxiliary intermediate characteristic graph output by the m-1 auxiliary coding group with the amplified m-1 auxiliary coding characteristic graph to obtain an m-th spliced characteristic graph;
performing downsampling operation on the mth splicing feature map through an auxiliary downsampling subnetwork of the mth auxiliary coding group to obtain an mth auxiliary downsampling feature map; wherein the value of m is taken from 2 to M in sequence.
7. The method of claim 6, wherein the secondary downsampling subnetwork comprises one or more secondary convolution groups; the auxiliary convolution groups are connected in sequence, each auxiliary convolution group is used for reducing the characteristic diagram input to the auxiliary convolution group to a specified characteristic dimension, and the specified characteristic dimensions corresponding to different auxiliary convolution groups are different; and each said auxiliary convolution group includes a plurality of convolution layers;
the auxiliary feature correlation sub-network comprises an auxiliary full-connection layer, an auxiliary convolution layer and an auxiliary point multiplication operation layer which are sequentially connected; and the auxiliary fully-connected layer and the auxiliary convolution layer in the auxiliary feature correlation sub-network are used for performing fully-connected operation on the auxiliary down-sampling feature map, and the auxiliary point multiplication operation layer is used for fusing the auxiliary down-sampling feature map and the auxiliary down-sampling feature map subjected to fully-connected operation to obtain a second feature map.
8. The method of claim 7, wherein the number of primary convolution groups included in the primary downsampling subnetwork and the number of secondary convolution groups included in each of the secondary encoding groups are both N; wherein N is a natural number not less than 1;
the step of performing downsampling operation on the 1 st splicing feature map through the auxiliary downsampling subnetwork of the 1 st auxiliary coding group to obtain the 1 st auxiliary downsampling feature map includes:
if n is equal to 1, splicing the output characteristic diagram of the 1 st main convolution group in the main coding network with the amplified first characteristic diagram to obtain a 1 st sub-splicing characteristic diagram, performing down-sampling operation on the 1 st sub-splicing characteristic diagram through the 1 st auxiliary convolution group in the 1 st auxiliary coding group, and determining the output characteristic diagram of the 1 st auxiliary convolution group in the 1 st auxiliary coding group as the 1 st auxiliary intermediate characteristic diagram of the 1 st auxiliary coding group;
if n is larger than 1, splicing the output characteristic graph of the (n-1)th auxiliary convolution group in the 1st auxiliary coding group with the output characteristic graph of the nth main convolution group in the main coding network to obtain a 1·n sub-splicing characteristic graph, performing downsampling operation on the 1·n sub-splicing characteristic graph through the nth auxiliary convolution group in the 1st auxiliary coding group, and determining the output characteristic graph of the nth auxiliary convolution group in the 1st auxiliary coding group as the nth auxiliary intermediate characteristic graph of the 1st auxiliary coding group; wherein the value of n is taken from 2 to N in sequence;
and determining the Nth auxiliary intermediate output graph of the 1 st auxiliary encoding group as a 1 st auxiliary downsampling feature graph.
9. The method according to claim 7, wherein the step of performing the downsampling operation on the mth splicing feature map through the auxiliary downsampling subnetwork of the mth auxiliary coding group to obtain the mth auxiliary downsampling feature map comprises:
if n is equal to 1, splicing the output characteristic graph of the 1st auxiliary convolution group in the (m-1)th auxiliary coding group with the amplified (m-1)th auxiliary coding characteristic graph to obtain an m·1 sub-splicing characteristic graph, performing downsampling operation on the m·1 sub-splicing characteristic graph through the 1st auxiliary convolution group in the mth auxiliary coding group, and determining the output characteristic graph of the 1st auxiliary convolution group in the mth auxiliary coding group as the 1st auxiliary intermediate characteristic graph of the mth auxiliary coding group;
if n is larger than 1, splicing the output characteristic graph of the nth auxiliary convolution group in the (m-1)th auxiliary coding group with the output characteristic graph of the (n-1)th auxiliary convolution group in the mth auxiliary coding group to obtain an m·n sub-splicing characteristic graph; performing downsampling operation on the m·n sub-splicing characteristic graph through the nth auxiliary convolution group in the mth auxiliary coding group, and determining the output characteristic graph of the nth auxiliary convolution group in the mth auxiliary coding group as the nth auxiliary intermediate characteristic graph of the mth auxiliary coding group;
and determining the Nth auxiliary intermediate output graph of the mth auxiliary encoding group as an mth auxiliary downsampling feature graph.
10. The method of claim 1, wherein the decoding network comprises a convergence sub-network and a decoding sub-network;
the fusion sub-network is used for amplifying the first feature map and the second feature map to a specified size, and fusing the amplified first feature map and the amplified second feature map to obtain a first fusion feature map;
the decoding sub-network is used for decoding the first fusion characteristic graph to obtain a segmentation result of the target image.
11. The method of claim 10, wherein the convergence subnetwork comprises a plurality of upsampling layers and bitwise addition layers;
the input of the up-sampling layer is the first characteristic diagram or the second characteristic diagram; the inputs of different upsampling layers are different;
each up-sampling layer is used for amplifying the characteristic diagram input to the up-sampling layer to a specified size to obtain an amplified first characteristic diagram or an amplified second characteristic diagram;
and the bitwise addition operation layer is used for performing bitwise addition operation on the amplified first characteristic diagram and the amplified second characteristic diagram to obtain the first fused characteristic diagram.
12. The method according to claim 8, wherein the step of merging the first feature map and the second feature map through the decoding network to obtain a first merged feature map comprises:
inputting the output characteristic graph of the first main convolution group and the output characteristic graph of the first auxiliary convolution group in each auxiliary encoding group into the decoding network;
and fusing the output characteristic diagram of the first main convolution group, the output characteristic diagram of the first auxiliary convolution group in each auxiliary encoding group, the first characteristic diagram and the second characteristic diagram through the decoding network to obtain a first fused characteristic diagram.
13. An image segmentation apparatus, comprising:
the target image acquisition module is used for acquiring a target image to be segmented;
the main coding module is used for inputting the target image into a main coding network and coding the target image through the main coding network to obtain a first characteristic diagram;
the size amplifying module is used for amplifying the size of the first characteristic diagram by preset times to obtain an amplified first characteristic diagram;
the auxiliary coding module is used for inputting the amplified first characteristic diagram into an auxiliary coding network, and coding the amplified first characteristic diagram through the auxiliary coding network to obtain a second characteristic diagram;
and the decoding module is used for inputting the first feature map and the second feature map into a decoding network, fusing the first feature map and the second feature map through the decoding network to obtain a first fused feature map, and decoding the first fused feature map to obtain a segmentation result of the target image.
14. An image segmentation system, characterized in that the system comprises: the device comprises an image acquisition device, a processor and a storage device;
the image acquisition device is used for acquiring a target image;
the storage device has stored thereon a computer program which, when executed by the processor, performs the method of any of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of the preceding claims 1 to 12.
CN201910084083.XA 2019-01-28 2019-01-28 Image segmentation method, device and system Active CN109816659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910084083.XA CN109816659B (en) 2019-01-28 2019-01-28 Image segmentation method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910084083.XA CN109816659B (en) 2019-01-28 2019-01-28 Image segmentation method, device and system

Publications (2)

Publication Number Publication Date
CN109816659A CN109816659A (en) 2019-05-28
CN109816659B true CN109816659B (en) 2021-03-23

Family

ID=66605557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910084083.XA Active CN109816659B (en) 2019-01-28 2019-01-28 Image segmentation method, device and system

Country Status (1)

Country Link
CN (1) CN109816659B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363776B (en) * 2019-06-28 2021-10-22 联想(北京)有限公司 Image processing method and electronic device
CN110378976B (en) * 2019-07-18 2020-11-13 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN112784897B (en) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN113554655B (en) * 2021-07-13 2021-12-31 中国科学院空间应用工程与技术中心 Optical remote sensing image segmentation method and device based on multi-feature enhancement
CN114429548A (en) * 2022-01-28 2022-05-03 北京百度网讯科技有限公司 Image processing method, neural network and training method, device and equipment thereof
CN117853325A (en) * 2022-09-29 2024-04-09 中国电信股份有限公司 Image downsampling method and device and computer readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19615493A1 (en) * 1996-04-19 1997-10-23 Philips Patentverwaltung Image segmentation method
US5956428A (en) * 1997-05-19 1999-09-21 Ausbeck, Jr.; Paul James Stroke code for image partitions
CN105163121B (en) * 2015-08-24 2018-04-17 西安电子科技大学 Big compression ratio satellite remote sensing images compression method based on depth autoencoder network
CN106651877B (en) * 2016-12-20 2020-06-02 北京旷视科技有限公司 Instance partitioning method and device
CN108876790A (en) * 2017-09-14 2018-11-23 北京旷视科技有限公司 Image, semantic dividing method and device, neural network training method and device
CN108875732B (en) * 2018-01-11 2022-07-12 北京旷视科技有限公司 Model training and instance segmentation method, device and system and storage medium
CN108062543A (en) * 2018-01-16 2018-05-22 中车工业研究院有限公司 A kind of face recognition method and device
CN108629743B (en) * 2018-04-04 2022-03-25 腾讯科技(深圳)有限公司 Image processing method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN109816659A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
CN109816659B (en) Image segmentation method, device and system
CN109255352B (en) Target detection method, device and system
CN111104962B (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN109829506B (en) Image processing method, image processing device, electronic equipment and computer storage medium
CN111050219B (en) Method and system for processing video content using a spatio-temporal memory network
CN108876792B (en) Semantic segmentation method, device and system and storage medium
CN112396115B (en) Attention mechanism-based target detection method and device and computer equipment
CN111476719B (en) Image processing method, device, computer equipment and storage medium
JP7425147B2 (en) Image processing method, text recognition method and device
CN110544214A (en) Image restoration method and device and electronic equipment
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN110796649A (en) Target detection method and device, electronic equipment and storage medium
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN115187820A (en) Light-weight target detection method, device, equipment and storage medium
CN111597845A (en) Two-dimensional code detection method, device and equipment and readable storage medium
CN113705575B (en) Image segmentation method, device, equipment and storage medium
CN114202648B (en) Text image correction method, training device, electronic equipment and medium
CN112001923A (en) Retina image segmentation method and device
CN113592720B (en) Image scaling processing method, device, equipment and storage medium
CN113971732A (en) Small target detection method and device, readable storage medium and electronic equipment
CN112633260B (en) Video motion classification method and device, readable storage medium and equipment
CN112651399B (en) Method for detecting same-line characters in inclined image and related equipment thereof
CN113643173A (en) Watermark removing method, watermark removing device, terminal equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant