CN114494442A - Image processing method, device and equipment
- Publication number: CN114494442A (application number CN202210339991.0A)
- Authority
- CN
- China
- Prior art keywords
- processing
- algorithm
- dimensional
- convolution
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The present application provides an image processing method, apparatus, and device in the field of image processing technology. The method comprises the following steps: acquiring a plurality of images to be recognized; performing two-dimensional convolution processing and three-dimensional convolution processing on the images to be recognized according to an encoding algorithm in a preset hybrid U-Net network algorithm, to obtain a preliminary feature map, where the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, both of which perform convolution processing on the image in two-dimensional and three-dimensional convolution layers through a residual learning network; and decoding the preliminary feature map according to the decoding algorithm, to obtain a target feature map. The method improves the segmentation precision of the network and solves the technical problem of low accuracy of the lesion position identified in medical images.
Description
Technical Field
The present application relates to image processing technologies, and in particular, to an image processing method, an image processing apparatus, and an image processing device.
Background
Currently, in order to identify the position of a lesion in a medical image, the medical image needs to be analyzed and segmented.
In the prior art, a lesion position in a medical image may be identified with a texture-feature-based medical image segmentation method: according to the texture distribution features of the target corresponding to an image or image region, a plurality of pixel points in that image or region are counted, and the pixel distribution state of the image is then calculated.
However, because texture features reflect only the local distribution characteristics of the target object, higher-level image information cannot be obtained from texture features alone. In particular, when an image contains multiple texture features, over-segmentation easily occurs, so the accuracy of the lesion position identified in the medical image is low.
Disclosure of Invention
The present application provides an image processing method, apparatus, and device to solve the technical problem that the accuracy of the lesion position identified in a medical image is low.
In a first aspect, the present application provides an image processing method, comprising:
acquiring a plurality of images to be recognized;
performing two-dimensional convolution processing and three-dimensional convolution processing on the images to be recognized according to an encoding algorithm in a preset hybrid U-Net network algorithm, to obtain a preliminary feature map; the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, both of which perform convolution processing on the image in two-dimensional and three-dimensional convolution layers through a residual learning network;
decoding the preliminary feature map according to the decoding algorithm, to obtain a target feature map; the target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position.
Further, performing two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm to obtain the preliminary feature map comprises:
performing two-dimensional convolution processing on the image to be recognized in the two-dimensional convolution layers according to the residual learning network of the encoding algorithm in the preset hybrid U-Net network algorithm, to obtain a two-dimensional feature map; the encoding algorithm comprises 2 two-dimensional convolution layers;
performing three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolution layers, to obtain the preliminary feature map; the encoding algorithm comprises 3 three-dimensional convolution layers.
Further, decoding the preliminary feature map according to the decoding algorithm to obtain the target feature map comprises:
performing three-dimensional convolution processing on the preliminary feature map in the three-dimensional convolution layers according to the residual learning network in the decoding algorithm, to obtain a three-dimensional feature map; the decoding algorithm comprises 2 three-dimensional convolution layers;
performing two-dimensional convolution processing on the three-dimensional feature map in the two-dimensional convolution layers, to obtain a convolution feature map; the decoding algorithm comprises 2 two-dimensional convolution layers;
performing convolution processing on the convolution feature map with a preset normalized exponential function, to obtain the target feature map; the target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position.
Further, after the two-dimensional and three-dimensional convolution processing of the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm to obtain the preliminary feature map, the method further comprises:
performing hole (dilated) convolution processing on the preliminary feature map according to a preset hole convolution algorithm, to obtain a first feature map; the receptive field of the first feature map is larger than that of the preliminary feature map, where the receptive field denotes the region of the image to be recognized onto which the pixel points of a local feature map within the first feature map are mapped.
Further, the method further comprises:
performing 1 × 1 convolution operations on the first feature map according to a preset spatial position attention algorithm, to obtain a plurality of second feature maps;
performing size reshaping, dimension transformation, and multiplication on the plurality of second feature maps, to obtain a first channel attention heat map;
and summing the first channel attention heat map and the first feature map, to obtain a third feature map.
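The three steps above match the shape of a standard position-attention module (as in dual-attention networks). The following NumPy sketch is hedged: representing each 1 × 1 convolution as a matrix multiply over the channel axis, and the exact tensor shapes and combination order, are assumptions not fixed by the text.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat, wq, wk, wv):
    """Position-attention sketch on a (C, H, W) first feature map.

    wq, wk, wv stand in for the 1 x 1 convolutions: on a flattened map,
    a 1 x 1 convolution is a matrix multiply over the channel axis.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)          # size reshaping
    q = (wq @ flat).T                      # dimension transformation: (HW, C')
    k = wk @ flat                          # (C', HW)
    attn = softmax(q @ k, axis=-1)         # (HW, HW) spatial affinity map
    v = wv @ flat                          # (C, HW)
    heat = (v @ attn.T).reshape(c, h, w)   # attention-weighted features
    return heat + feat                     # sum with the first feature map

feat = np.random.rand(4, 5, 5)             # hypothetical first feature map
wq, wk, wv = np.random.rand(2, 4), np.random.rand(2, 4), np.random.rand(4, 4)
third = position_attention(feat, wq, wk, wv)
print(third.shape)   # (4, 5, 5)
```

The residual sum at the end keeps the attention branch from destroying the original features, mirroring the summation step described above.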
Further, the method further comprises:
performing size reshaping, dimension transformation, and multiplication on the first feature map according to a preset channel attention algorithm, to obtain a second channel attention heat map;
and summing the second channel attention heat map and the first feature map, to obtain a fourth feature map.
Further, the method further comprises:
performing residual connection processing on the third feature map and the fourth feature map, followed by a preset linear rectification function, to obtain a fifth feature map; the fifth feature map represents the preliminary feature map after multiple convolutions.
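A minimal sketch of this fusion step follows; combining the two attention branches and the preliminary map by element-wise addition before rectification is an assumption, since the text fixes only "residual connection processing" followed by the linear rectification function.

```python
import numpy as np

def relu(x):
    """Linear rectification function."""
    return np.maximum(x, 0.0)

def fuse_dual_attention(third, fourth, preliminary):
    """Residual-connect the two attention branches, then rectify.

    Element-wise addition of the two branches plus the preliminary map
    is an assumed combination order, chosen here for illustration.
    """
    return relu(third + fourth + preliminary)

prelim = np.random.randn(4, 8, 8)
third = np.random.randn(4, 8, 8)    # spatial-attention branch output
fourth = np.random.randn(4, 8, 8)   # channel-attention branch output
fifth = fuse_dual_attention(third, fourth, prelim)
print(fifth.shape)   # (4, 8, 8)
```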
In a second aspect, the present application provides an image processing apparatus comprising:
a first acquisition unit, configured to acquire a plurality of images to be recognized;
a first convolution unit, configured to perform two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized according to an encoding algorithm in a preset hybrid U-Net network algorithm, to obtain a preliminary feature map; the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, both of which perform convolution processing on the image in two-dimensional and three-dimensional convolution layers through a residual learning network;
a decoding unit, configured to decode the preliminary feature map according to the decoding algorithm, to obtain a target feature map; the target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position.
Further, the first convolution unit includes:
a first convolution module, configured to perform two-dimensional convolution processing on the image to be recognized in the two-dimensional convolution layers according to the residual learning network of the encoding algorithm in the preset hybrid U-Net network algorithm, to obtain a two-dimensional feature map; the encoding algorithm comprises 2 two-dimensional convolution layers;
a second convolution module, configured to perform three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolution layers, to obtain the preliminary feature map; the encoding algorithm comprises 3 three-dimensional convolution layers.
Further, the decoding unit includes:
a third convolution module, configured to perform three-dimensional convolution processing on the preliminary feature map in the three-dimensional convolution layers according to the residual learning network in the decoding algorithm, to obtain a three-dimensional feature map; the decoding algorithm comprises 2 three-dimensional convolution layers;
a fourth convolution module, configured to perform two-dimensional convolution processing on the three-dimensional feature map in the two-dimensional convolution layers, to obtain a convolution feature map; the decoding algorithm comprises 2 two-dimensional convolution layers;
a fifth convolution module, configured to perform convolution processing on the convolution feature map with a preset normalized exponential function, to obtain the target feature map; the target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position.
Further, the apparatus further comprises:
a second convolution unit, configured to perform hole (dilated) convolution processing on the preliminary feature map according to a preset hole convolution algorithm, after the two-dimensional and three-dimensional convolution processing of the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm has produced the preliminary feature map, to obtain a first feature map; the receptive field of the first feature map is larger than that of the preliminary feature map, where the receptive field denotes the region of the image to be recognized onto which the pixel points of a local feature map within the first feature map are mapped.
Further, the apparatus further comprises:
a third convolution unit, configured to perform 1 × 1 convolution operations on the first feature map according to a preset spatial position attention algorithm, to obtain a plurality of second feature maps;
a first processing unit, configured to perform size reshaping, dimension transformation, and multiplication on the plurality of second feature maps, to obtain a first channel attention heat map;
and a second processing unit, configured to sum the first channel attention heat map and the first feature map, to obtain a third feature map.
Further, the apparatus further comprises:
a second acquisition unit, configured to perform size reshaping, dimension transformation, and multiplication on the first feature map according to a preset channel attention algorithm, to obtain a second channel attention heat map;
and a third processing unit, configured to sum the second channel attention heat map and the first feature map, to obtain a fourth feature map.
Further, the apparatus further comprises:
a residual unit, configured to perform residual connection processing on the third feature map and the fourth feature map and apply a preset linear rectification function, to obtain a fifth feature map; the fifth feature map represents the preliminary feature map after multiple convolutions.
In a third aspect, the present application provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor implements the method of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method of the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
The present application provides an image processing method, apparatus, and device. A plurality of images to be recognized are acquired;
two-dimensional and three-dimensional convolution processing is performed on the images to be recognized according to an encoding algorithm in a preset hybrid U-Net network algorithm, to obtain a preliminary feature map; the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, both of which perform convolution processing on the image in two-dimensional and three-dimensional convolution layers through a residual learning network. The preliminary feature map is then decoded according to the decoding algorithm, to obtain a target feature map; the target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position. In this scheme, according to the residual learning network of the encoding algorithm in the preset hybrid U-Net network algorithm, two-dimensional convolution processing in the two-dimensional convolution layers and three-dimensional convolution processing in the three-dimensional convolution layers are performed in sequence on the image to be recognized to obtain the preliminary feature map, which is finally decoded according to the residual learning network in the decoding algorithm to obtain the target feature map.
Therefore, when the image to be recognized is processed, both two-dimensional and three-dimensional convolution operations are carried out through the residual learning network, so the shallow and deep information of the image can be extracted, the problems of gradient vanishing and overfitting during training on the liver and its tumors are prevented, and better target-position information is obtained. The segmentation precision of the network is thereby improved, which solves the technical problem of low accuracy of the lesion position identified in medical images.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a hybrid U-Net network algorithm provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 5 is a scene schematic diagram of a hole convolution algorithm according to an embodiment of the present application;
FIG. 6 is a schematic view of a spatial attention algorithm provided in an embodiment of the present application;
fig. 7 is a schematic view of a scenario of a channel attention algorithm provided in an embodiment of the present application;
fig. 8 is a schematic view of a scenario of a residual double attention module according to an embodiment of the present disclosure;
fig. 9 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 13 is a block diagram of an electronic device according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure.
In one example, in order to identify the position of a lesion in a medical image, the medical image needs to be analyzed. In the prior art, a texture-feature-based medical image segmentation method may be used: according to the texture distribution features of the target corresponding to an image or image region, a plurality of pixel points in that image or region are counted, and the pixel distribution state of the image is then calculated. However, because texture features reflect only the local distribution characteristics of the target object, higher-level image information cannot be obtained from texture features alone; in particular, when an image contains multiple texture features, over-segmentation easily occurs, so the accuracy of the lesion position identified in the medical image is low.
The application provides an image processing method, an image processing device and image processing equipment, and aims to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application, and as shown in fig. 1, the method includes:
101. a plurality of images to be recognized are acquired.
For example, the execution subject of this embodiment may be an electronic device, a terminal device, an image processing apparatus or device, or any other apparatus or device capable of executing this embodiment; this is not limited here. In this embodiment, the execution subject is described as an electronic device.
First, the electronic device needs to acquire a plurality of images to be recognized. Exemplarily, fig. 2 is a schematic flowchart of another image processing method provided in an embodiment of the present application. As shown in fig. 2, the first step is window adjustment: in an original image, for example a liver CT image, all gray values below -100 are set to -100 (that is, the gray range below -100 is rendered black), and all gray values above 400 are set to 400 (the gray range above 400 is rendered white), which enhances the black-white contrast of the liver and its tumor. The second step is image enhancement, achieved through operations such as denoising, equalization, and normalization. The third step is image cropping: because the original image is 512 × 512, whose pixel size is too large for network training, the image is cropped to 256 × 256. In the fourth step, the processed images are sent into the trained hybrid U-Net network, and predicted images of the liver and liver tumors are obtained through multiple rounds of training. The electronic device therefore acquires the plurality of images to be recognized produced by cropping the original image in the first three steps.
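The windowing and cropping steps above can be sketched in a few lines of NumPy. The function name, the rescaling of the clipped range to [0, 1], and the centre-crop choice are illustrative assumptions; the text fixes only the -100/400 thresholds and the 256 × 256 output size.

```python
import numpy as np

def window_and_crop(ct_slice, lo=-100.0, hi=400.0, out_size=256):
    """Clip intensities to [lo, hi], rescale, and centre-crop the slice."""
    # Step 1: window adjustment -- everything below -100 becomes black,
    # everything above 400 becomes white once rescaled to [0, 1].
    clipped = np.clip(ct_slice.astype(np.float32), lo, hi)
    scaled = (clipped - lo) / (hi - lo)
    # Step 3: crop 512 x 512 down to 256 x 256 for easier network training.
    h, w = scaled.shape
    top, left = (h - out_size) // 2, (w - out_size) // 2
    return scaled[top:top + out_size, left:left + out_size]

ct = np.random.randint(-1000, 1500, size=(512, 512))   # stand-in CT slice
patch = window_and_crop(ct)
print(patch.shape)   # (256, 256)
```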
102. According to a coding algorithm in a preset mixed U-Net network algorithm, performing two-dimensional convolution processing and three-dimensional convolution processing on an image to be recognized to obtain a preliminary characteristic map; the hybrid U-Net network algorithm comprises an encoding algorithm and a decoding algorithm, and the encoding algorithm and the decoding algorithm are used for performing convolution processing on the images on the two-dimensional convolution layer and the three-dimensional convolution layer through the residual learning network.
Illustratively, the U-Net network is an algorithm that uses a fully convolutional network for semantic segmentation; the hybrid U-Net network algorithm improves on the U-Net network while keeping its classic structure of an encoding left half and a decoding right half. Fig. 3 is a schematic structural diagram of the hybrid U-Net network algorithm provided in an embodiment of the present application. As shown in fig. 3, the encoding algorithm of the encoding part comprises five convolution layers: layers L1 and L2 are two-dimensional convolution operations, layers L3, L4, and L5 are three-dimensional convolution operations, and a residual learning network is introduced into all five layers. In the decoding part, symmetric to the encoding part, the first two layers are three-dimensional convolution operations, the next two layers are two-dimensional convolution operations, and the last layer is a convolution plus normalized exponential function (Softmax) operation; a residual learning network is likewise introduced into these five layers. Because the in-plane resolution of a medical image is about 4 times the inter-slice resolution, the two two-dimensional convolution layers bring the in-plane and inter-slice resolutions to the same scale, after which three-dimensional convolution can be applied.
A plurality of consecutive adjacent two-dimensional images are input into the hybrid U-Net network algorithm. First, a two-dimensional convolutional neural network sequentially extracts the shallow information of the images in the two-dimensional convolution layers L1 and L2 and determines the position information of the liver and its tumors. The coarsely segmented images are then stacked in their original order and input into a three-dimensional convolutional neural network, which sequentially extracts the deep information of the images in the three-dimensional convolution layers L3, L4, and L5. In this way, the fully convolutional network (FCN) obtains better position information, and the segmentation precision of the network is improved.
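The encode-side data flow described above (slice-wise 2D filtering, re-stacking in the original order, then a depth-wise 3D operation) can be illustrated with a minimal NumPy sketch; the averaging kernels and tensor sizes here are placeholders, not the patent's trained filters.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded single-channel 2D convolution."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=np.float32)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

# Shallow 2D stage (layers L1, L2): filter each adjacent slice on its own.
slices = [np.random.rand(32, 32).astype(np.float32) for _ in range(8)]
k2d = np.ones((3, 3), dtype=np.float32) / 9.0
shallow = [conv2d_same(conv2d_same(s, k2d), k2d) for s in slices]

# Re-stack the coarsely segmented slices in their original order ...
volume = np.stack(shallow, axis=0)            # (depth, H, W) = (8, 32, 32)

# ... and apply a depth-wise operation: this 3-slice average along the
# depth axis stands in for the 3D convolution layers L3-L5.
deep = (volume[:-2] + volume[1:-1] + volume[2:]) / 3.0
print(volume.shape, deep.shape)
```

The point of the sketch is the shape flow: 2D operations see one slice at a time, while the stacked volume lets the 3D stage mix information across slices.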
103. Decoding the preliminary characteristic graph according to a decoding algorithm to obtain a target characteristic graph; the target characteristic graph represents a plurality of pixel points obtained through decoding processing, and the pixel points comprise pixel points occupied by target positions and pixel points occupied by backgrounds except the target positions.
Exemplarily, according to the residual learning network in the decoding algorithm, the electronic device sequentially performs three-dimensional convolution processing on the preliminary feature map in two three-dimensional convolution layers to obtain a three-dimensional feature map, then sequentially performs two-dimensional convolution processing on it in two two-dimensional convolution layers to obtain a convolution feature map, and finally performs convolution processing on the convolution feature map with a preset Softmax function to obtain the target feature map. The target feature map represents a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background outside the target position, so that the target position can be determined.
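The final Softmax stage assigns each pixel either to the target position or to the background. A minimal sketch follows, assuming three hypothetical classes (background, liver, tumour); the class count and logit shapes are illustrative only.

```python
import numpy as np

def softmax(logits, axis=0):
    """Normalized exponential function, shifted for numerical stability."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-pixel logits for 3 classes: background, liver, tumour.
logits = np.random.randn(3, 4, 4).astype(np.float32)
probs = softmax(logits, axis=0)      # class probabilities per pixel
labels = probs.argmax(axis=0)        # target-position vs. background pixels
print(labels.shape)                  # (4, 4)
```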
In the embodiment of the present application, a plurality of images to be recognized are acquired. Two-dimensional and three-dimensional convolution processing is performed on the images to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm, to obtain a preliminary feature map; the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, both of which perform convolution processing on the image in two-dimensional and three-dimensional convolution layers through a residual learning network. The preliminary feature map is decoded according to the decoding algorithm, to obtain a target feature map representing a plurality of decoded pixel points, including pixel points occupied by the target position and pixel points occupied by the background outside the target position.
Therefore, when the image to be recognized is processed, both two-dimensional and three-dimensional convolution operations are carried out through the residual learning network, so the shallow and deep information of the image can be extracted, the problems of gradient vanishing and overfitting during training on the liver and its tumors are prevented, and better target-position information is obtained. The segmentation precision of the network is thereby improved, which solves the technical problem of low accuracy of the lesion position identified in medical images.
Fig. 4 is a schematic flowchart of another image processing method according to an embodiment of the present application, and as shown in fig. 4, the method includes:
201. A plurality of images to be recognized are acquired.
For example, this step may refer to step 101 in fig. 1, and is not described again.
202. According to the residual learning network of the coding algorithm in the preset hybrid U-Net network algorithm, two-dimensional convolution processing is performed on the image to be recognized in the two-dimensional convolution layers to obtain a two-dimensional feature map; the coding algorithm comprises 2 two-dimensional convolution layers.
For example, the electronic device may sequentially perform two-dimensional convolution processing on each image to be recognized in the two-dimensional convolution layers L1 and L2 according to the residual learning network of the coding algorithm in the preset hybrid U-Net network algorithm to obtain two-dimensional feature maps. It may further determine the position information of the liver and its tumors from the two-dimensional feature maps, and finally stitch the coarsely segmented two-dimensional feature maps into a complete image according to the composition order of the original images.
203. Three-dimensional convolution processing is performed on the two-dimensional feature map in the three-dimensional convolution layers to obtain a preliminary feature map; the coding algorithm comprises 3 three-dimensional convolution layers.
For example, the electronic device may sequentially perform three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolution layers L3, L4, and L5, extracting the deep-level information in the two-dimensional feature map so that the fully convolutional neural network obtains better position information.
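Exemplarily, the 2D-then-3D encoding idea of steps 202-203 can be sketched in NumPy. This is a minimal toy sketch, not the patent's network: the kernels, slice count, and image sizes are invented for illustration, and the residual connections, learned weights, and layer names L1-L5 are omitted; it only shows slices being convolved in 2D and then stacked into a volume for 3D convolution.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation) via sliding windows."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel)

def conv3d(vol, kernel):
    """'Valid' 3-D convolution over a stacked volume of 2-D feature maps."""
    kd, kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(vol, (kd, kh, kw))
    return np.einsum('ijkabc,abc->ijk', windows, kernel)

# Hypothetical mini-encoder: two 2-D conv passes per slice (standing in for
# layers L1, L2), then the slices are stacked and passed through one 3-D conv
# (standing in for L3-L5).
rng = np.random.default_rng(0)
slices = [rng.random((16, 16)) for _ in range(5)]   # images to be recognized
k2 = np.ones((3, 3)) / 9.0                           # toy averaging kernel
feat2d = [conv2d(conv2d(s, k2), k2) for s in slices] # 2-D feature maps (12x12)
vol = np.stack(feat2d)                               # volume of shape (5, 12, 12)
k3 = np.ones((3, 3, 3)) / 27.0
feat3d = conv3d(vol, k3)                             # preliminary feature map (3, 10, 10)
```

The 2D stage captures per-slice (shallow) structure cheaply; stacking and convolving in 3D then captures inter-slice (deep) context, which is the motivation the text gives for the hybrid encoder.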
204. Hole (dilated) convolution processing is performed on the preliminary feature map according to a preset hole convolution algorithm to obtain a first feature map; the receptive field of the first feature map is larger than that of the preliminary feature map, and the receptive field represents the mapping area, on the image to be recognized, of the pixel points of a local feature map in the first feature map.
Exemplarily, as shown in fig. 5, fig. 5 is a scene schematic diagram of the hole convolution algorithm provided in an embodiment of the present application. As can be seen from fig. 5, the hole convolution algorithm combines five different dilation rates into a hole convolution branch that also includes the original feature map, where the original feature map is the image to be recognized before two-dimensional convolution. The electronic device may combine the five different dilation rates of the hole convolution algorithm into a hole convolution branch and perform hole convolution processing on the preliminary feature map to obtain the first feature map. Adding hole convolution enlarges the receptive field of image feature extraction without reducing the image resolution, compensates for the resolution loss of image information caused by the preceding deep-convolution downsampling, and increases the global information of the image. The dilation rate parameter enlarges the receptive field over the image, which benefits tumor detection and improves segmentation precision. The receptive field is the size of the mapping area, on the image to be recognized, of a pixel point of a local feature map after convolution with a specified convolution kernel. The formulas for the convolution kernel and the receptive field are as follows:
$r_1 = k_{size} + (k_{size} - 1)(d - 1)$

$RF_{i+1} = RF_i + (r_1 - 1) \times stride$

wherein $k_{size}$ is the size of the convolution kernel in the first two-dimensional convolution layer L1, $r_1$ is the equivalent kernel size (the receptive field of the hole convolution kernel), $d$ is the dilation rate, with $(d - 1)$ being the number of zeros inserted between kernel elements, $stride$ is the step size of the convolution operation, $RF_i$ is the receptive field of the previous layer, and $RF_{i+1}$ is the receptive field of the current layer.
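A short pure-Python sketch of this receptive-field bookkeeping, following $r_1 = k_{size} + (k_{size}-1)(d-1)$ and $RF_{i+1} = RF_i + (r_1-1) \times stride$. The five dilation rates used here are illustrative assumptions; the patent states that five rates are combined but does not list their values.

```python
def effective_kernel(ksize, dilation):
    # r1 = k + (k - 1)(d - 1): the dilated kernel's equivalent size
    return ksize + (ksize - 1) * (dilation - 1)

def next_rf(rf_prev, ksize, dilation, stride):
    # RF_{i+1} = RF_i + (r1 - 1) * stride
    return rf_prev + (effective_kernel(ksize, dilation) - 1) * stride

# Hypothetical stride-1 branch with 3x3 kernels and dilation rates 1,2,4,8,16
rf = 1
for d in (1, 2, 4, 8, 16):
    rf = next_rf(rf, ksize=3, dilation=d, stride=1)
print(rf)  # receptive field after the five dilated layers: 63
```

Stacking dilated layers thus grows the receptive field rapidly while each layer keeps the feature map at full resolution, which is the trade-off the text describes.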
Step 205 and step 206 are parallel branches performed after step 204, as follows:
205. 1 × 1 convolution operations are respectively performed on the first feature map according to a preset spatial position attention algorithm to obtain a plurality of second feature maps; size reshaping, dimension transformation, and multiplication are performed on the plurality of second feature maps to obtain a first channel attention heat map; and the first channel attention heat map and the first feature map are summed to obtain a third feature map.
Exemplarily, as shown in fig. 6, fig. 6 is a scene schematic diagram of the spatial position attention algorithm provided in an embodiment of the present application. As can be seen from fig. 6, the electronic device first reshapes the first feature map A to C × N, where N = D × H × W. Three 1 × 1 convolution operations are performed on the first feature map A to obtain the second feature maps B, C, and D respectively. The second feature map B is size-reshaped and dimension-transformed to obtain E, whose size changes from the original C × D × H × W to N × C, where N = D × H × W. E is multiplied by the size-reshaped second feature map C, and a spatial supervision map S of size N × N is obtained through a softmax function. S is then multiplied by the size-reshaped second feature map D to obtain the first channel attention heat map, which is multiplied by a scale coefficient α and size-reshaped to restore its original size; finally, it is added to the first feature map A to obtain the output of the spatial position attention algorithm, namely the third feature map G.
The elements of the S matrix are computed as follows:

$S_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N} \exp(B_i \cdot C_j)}$

wherein $B_i$ is an element of B, $C_j$ is an element of C, and $S_{ji}$ measures the influence of the pixel at the i-th position on the pixel at the j-th position in the feature map; the more similar the feature representations of two positions, the greater their correlation, and conversely, the smaller the correlation.
The formula of the spatial position attention algorithm is as follows:

$E_j = \alpha \sum_{i=1}^{N} (S_{ji} D_i) + A_j$

wherein $S_{ji}$ is an element of the S matrix, α is a scale factor that is initialized to 0 and gradually assigned more weight during training, $D_i$ is an element of D, and $A_j$ is an element of A. As can be seen from the output formula, $E_j$ is the sum of the features of all positions and the first feature map A. Therefore, the spatial position attention algorithm incorporates the context information of the image: by selectively aggregating the context of the spatial-position feature map, it highlights the features of key positions and improves the segmentation accuracy of the image. In addition, adding a 1 × 1 convolution after the branch hole convolutions introduces nonlinearity while keeping the feature scale unchanged, allowing the network to express more complex features.
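The spatial-position attention computation above can be sketched in NumPy on toy data. This is an illustrative sketch only: the three 1 × 1 convolutions producing B, C, and D are replaced here by identity stand-ins for A, the channel count and position count are invented, and α is fixed rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
Cch, N = 4, 6                            # toy channel count and N = D*H*W
A = rng.standard_normal((Cch, N))        # first feature map, already reshaped
B, Cm, D = A, A, A                       # stand-ins for the three 1x1 convs

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

M = Cm.T @ B                 # M[j, i] = C_j . B_i, an N x N affinity matrix
S = softmax(M, axis=1)       # S[j, i]: influence of position i on position j
alpha = 0.5                  # learned scale coefficient (initialized to 0)
E = alpha * (D @ S.T) + A    # E_j = alpha * sum_i S_ji D_i + A_j
```

Each output position is thus a context-weighted sum over all positions plus the original feature, which is why the text says the module aggregates global context.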
206. According to a preset channel attention algorithm, size reshaping, dimension transformation, and multiplication are performed on the first feature map to obtain a second channel attention heat map; and the second channel attention heat map and the first feature map are summed to obtain a fourth feature map.
Exemplarily, each convolution layer includes a plurality of filters, and the channel information in the local receptive-field region of an image can be learned through the convolution filters. As shown in fig. 7, fig. 7 is a scene schematic diagram of the channel attention algorithm provided in an embodiment of the present application. For the convolution of a two-dimensional image, a two-dimensional feature map is obtained along with its two-dimensional parameter information, which includes the length, width, and channel pixel points of the image; for a three-dimensional image, a preliminary feature map is obtained along with its three-dimensional parameter information, which includes the length, width, height, and channels of the image. Adding the channel attention algorithm integrates the correlated features among all channel maps. The integration process comprises: learning the weight of each channel, recombining features according to the proportion of each channel in the feature map, performing global downsampling, convolution operations, and activation-function (for example, softmax) processing, and encoding the channel features to obtain the second channel attention heat map. Element-wise multiplication of the second channel attention heat map with the two-dimensionally convolved feature map and the three-dimensionally convolved feature map then integrates the dependency relationships among all channels.
Unlike the spatial attention mechanism, the channel attention mechanism operates as follows: first, the first feature map A is reshaped to C × N, where N = D × H × W; the size-reshaped A, denoted B, has size N × C, and X of size C × C is obtained from B through a softmax function; B is then multiplied by X to obtain the second channel attention heat map, which is multiplied by a scale coefficient β and restored to the preset size through size reshaping; finally, it is added to the first feature map A to obtain the output of the channel attention algorithm, namely the fourth feature map E.
The elements of the X matrix are computed as follows:

$X_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)}$

wherein $A_i$ and $A_j$ are elements of A, and $X_{ji}$ measures the influence of the i-th channel on the j-th channel in the feature map; the more similar the feature representations of two channels, the greater their correlation, and conversely, the smaller the correlation.
The formula for the channel attention algorithm is as follows:

$E_j = \beta \sum_{i=1}^{C} (X_{ji} A_i) + A_j$

wherein β is a scale factor that is initialized to 0 and gradually assigned more weight during training, $A_i$ and $A_j$ are elements of A, and $X_{ji}$ is an element of the matrix X. As can be seen from the output formula of the channel attention algorithm, $E_j$ is the sum of the features of all channels and the original feature map A; this establishes the dependency relationships among channels and improves the discriminability of the features.
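The channel attention computation can likewise be sketched in NumPy. This toy sketch invents the shapes; β is set to 0 here to show the initialization behaviour the text describes, under which the branch starts out as an identity and only gradually contributes during training.

```python
import numpy as np

rng = np.random.default_rng(1)
Cch, N = 4, 6                       # toy channel count and N = D*H*W
A = rng.standard_normal((Cch, N))   # first feature map, reshaped to C x N

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

X = softmax(A @ A.T, axis=1)        # X[j, i]: influence of channel i on channel j
beta = 0.0                          # scale coefficient at initialization
E = beta * (X @ A) + A              # E_j = beta * sum_i X_ji A_i + A_j
```

With β = 0, the output E equals the input A exactly, so the module cannot hurt early training; as β grows, cross-channel context is mixed in.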
207. Residual connection processing is performed on the third feature map and the fourth feature map, followed by processing through a preset linear rectification function, to obtain a fifth feature map; the fifth feature map characterizes the preliminary feature map after multiple convolutions.
Exemplarily, as shown in fig. 8, fig. 8 is a scene schematic diagram of a residual dual-attention module according to an embodiment of the present application. As can be seen from fig. 8, the spatial position attention algorithm and the channel attention algorithm are connected in parallel: the spatial position attention algorithm outputs the third feature map U, and the channel attention algorithm outputs the fourth feature map V. The electronic device may perform residual connection processing on the third feature map U and the fourth feature map V, followed by a preset linear rectification function (ReLU), to obtain the fifth feature map, that is, the preliminary feature map after multiple convolutions.
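The fusion in step 207 reduces to a sum of the two branch outputs passed through ReLU. A minimal NumPy sketch, with U and V as random stand-ins for the actual branch outputs:

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal((4, 6))   # third feature map (spatial-position branch)
V = rng.standard_normal((4, 6))   # fourth feature map (channel branch)
F = np.maximum(U + V, 0.0)        # residual fusion + linear rectification (ReLU)
```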
208. Three-dimensional convolution processing is performed on the preliminary feature map in the three-dimensional convolution layers according to the residual learning network in the decoding algorithm to obtain a three-dimensional feature image; the decoding algorithm comprises 2 three-dimensional convolution layers.
Exemplarily, the electronic device may sequentially perform three-dimensional convolution processing on the preliminary feature map in the 2 three-dimensional convolution layers according to the residual learning network in the decoding algorithm to obtain the three-dimensional feature image.
209. Two-dimensional convolution processing is performed on the three-dimensional feature image in the two-dimensional convolution layers to obtain a convolution feature map; the decoding algorithm comprises 2 two-dimensional convolution layers.
For example, the electronic device may sequentially perform two-dimensional convolution processing on the three-dimensional feature image in the 2 two-dimensional convolution layers to obtain the convolution feature map.
210. The convolution feature map is processed with a preset normalized exponential function to obtain a target feature map; the target feature map characterizes a plurality of pixel points obtained through the decoding processing, the pixel points including pixel points occupied by the target position and pixel points occupied by the background other than the target position.
Exemplarily, the electronic device may process the convolution feature map with a preset softmax function to obtain the target feature map. The target feature map characterizes a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background other than the target position, and the target position may then be displayed according to these pixel points.
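The final decoding step can be sketched as a per-pixel softmax over class channels followed by an argmax that labels each pixel as target or background. The two-class logits below are random stand-ins for the decoder's actual convolution output:

```python
import numpy as np

rng = np.random.default_rng(2)
logits = rng.standard_normal((2, 4, 4))   # (classes, H, W): target vs. background

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits, axis=0)           # normalized exponential over classes
mask = probs.argmax(axis=0)               # 1 where the target position wins
```

The resulting mask is exactly the set of "pixel points occupied by the target position" that the text says can then be displayed.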
In the embodiment of the application, a plurality of images to be recognized are acquired. According to the residual learning network of the coding algorithm in the preset hybrid U-Net network algorithm, two-dimensional convolution processing is performed on the image to be recognized in the 2 two-dimensional convolution layers of the coding algorithm to obtain a two-dimensional feature map, and three-dimensional convolution processing is then performed in the 3 three-dimensional convolution layers of the coding algorithm to obtain a preliminary feature map. Hole convolution processing is performed on the preliminary feature map according to the preset hole convolution algorithm to obtain a first feature map; the receptive field of the first feature map is larger than that of the preliminary feature map, the receptive field representing the mapping area, on the image to be recognized, of the pixel points of a local feature map in the first feature map. According to the preset spatial position attention algorithm, 1 × 1 convolution operations are respectively performed on the first feature map to obtain a plurality of second feature maps; size reshaping, dimension transformation, and multiplication are performed on the second feature maps to obtain a first channel attention heat map, which is summed with the first feature map to obtain a third feature map.
According to the preset channel attention algorithm, size reshaping, dimension transformation, and multiplication are performed on the first feature map to obtain a second channel attention heat map, which is summed with the first feature map to obtain a fourth feature map. Residual connection processing is performed on the third feature map and the fourth feature map, followed by the preset linear rectification function, to obtain a fifth feature map. According to the residual learning network in the decoding algorithm, three-dimensional convolution processing is performed on the preliminary feature map in the 2 three-dimensional convolution layers of the decoding algorithm to obtain a three-dimensional feature image, and two-dimensional convolution processing is performed on the three-dimensional feature image in the 2 two-dimensional convolution layers to obtain a convolution feature map. Finally, the convolution feature map is processed with the preset normalized exponential function to obtain the target feature map; the target feature map characterizes a plurality of pixel points obtained through the decoding processing, including pixel points occupied by the target position and pixel points occupied by the background other than the target position.
Therefore, when the image to be recognized is processed, the two-dimensional and three-dimensional convolution operations are carried out through the residual learning network, so that both the shallow information and the deep information of the image can be extracted, the problems of gradient vanishing and overfitting during training on the liver and its tumors are prevented, and better information about the target position is obtained; this improves the segmentation precision of the network and solves the technical problem of low accuracy in identifying the lesion position in medical images. Moreover, a hole convolution module is added at the output of the two-dimensional and three-dimensional convolution operations, which reduces feature loss, enlarges the visual receptive field over the liver and its tumors, and enhances the ability of target detection to analyze the image background and target distribution, thereby improving the detection and classification precision for liver and tumor images with varying sizes, complex backgrounds, and hard-to-classify tumor boundaries. Meanwhile, by introducing the spatial position attention algorithm and the channel attention algorithm, the semantic information of the spatial-position dimension and the channel dimension is correlated, the dependency between channels and spatial positions is strengthened, and the feature expression capability of the network is enhanced, thereby achieving more accurate segmentation.
Exemplarily, fig. 9 is a schematic flowchart of another image processing method provided in an embodiment of the present application.
Fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and as shown in fig. 10, the apparatus includes:
a first acquiring unit 31 for acquiring a plurality of images to be recognized.
The first convolution unit 32 is configured to perform two-dimensional convolution processing and three-dimensional convolution processing on an image to be recognized according to a coding algorithm in a preset hybrid U-Net network algorithm to obtain a preliminary feature map; the hybrid U-Net network algorithm comprises an encoding algorithm and a decoding algorithm, and the encoding algorithm and the decoding algorithm are used for performing convolution processing on the images on the two-dimensional convolution layer and the three-dimensional convolution layer through the residual learning network.
The decoding unit 33 is configured to perform decoding processing on the preliminary feature map according to a decoding algorithm to obtain a target feature map; the target characteristic graph represents a plurality of pixel points obtained through decoding processing, and the pixel points comprise pixel points occupied by target positions and pixel points occupied by backgrounds except the target positions.
The apparatus of this embodiment may execute the technical solution in the method, and the specific implementation process and the technical principle are the same, which are not described herein again.
Fig. 11 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application, and based on the embodiment shown in fig. 10, as shown in fig. 11, the first convolution unit 32 includes:
the first convolution module 321 is configured to perform two-dimensional convolution processing on an image to be recognized in a two-dimensional convolution layer according to a residual learning network of a coding algorithm in a preset hybrid U-Net network algorithm to obtain a two-dimensional feature map; wherein, the two-dimensional convolution layer of the coding algorithm comprises 2 layers.
A second convolution module 322, configured to perform three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolution layer to obtain a preliminary feature map; wherein, the three-dimensional convolution layer of the coding algorithm comprises 3 layers.
In one example, the decoding unit 33 includes:
the third convolution module 331 is configured to perform three-dimensional convolution processing on the preliminary feature map at the three-dimensional convolution layer according to a residual learning network in the decoding algorithm to obtain a three-dimensional feature image; wherein, the three-dimensional convolution layer in the decoding algorithm comprises 2 layers.
A fourth convolution module 332, configured to perform two-dimensional convolution processing on the three-dimensional feature map in the two-dimensional convolution layer to obtain a convolution feature map; wherein, the two-dimensional convolution layer in the decoding algorithm comprises 2 layers.
A fifth convolution module 333, configured to perform convolution processing on the convolution feature map by using a preset normalized exponential function to obtain a target feature map; the target characteristic graph represents a plurality of pixel points obtained through decoding processing, and the pixel points comprise pixel points occupied by target positions and pixel points occupied by backgrounds except the target positions.
In one example, the apparatus further comprises:
the second convolution unit 41 is configured to perform two-dimensional convolution processing and three-dimensional convolution processing on an image to be recognized according to a coding algorithm in a preset hybrid U-Net network algorithm to obtain a preliminary feature map, and then perform hole convolution processing on the preliminary feature map according to a preset hole convolution algorithm to obtain a first feature map; the receptive field of the first characteristic diagram is larger than that of the preliminary characteristic diagram, and the receptive field represents a mapping area of pixel points of the local characteristic diagram in the first characteristic diagram on the image to be identified.
In one example, the apparatus further comprises:
and a third convolution unit 42, configured to perform 1 × 1 convolution operation processing on the first feature maps respectively according to a preset spatial position attention algorithm, so as to obtain a plurality of second feature maps.
The first processing unit 43 is configured to perform a resizing process, a dimension transformation process, and a multiplication process on the plurality of second feature maps to obtain a first channel attention heat map.
And the second processing unit 44 is configured to sum the first channel attention heat map and the first feature map to obtain a third feature map.
In one example, the apparatus further comprises:
and a second obtaining unit 45, configured to perform size reshaping processing, dimension transformation processing, and multiplication processing on the first feature map according to a preset channel attention algorithm, so as to obtain a second channel attention heat map.
And the third processing unit 46 is configured to sum the second channel attention heat map and the first feature map to obtain a fourth feature map.
In one example, the apparatus further comprises:
a residual unit 47, configured to perform residual connection processing on the third feature map and the fourth feature map, followed by processing through a preset linear rectification function, to obtain a fifth feature map; the fifth feature map characterizes the preliminary feature map after multiple convolutions.
The apparatus of this embodiment may execute the technical solution in the method, and the specific implementation process and the technical principle are the same, which are not described herein again.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 12, the electronic device includes: a memory 51, a processor 52;
the memory 51 stores a computer program that can be run on the processor 52.
The processor 52 is configured to perform the methods provided in the embodiments described above.
The electronic device further comprises a receiver 53 and a transmitter 54. The receiver 53 is used for receiving commands and data transmitted from an external device, and the transmitter 54 is used for transmitting commands and data to an external device.
Fig. 13 is a block diagram of an electronic device, which may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, etc., according to an embodiment of the present application.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the device 600, the relative positioning of the components, such as a display and keypad of the device 600, the sensor component 614 may also detect a change in position of the device 600 or a component of the device 600, the presence or absence of user contact with the device 600, orientation or acceleration/deceleration of the device 600, and a change in temperature of the device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Embodiments of the present application also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method provided by the above embodiments.
An embodiment of the present application further provides a computer program product, comprising a computer program stored in a readable storage medium; at least one processor of the electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the electronic device to perform the solution provided by any of the embodiments described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (17)
1. An image processing method, comprising:
acquiring a plurality of images to be identified;
according to an encoding algorithm in a preset hybrid U-Net network algorithm, performing two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized to obtain a preliminary feature map; wherein the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, and the encoding algorithm and the decoding algorithm each perform convolution processing on the image in two-dimensional convolutional layers and three-dimensional convolutional layers through a residual learning network; and
decoding the preliminary feature map according to the decoding algorithm to obtain a target feature map; wherein the target feature map represents a plurality of decoded pixel points, the pixel points comprising pixel points occupied by a target position and pixel points occupied by a background other than the target position.
2. The method according to claim 1, wherein the performing two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm to obtain a preliminary feature map comprises:
performing, in the two-dimensional convolutional layers, two-dimensional convolution processing on the image to be recognized according to the residual learning network of the encoding algorithm in the preset hybrid U-Net network algorithm, to obtain a two-dimensional feature map; wherein the encoding algorithm comprises 2 two-dimensional convolutional layers; and
performing three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolutional layers to obtain the preliminary feature map; wherein the encoding algorithm comprises 3 three-dimensional convolutional layers.
3. The method according to claim 1, wherein the decoding the preliminary feature map according to the decoding algorithm to obtain a target feature map comprises:
performing three-dimensional convolution processing on the preliminary feature map in the three-dimensional convolutional layers according to the residual learning network in the decoding algorithm to obtain a three-dimensional feature map; wherein the decoding algorithm comprises 2 three-dimensional convolutional layers;
performing two-dimensional convolution processing on the three-dimensional feature map in the two-dimensional convolutional layers to obtain a convolution feature map; wherein the decoding algorithm comprises 2 two-dimensional convolutional layers; and
performing convolution processing on the convolution feature map by using a preset normalized exponential (softmax) function to obtain the target feature map; wherein the target feature map represents a plurality of decoded pixel points, the pixel points comprising pixel points occupied by a target position and pixel points occupied by a background other than the target position.
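The hybrid encoder/decoder recited in claims 1 to 3 can be illustrated with a short sketch. This is an editorial illustration rather than the claimed implementation: the layer counts (2 two-dimensional and 3 three-dimensional convolutions in the encoder, 2 of each in the decoder) follow the claims, while the kernel sizes, channel widths, slice-stacked input layout, and residual wiring are assumptions.

```python
# Hedged sketch of the hybrid 2-D/3-D U-Net of claims 1-3 (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridUNet(nn.Module):
    def __init__(self, in_ch=1, mid_ch=8, n_classes=2):
        super().__init__()
        # Encoder: 2 two-dimensional layers, then 3 three-dimensional layers.
        self.enc2d = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
        ])
        self.enc3d = nn.ModuleList(
            [nn.Conv3d(mid_ch, mid_ch, 3, padding=1) for _ in range(3)])
        # Decoder: 2 three-dimensional layers, then 2 two-dimensional layers.
        self.dec3d = nn.ModuleList(
            [nn.Conv3d(mid_ch, mid_ch, 3, padding=1) for _ in range(2)])
        self.dec2d = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
            nn.Conv2d(mid_ch, n_classes, 3, padding=1),
        ])

    def forward(self, x):                      # x: (B, D, C, H, W) slice stack
        b, d, c, h, w = x.shape
        y = x.reshape(b * d, c, h, w)          # fold depth into the batch
        y = F.relu(self.enc2d[0](y))
        y = F.relu(y + self.enc2d[1](y))       # residual-style skip (assumed)
        y = y.reshape(b, d, -1, h, w).permute(0, 2, 1, 3, 4)  # (B, C, D, H, W)
        for conv in self.enc3d:
            y = F.relu(y + conv(y))            # preliminary feature map
        for conv in self.dec3d:
            y = F.relu(y + conv(y))
        y = y.permute(0, 2, 1, 3, 4).reshape(b * d, -1, h, w)
        y = F.relu(self.dec2d[0](y))
        y = self.dec2d[1](y)
        # Normalized exponential (softmax) over classes: per-pixel
        # probabilities for the target position versus the background.
        return F.softmax(y, dim=1).reshape(b, d, -1, h, w)
```

A (B, D, C, H, W) stack of slices yields a per-pixel class distribution over target and background, matching the target feature map described in claim 3.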
4. The method according to claim 1, wherein after the performing two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm to obtain a preliminary feature map, the method further comprises:
performing hole (dilated) convolution processing on the preliminary feature map according to a preset hole convolution algorithm to obtain a first feature map; wherein a receptive field of the first feature map is larger than that of the preliminary feature map, the receptive field representing a mapping area, on the image to be recognized, of pixel points of a local feature map in the first feature map.
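The hole (dilated) convolution of claim 4 can be demonstrated in a few lines; the channel count and dilation rate below are illustrative assumptions. A 3×3×3 kernel with dilation 2 samples a 5×5×5 neighbourhood, so the receptive field grows while the parameter count is unchanged:

```python
import torch
import torch.nn as nn

# A standard 3x3x3 convolution versus its dilated ("hole") counterpart.
# Padding is chosen so both layers preserve the spatial size of the
# feature map, as the claim's first feature map has the same resolution.
plain   = nn.Conv3d(8, 8, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv3d(8, 8, kernel_size=3, padding=2, dilation=2)

def n_params(m: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in m.parameters())
```

Both layers map a (B, 8, D, H, W) volume to the same shape; only the spacing of the kernel taps, and hence the receptive field, differs.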
5. The method according to claim 4, further comprising:
performing 1 × 1 convolution operations on the first feature map respectively according to a preset spatial position attention algorithm to obtain a plurality of second feature maps;
performing size reshaping processing, dimension transformation processing and multiplication processing on the plurality of second feature maps to obtain a first channel attention heat map; and
summing the first channel attention heat map and the first feature map to obtain a third feature map.
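The 1×1 convolutions followed by reshaping, transposition, multiplication, and a residual sum in claim 5 resemble a DANet-style position attention block. A hedged sketch under that reading (the query/key width reduction and channel sizes are assumptions, not part of the claim):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    """Sketch of claim 5: 1x1 convolutions produce several second feature
    maps, which are reshaped, transposed and multiplied into an attention
    heat map that is summed with the input (first) feature map."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 2, 1)   # 1x1 convs -> second feature maps
        self.k = nn.Conv2d(ch, ch // 2, 1)
        self.v = nn.Conv2d(ch, ch, 1)

    def forward(self, x):                    # x: (B, C, H, W) first feature map
        b, c, h, w = x.shape
        q = self.q(x).reshape(b, -1, h * w).transpose(1, 2)    # (B, HW, C/2)
        k = self.k(x).reshape(b, -1, h * w)                    # (B, C/2, HW)
        attn = F.softmax(q @ k, dim=-1)                        # (B, HW, HW)
        v = self.v(x).reshape(b, c, h * w)                     # (B, C, HW)
        heat = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)  # heat map
        return heat + x                      # sum -> third feature map
```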
6. The method according to claim 4, further comprising:
performing size reshaping processing, dimension transformation processing and multiplication processing on the first feature map according to a preset channel attention algorithm to obtain a second channel attention heat map; and
summing the second channel attention heat map and the first feature map to obtain a fourth feature map.
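Claim 6's channel attention uses only size reshaping, dimension transformation, and multiplication of the first feature map with itself, with no extra convolutions. A minimal sketch under that reading (the softmax placement is an assumption):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of claim 6: a DANet-style channel attention block built purely
    from reshape, transpose and matrix multiplication, summed with the input."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, H, W)
        b, c, h, w = x.shape
        f = x.reshape(b, c, h * w)                         # size reshaping
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)  # (B, C, C) affinities
        heat = (attn @ f).reshape(b, c, h, w)              # channel attention heat map
        return heat + x                                    # sum -> fourth feature map
```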
7. The method according to any one of claims 5 to 6, further comprising:
performing residual connection processing on the third feature map and the fourth feature map, and processing the result through a preset linear rectification (ReLU) function to obtain a fifth feature map; wherein the fifth feature map represents a preliminary feature map that has undergone multiple convolutions.
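Under a plain reading, the fusion in claim 7 reduces to an element-wise sum of the two attention outputs followed by a linear rectification (ReLU); as a sketch:

```python
import torch
import torch.nn.functional as F

def fuse(third: torch.Tensor, fourth: torch.Tensor) -> torch.Tensor:
    """Residual-style fusion of the spatial-attention (third) and
    channel-attention (fourth) feature maps, followed by ReLU, yielding
    the fifth feature map described in claim 7."""
    return F.relu(third + fourth)
```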
8. An image processing apparatus, comprising:
a first acquisition unit, configured to acquire a plurality of images to be recognized;
a first convolution unit, configured to perform two-dimensional convolution processing and three-dimensional convolution processing on the image to be recognized according to an encoding algorithm in a preset hybrid U-Net network algorithm to obtain a preliminary feature map; wherein the hybrid U-Net network algorithm comprises the encoding algorithm and a decoding algorithm, and the encoding algorithm and the decoding algorithm each perform convolution processing on the image in two-dimensional convolutional layers and three-dimensional convolutional layers through a residual learning network; and
a decoding unit, configured to decode the preliminary feature map according to the decoding algorithm to obtain a target feature map; wherein the target feature map represents a plurality of decoded pixel points, the pixel points comprising pixel points occupied by a target position and pixel points occupied by a background other than the target position.
9. The apparatus according to claim 8, wherein the first convolution unit comprises:
a first convolution module, configured to perform, in the two-dimensional convolutional layers, two-dimensional convolution processing on the image to be recognized according to the residual learning network of the encoding algorithm in the preset hybrid U-Net network algorithm, to obtain a two-dimensional feature map; wherein the encoding algorithm comprises 2 two-dimensional convolutional layers; and
a second convolution module, configured to perform three-dimensional convolution processing on the two-dimensional feature map in the three-dimensional convolutional layers to obtain the preliminary feature map; wherein the encoding algorithm comprises 3 three-dimensional convolutional layers.
10. The apparatus according to claim 8, wherein the decoding unit comprises:
a third convolution module, configured to perform three-dimensional convolution processing on the preliminary feature map in the three-dimensional convolutional layers according to the residual learning network in the decoding algorithm to obtain a three-dimensional feature map; wherein the decoding algorithm comprises 2 three-dimensional convolutional layers;
a fourth convolution module, configured to perform two-dimensional convolution processing on the three-dimensional feature map in the two-dimensional convolutional layers to obtain a convolution feature map; wherein the decoding algorithm comprises 2 two-dimensional convolutional layers; and
a fifth convolution module, configured to perform convolution processing on the convolution feature map by using a preset normalized exponential (softmax) function to obtain the target feature map; wherein the target feature map represents a plurality of decoded pixel points, the pixel points comprising pixel points occupied by a target position and pixel points occupied by a background other than the target position.
11. The apparatus according to claim 8, further comprising:
a second convolution unit, configured to, after the two-dimensional convolution processing and three-dimensional convolution processing are performed on the image to be recognized according to the encoding algorithm in the preset hybrid U-Net network algorithm to obtain the preliminary feature map, perform hole (dilated) convolution processing on the preliminary feature map according to a preset hole convolution algorithm to obtain a first feature map; wherein a receptive field of the first feature map is larger than that of the preliminary feature map, the receptive field representing a mapping area, on the image to be recognized, of pixel points of a local feature map in the first feature map.
12. The apparatus according to claim 11, further comprising:
a third convolution unit, configured to perform 1 × 1 convolution operations on the first feature map respectively according to a preset spatial position attention algorithm to obtain a plurality of second feature maps;
a first processing unit, configured to perform size reshaping processing, dimension transformation processing and multiplication processing on the plurality of second feature maps to obtain a first channel attention heat map; and
a second processing unit, configured to sum the first channel attention heat map and the first feature map to obtain a third feature map.
13. The apparatus according to claim 11, further comprising:
a second acquisition unit, configured to perform size reshaping processing, dimension transformation processing and multiplication processing on the first feature map according to a preset channel attention algorithm to obtain a second channel attention heat map; and
a third processing unit, configured to sum the second channel attention heat map and the first feature map to obtain a fourth feature map.
14. The apparatus according to any one of claims 12 to 13, further comprising:
a residual unit, configured to perform residual connection processing on the third feature map and the fourth feature map, and process the result through a preset linear rectification (ReLU) function to obtain a fifth feature map; wherein the fifth feature map represents a preliminary feature map that has undergone multiple convolutions.
15. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
16. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method according to any one of claims 1 to 7.
17. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210339991.0A CN114494442A (en) | 2022-04-02 | 2022-04-02 | Image processing method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114494442A true CN114494442A (en) | 2022-05-13 |
Family
ID=81489163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210339991.0A Pending CN114494442A (en) | 2022-04-02 | 2022-04-02 | Image processing method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114494442A (en) |
2022-04-02: Application CN202210339991.0A filed; published as CN114494442A; status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889852A (en) * | 2018-09-07 | 2020-03-17 | 天津大学 | Liver segmentation method based on residual error-attention deep neural network |
CN110335217A (en) * | 2019-07-10 | 2019-10-15 | 东北大学 | One kind being based on the decoded medical image denoising method of 3D residual coding |
CN111179237A (en) * | 2019-12-23 | 2020-05-19 | 北京理工大学 | Image segmentation method and device for liver and liver tumor |
CN112508973A (en) * | 2020-10-19 | 2021-03-16 | 杭州电子科技大学 | MRI image segmentation method based on deep learning |
Non-Patent Citations (2)
Title |
---|
WANG, HUITAO ET AL.: "Efficient Video Classification Method Based on Global Spatio-Temporal Receptive Field", Journal of Chinese Computer Systems * |
MA, JINLIN ET AL.: "Survey of Deep Learning Segmentation Methods for Liver Tumor CT Images", Journal of Image and Graphics * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115018862A (en) * | 2022-05-26 | 2022-09-06 | 杭州深睿博联科技有限公司 | Liver tumor segmentation method and device based on hybrid neural network |
CN115170510A (en) * | 2022-07-04 | 2022-10-11 | 北京医准智能科技有限公司 | Focus detection method and device, electronic equipment and readable storage medium |
CN116245951A (en) * | 2023-05-12 | 2023-06-09 | 南昌大学第二附属医院 | Brain tissue hemorrhage localization and classification and hemorrhage quantification method, device, medium and program |
CN116245951B (en) * | 2023-05-12 | 2023-08-29 | 南昌大学第二附属医院 | Brain tissue hemorrhage localization and classification and hemorrhage quantification method, device, medium and program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670397B (en) | Method and device for detecting key points of human skeleton, electronic equipment and storage medium | |
CN106651955B (en) | Method and device for positioning target object in picture | |
CN107798669B (en) | Image defogging method and device and computer readable storage medium | |
CN108629354B (en) | Target detection method and device | |
CN114494442A (en) | Image processing method, device and equipment | |
CN109889724B (en) | Image blurring method and device, electronic equipment and readable storage medium | |
CN106778773B (en) | Method and device for positioning target object in picture | |
CN106557759B (en) | Signpost information acquisition method and device | |
CN112767329A (en) | Image processing method and device and electronic equipment | |
CN111461182B (en) | Image processing method, image processing apparatus, and storage medium | |
CN107133354B (en) | Method and device for acquiring image description information | |
CN112508974B (en) | Training method and device for image segmentation model, electronic equipment and storage medium | |
KR102367648B1 (en) | Method and apparatus for synthesizing omni-directional parallax view, and storage medium | |
CN114140611A (en) | Salient object detection method and device, electronic equipment and storage medium | |
CN111833344A (en) | Medical image processing method and device, electronic equipment and storage medium | |
CN115223018B (en) | Camouflage object collaborative detection method and device, electronic equipment and storage medium | |
CN114120034A (en) | Image classification method and device, electronic equipment and storage medium | |
CN115083021A (en) | Object posture recognition method and device, electronic equipment and storage medium | |
CN117529753A (en) | Training method of image segmentation model, image segmentation method and device | |
CN114612790A (en) | Image processing method and device, electronic equipment and storage medium | |
CN114863392A (en) | Lane line detection method, lane line detection device, vehicle, and storage medium | |
CN109711386B (en) | Method and device for obtaining recognition model, electronic equipment and storage medium | |
CN113473012A (en) | Virtualization processing method and device and electronic equipment | |
CN112036487A (en) | Image processing method and device, electronic equipment and storage medium | |
CN112380388B (en) | Video ordering method and device under search scene, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20220513 |