CN112669323A - Image processing method and related equipment

Info

Publication number: CN112669323A
Application number: CN202011600406.5A
Authority: CN (China)
Prior art keywords: image, feature map, image blocks, sliding window, sub
Original language: Chinese (zh)
Inventor: 陈俊希
Assignees (original and current): Jiangsu Yuntian Lifei Technology Co., Ltd.; Shenzhen Intellifusion Technologies Co., Ltd.
Application filed by Jiangsu Yuntian Lifei Technology Co., Ltd. and Shenzhen Intellifusion Technologies Co., Ltd.
Priority: CN202011600406.5A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the application provide an image processing method and related equipment. The method includes: padding K/2 rings of zeros around a first image to obtain a second image, where the size of the first image is an integer multiple of the size of a sliding window and K is the sliding-window step size; cutting the second image according to the sliding window and the step size to obtain a plurality of image blocks; processing the image blocks with a first neural network model to obtain a processing result for each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks and each sub image block is the K × K region of its image block centered at the block's center; and obtaining a processing result of the first image from the processing results of all or some of the sub image blocks. The embodiments of the application can improve image processing accuracy.

Description

Image processing method and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an image processing method and related devices.
Background
In current crop-distribution statistics schemes, distribution data for crops in a region, such as the crop types and the distribution positions and areas of different crops, are measured manually, and a statistical chart is then generated from the manually measured data to obtain a crop distribution map. Such schemes are therefore inefficient and waste time and labor. Although a neural network can instead predict crop distribution from a remote-sensing image of the region to be counted, a large amount of zero-padding is performed during the network's computation to maintain resolution, so the network's predictions at image edges are inaccurate and the crop distribution it predicts is imprecise.
Disclosure of Invention
The embodiments of the application disclose an image processing method and related equipment that help improve image processing accuracy.
A first aspect of the embodiments of the present application discloses an image processing method, including: padding K/2 rings of zeros around a first image to obtain a second image, where the size of the first image is an integer multiple of the size of a sliding window and K is the sliding-window step size; cutting the second image according to the sliding window and the step size to obtain a plurality of image blocks; processing the plurality of image blocks with a first neural network model to obtain a processing result for each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks and each sub image block is the K × K region of its image block centered at the block's center; and obtaining the processing result of the first image from the processing results of all or some of the sub image blocks.
In the embodiments of the application, K/2 rings of zeros are padded around the first image to obtain the second image, the size of the first image being an integer multiple of the size of the sliding window; the second image is cut according to the sliding window and the step size K into a plurality of image blocks, each the same size as the sliding window; and the image blocks are processed with a first neural network model to obtain a processing result for each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks and each sub image block is the K × K region of its image block centered at the block's center, i.e., the central region of the corresponding image block. Because the first image can be assembled from all or some of the sub image blocks, the processing result of the first image can be assembled from the processing results of all or some of the sub image blocks. Since the input of the first neural network model is an image block and its output is the processing result of the corresponding sub image block, i.e., of the central region of that block, the accuracy problem caused by the network's inaccurate prediction at image edges is avoided, which helps improve image processing accuracy. It should be understood that the processing result of a sub image block may be the target-object distribution map predicted by the first neural network model for that sub image block, in which case the processing result of the first image is the target-object distribution map of the first image. Because the first image can be obtained by photographing the region in which the target objects are to be counted, the embodiments of the application can produce the region's target-object distribution map quickly, more efficiently than manual counting.
In one possible implementation, padding K/2 rings of zeros around the first image to obtain the second image includes: if the size of the first image is not an integer multiple of the size of the sliding window, padding the first image so that its size becomes an integer multiple of the size of the sliding window; and padding K/2 rings of zeros around the padded first image to obtain the second image.
In one possible implementation, processing the plurality of image blocks with the first neural network model to obtain the processing result of each sub image block includes: processing the plurality of image blocks with the first neural network model to obtain a processing result for each image block; and cropping the processing result of each image block to obtain the processing result of each sub image block.
In the embodiments of the application, the first neural network model outputs the processing result of an image block, and cropping that result yields the processing result of the corresponding sub image block, which avoids the loss of accuracy caused by the model's inaccurate prediction at the edges of the image block. It should be understood that when the processing result of a sub image block is its target-object distribution map and the processing result of the first image is its target-object distribution map, the processing result of an image block is likewise the target-object distribution map of that image block.
In one possible implementation, cropping the processing result of each image block to obtain the processing result of each sub image block includes performing the following for the processing result of each image block: cropping the K × K region centered at the center of the processing result of a target image block to obtain a target sub image block, where the target image block is any one of the plurality of image blocks and the target sub image block is the sub image block corresponding to the target image block.
In one possible implementation, processing the plurality of image blocks with the first neural network model to obtain the processing result of each image block includes performing the following for each image block: extracting features from a target image block with a deep convolutional neural network model to obtain a first feature map of the target image block, the target image block being any one of the plurality of image blocks; performing a convolution of the first feature map with a first convolution kernel of size 1 × 1 to obtain a first target feature map; performing a dilated convolution of the first feature map with a second convolution kernel of size n × n and dilation rate equal to a first preset value, n being an integer greater than 1, to obtain a second target feature map; performing a dilated convolution of the first feature map with a third convolution kernel of size n × n and dilation rate equal to a second preset value to obtain a third target feature map; performing a dilated convolution of the first feature map with a fourth convolution kernel of size n × n and dilation rate equal to a third preset value to obtain a fourth target feature map; pooling the first feature map to obtain a fifth target feature map, the first through fifth target feature maps being the same size; fusing the first through fifth target feature maps to obtain a second feature map; performing a convolution of the second feature map with a fifth convolution kernel of size 1 × 1 to obtain a third feature map; upsampling the third feature map to obtain a fourth feature map; performing a convolution of the first feature map with a sixth convolution kernel of size 1 × 1 to obtain a fifth feature map, the fourth and fifth feature maps being the same size; fusing the fourth and fifth feature maps to obtain a sixth feature map; performing a convolution of the sixth feature map with a seventh convolution kernel of size n × n to obtain a seventh feature map; and upsampling the seventh feature map to obtain the processing result of the target image block.
In the embodiments of the application, convolution layers with different dilation rates perform convolutions over the first feature map, so features are extracted under different receptive fields without increasing the amount of computation; pooling the first feature map mainly serves to denoise and reduce computation, and also helps prevent overfitting to some extent; and fusing the first through fifth target feature maps improves recognition accuracy.
In one possible implementation, the processing result of a sub image block is the target-object distribution map of that sub image block, and the processing result of the first image is the target-object distribution map of the first image; obtaining the processing result of the first image from the processing results of all or some of the sub image blocks includes: processing the target-object distribution map of each sub image block to obtain processed target-object distribution maps for all or some of the sub image blocks, the processing including hole filling and small-connected-domain removal, where a hole is a region containing no target object that is enclosed by the distribution region of a single target object in a sub image block, and a small connected domain is a scattered distribution region of a target object in a sub image block; and obtaining the target-object distribution map of the first image from the processed target-object distribution maps of all or some of the sub image blocks.
In the embodiments of the application, the target-object distribution maps of the sub image blocks are post-processed by hole filling and small-connected-domain removal, and the processed maps are then stitched into the target-object distribution map of the first image, which improves the accuracy of the predicted target-object distribution.
In one possible implementation, the first neural network model is obtained by: training a preset neural network on a first training set, whose sample image blocks carry labels, to obtain a second neural network model; predicting a second training set, whose sample image blocks carry no labels, with the second neural network model to obtain a third training set whose sample image blocks carry pseudo-labels; merging the first and third training sets into a fourth training set; and training the preset neural network on the fourth training set to obtain the first neural network model.
In the embodiments of the application, the preset neural network is first trained on labeled sample image blocks to obtain the second neural network model; the second model then predicts the unlabeled sample image blocks to generate pseudo-labels; and the pseudo-labeled sample image blocks are added to the training set, on which the preset network is retrained to obtain the first neural network model. This improves the prediction accuracy of the first neural network model and thus helps improve image processing accuracy.
In one possible implementation, the sample image blocks are obtained by the following strategy: cutting a sample image with a sliding window of size 1024 × 1024 and step size 900, skipping the currently framed image block when the background-class area occupies more than 7/8 of the window, and reducing the step size to 512 to increase the sampling rate when the background-class area occupies less than 1/3 of the window; or cutting the sample image with a sliding window of size 1024 × 1024 and step size 512, skipping the currently framed image block when the background-class area occupies more than 1/3 of the window.
In the embodiments of the application, when obtaining sample image blocks for training, a sample image may be cut with a 1024 × 1024 sliding window and a step size of 900 or 512. With step size 900, a background-class ratio above 7/8 indicates that the currently framed image block contains little trainable data, so it is not cut out; a ratio below 1/3 indicates that the block contains plenty of trainable data, so the step size can be reduced, for example to 512, to increase the sampling rate. With step size 512, a background-class ratio above 1/3 indicates little trainable data, so the block is not cut out. The resulting sample image blocks thus contain as much valid data as possible, enabling training of a first neural network model with high prediction accuracy.
In one possible implementation, the labels of the sample image blocks are obtained by: setting a transition band at the edges of a sample image block and at the boundaries between classes, a boundary between classes being the border between the distribution regions of different classes in the sample image block; and setting the label probability of non-transition-band regions in the sample image block to 1 and the label probability of transition-band regions to a value between 0 and 1.
In the embodiments of the application, a transition band is set at the edges of a sample image block and at the boundaries between classes, the label probability of non-transition-band regions is set to 1, and the label probability of transition-band regions is set between 0 and 1. Because a neural network model finds it hard to classify data at image edges and class boundaries correctly, setting the transition-band label probability between 0 and 1, rather than to 0 or 1, helps improve the model's classification accuracy in the transition band.
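The patent does not state how the transition-band probabilities are chosen, only that they lie between 0 and 1. A minimal sketch, assuming a linear distance-based ramp over a hypothetical band width and treating the block edge as a boundary:

```python
import numpy as np
from scipy import ndimage

def soften_labels(hard: np.ndarray, band: int = 8) -> np.ndarray:
    """Turn an (H, W) hard class map into per-class soft label maps whose
    probability is 1 away from boundaries and ramps toward 0 inside an
    assumed `band`-pixel transition band around class boundaries and the
    block edge."""
    num_classes = int(hard.max()) + 1
    soft = np.zeros((num_classes,) + hard.shape, dtype=np.float32)
    for c in range(num_classes):
        mask = hard == c
        # Pad with zeros so the block edge also counts as a boundary.
        dist = ndimage.distance_transform_edt(np.pad(mask, 1))[1:-1, 1:-1]
        # Distance-based ramp: 0..1 over `band` pixels, 1 deeper in-class.
        soft[c] = np.clip(dist / band, 0.0, 1.0)
    return soft
```

A model trained against such soft labels is penalized less for uncertain predictions near edges and class boundaries than it would be with hard 0/1 labels.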
A second aspect of the embodiments of the present application discloses an image processing apparatus, including: a padding unit configured to pad K/2 rings of zeros around a first image to obtain a second image, the size of the first image being an integer multiple of the size of a sliding window and K being the sliding-window step size; a cutting unit configured to cut the second image according to the sliding window and the step size to obtain a plurality of image blocks; and a processing unit configured to process the plurality of image blocks with a first neural network model to obtain a processing result for each of a plurality of sub image blocks, the image blocks corresponding one-to-one to the sub image blocks and each sub image block being the K × K region of its image block centered at the block's center, and to obtain the processing result of the first image from the processing results of the sub image blocks.
In one possible implementation, in padding K/2 rings of zeros around the first image to obtain the second image, the padding unit is specifically configured to: if the size of the first image is not an integer multiple of the size of the sliding window, pad the first image so that its size becomes an integer multiple of the size of the sliding window; and pad K/2 rings of zeros around the padded first image to obtain the second image.
In one possible implementation, in processing the plurality of image blocks with the first neural network model to obtain the processing result of each sub image block, the processing unit is specifically configured to: process the plurality of image blocks with the first neural network model to obtain a processing result for each image block; and crop the processing result of each image block to obtain the processing result of each sub image block.
In one possible implementation, in cropping the processing result of each image block to obtain the processing result of each sub image block, the processing unit is specifically configured to perform the following for the processing result of each image block: crop the K × K region centered at the center of the processing result of a target image block to obtain a target sub image block, the target image block being any one of the plurality of image blocks and the target sub image block being the sub image block corresponding to the target image block.
In one possible implementation, in processing the plurality of image blocks with the first neural network model to obtain the processing result of each image block, the processing unit is specifically configured to perform the following for each image block: extract features from a target image block with a deep convolutional neural network model to obtain a first feature map of the target image block, the target image block being any one of the plurality of image blocks; perform a convolution of the first feature map with a first convolution kernel of size 1 × 1 to obtain a first target feature map; perform a dilated convolution of the first feature map with a second convolution kernel of size n × n and dilation rate equal to a first preset value, n being an integer greater than 1, to obtain a second target feature map; perform a dilated convolution of the first feature map with a third convolution kernel of size n × n and dilation rate equal to a second preset value to obtain a third target feature map; perform a dilated convolution of the first feature map with a fourth convolution kernel of size n × n and dilation rate equal to a third preset value to obtain a fourth target feature map; pool the first feature map to obtain a fifth target feature map, the first through fifth target feature maps being the same size; fuse the first through fifth target feature maps to obtain a second feature map; perform a convolution of the second feature map with a fifth convolution kernel of size 1 × 1 to obtain a third feature map; upsample the third feature map to obtain a fourth feature map; perform a convolution of the first feature map with a sixth convolution kernel of size 1 × 1 to obtain a fifth feature map, the fourth and fifth feature maps being the same size; fuse the fourth and fifth feature maps to obtain a sixth feature map; perform a convolution of the sixth feature map with a seventh convolution kernel of size n × n to obtain a seventh feature map; and upsample the seventh feature map to obtain the processing result of the target image block.
In one possible implementation, the processing result of a sub image block is the target-object distribution map of that sub image block, and the processing result of the first image is the target-object distribution map of the first image; in obtaining the processing result of the first image from the processing results of all or some of the sub image blocks, the processing unit is specifically configured to: process the target-object distribution map of each sub image block to obtain processed target-object distribution maps for all or some of the sub image blocks, the processing including hole filling and small-connected-domain removal, where a hole is a region containing no target object that is enclosed by the distribution region of a single target object in a sub image block, and a small connected domain is a scattered distribution region of a target object in a sub image block; and obtain the target-object distribution map of the first image from the processed target-object distribution maps of all or some of the sub image blocks.
In one possible implementation, the first neural network model is obtained by: training a preset neural network on a first training set, whose sample image blocks carry labels, to obtain a second neural network model; predicting a second training set, whose sample image blocks carry no labels, with the second neural network model to obtain a third training set whose sample image blocks carry pseudo-labels; merging the first and third training sets into a fourth training set; and training the preset neural network on the fourth training set to obtain the first neural network model.
In one possible implementation, the sample image blocks are obtained by the following strategy: cutting a sample image with a sliding window of size 1024 × 1024 and step size 900, skipping the currently framed image block when the background-class area occupies more than 7/8 of the window, and reducing the step size to 512 to increase the sampling rate when the background-class area occupies less than 1/3 of the window; or cutting the sample image with a sliding window of size 1024 × 1024 and step size 512, skipping the currently framed image block when the background-class area occupies more than 1/3 of the window.
In one possible implementation, the labels of the sample image blocks are obtained by: setting a transition band at the edges of a sample image block and at the boundaries between classes, a boundary between classes being the border between the distribution regions of different classes in the sample image block; and setting the label probability of non-transition-band regions in the sample image block to 1 and the label probability of transition-band regions to a value between 0 and 1.
A third aspect of the embodiments of the present application discloses an electronic device, including a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of the method according to any one of the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application discloses a chip, including a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method according to any one of the first aspect of the embodiments of the present application.
A fifth aspect of the embodiments of the present application discloses a computer-readable storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute the method according to any one of the first aspect of the embodiments of the present application.
A sixth aspect of embodiments of the present application discloses a computer program product, which causes a computer to execute the method according to any one of the first aspect of the embodiments of the present application.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of an image processing process according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a data processing flow of a neural network model provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a checkerboard effect provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of post-processing of a target object distribution map according to an embodiment of the present application.
Fig. 6 is a schematic flowchart of a neural network model training process according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a remote sensing image and a tag provided in an embodiment of the present application.
Fig. 8 is a schematic diagram of a hard tag and a soft tag provided in an embodiment of the present application.
Fig. 9 is a schematic diagram of a transition zone in an image according to an embodiment of the present application.
Fig. 10 is a schematic diagram of a sample image visualization provided by an embodiment of the present application.
Fig. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the present application; the method can be applied to an electronic device and includes, but is not limited to, the following steps.
Step 101: pad K/2 rings of zeros around the first image to obtain a second image, where the size of the first image is an integer multiple of the size of the sliding window and K is the sliding-window step size.
Padding K/2 rings of zeros around the first image to obtain the second image includes: if the size of the first image is not an integer multiple of the size of the sliding window, padding the first image so that its size becomes an integer multiple of the size of the sliding window; and padding K/2 rings of zeros around the padded first image to obtain the second image. It should be understood that this padding is also zero padding, i.e., the padded pixel values are zero.
For example, referring to fig. 2, which is a schematic diagram of an image processing process according to an embodiment of the present application: if the size of the first image is 2304 × 2304 and the size of the sliding window is 1024 × 1024, the size of the first image is not an integer multiple of the size of the sliding window, so rows and columns are zero-filled to obtain a padded first image of size 2560 × 2560 (five times the sliding-window step size K = 512, so the subsequent window positions tile it exactly); K/2 rings of zeros are then padded around the padded first image: with K = 512, 256 rings of zeros are added, giving a second image of size 3072 × 3072.
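A minimal NumPy sketch of this padding step. The worked example pads 2304 to 2560, a multiple of the step size K = 512, so the sketch follows the example; placing the filler rows and columns at the bottom and right is an assumption, as the patent does not specify their position.

```python
import numpy as np

def pad_for_tiling(img: np.ndarray, stride: int = 512) -> np.ndarray:
    """Zero-fill (H, W[, C]) so H and W are integer multiples of the step
    size K, then surround the result with K/2 rings of zeros, matching the
    2304 -> 2560 -> 3072 example above."""
    k = stride
    h, w = img.shape[:2]
    new_h = -(-h // k) * k          # round up to a multiple of K
    new_w = -(-w // k) * k
    grown = np.zeros((new_h, new_w) + img.shape[2:], dtype=img.dtype)
    grown[:h, :w] = img             # assumed: original content in the top-left
    half = k // 2                   # K/2 rings of zeros on every side
    pad_spec = ((half, half), (half, half)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(grown, pad_spec)

first = np.zeros((2304, 2304, 3), dtype=np.uint8)
assert pad_for_tiling(first).shape[:2] == (3072, 3072)
```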
Step 102: cut the second image according to the sliding window and the sliding-window step size to obtain a plurality of image blocks.
It should be understood that when the image is cut with the sliding window, each resulting image block has the size of the sliding window: if the sliding window is M × M, each image block is also M × M, where M is a positive integer greater than or equal to K.
For example, as shown in fig. 2, the sliding window is 1024 × 1024, so each image block cut out by the sliding window is also 1024 × 1024.
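A sketch of the cutting step under the same assumptions (row-major scan order is assumed):

```python
def cut_blocks(img, window: int = 1024, stride: int = 512):
    """Slide an M x M window over the second image with step K and collect
    the framed M x M image blocks."""
    h, w = img.shape[:2]
    return [img[top:top + window, left:left + window]
            for top in range(0, h - window + 1, stride)
            for left in range(0, w - window + 1, stride)]

# On the 3072 x 3072 second image this yields 5 x 5 = 25 blocks of 1024 x 1024.
```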
Step 103: process the plurality of image blocks with a first neural network model to obtain a processing result for each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks and each sub image block is the K × K region of its image block centered at the block's center.
It is understood that a sub image block is the image of the central region of its image block; each image block has a corresponding sub image block, so the plurality of image blocks correspond to a plurality of sub image blocks; and if a sub image block is K × K, its processing result is also K × K.
To improve the efficiency of image cutting, or of processing the image blocks with the first neural network model, the sliding-window step size used during cutting can be increased, i.e., a larger central region (a larger sub image block) is retained from each block; fewer blocks then need to be processed, which improves processing efficiency.
In one possible implementation, processing the plurality of image blocks with the first neural network model to obtain the processing result of each sub image block includes: processing the plurality of image blocks with the first neural network model to obtain a processing result for each image block; and cropping the processing result of each image block to obtain the processing result of each sub image block.
For example, cropping the processing result of each image block to obtain the processing result of each sub image block includes performing the following for the processing result of each image block: cropping the K × K region centered at the center of the processing result of a target image block to obtain a target sub image block, where the target image block is any one of the plurality of image blocks and the target sub image block is the sub image block corresponding to the target image block.
For example, as shown in fig. 2, the image block is 1024 × 1024, the sliding-window step size is 512, and the sub image block is 512 × 512. Specifically, any image block is input into the first neural network model, which outputs a 1024 × 1024 processing result for that block; the result is then cropped to the sub image block corresponding to the block, so each resulting sub-image-block processing result is 512 × 512.
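A sketch of the cropping step for this example (the expansion prediction named in step 104):

```python
def center_crop(block_result, k: int = 512):
    """Crop the K x K region centered in an M x M block-level processing
    result, discarding the unreliably predicted border."""
    m = block_result.shape[0]
    off = (m - k) // 2          # 256 when M = 1024 and K = 512
    return block_result[off:off + k, off:off + k]

# Because adjacent blocks are cut with stride K, their cropped centers tile
# the padded first image seamlessly, avoiding stitching seams.
```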
In the embodiments of the application, the first neural network model outputs the processing result of an image block, and cropping that result yields the processing result of the corresponding sub image block, which avoids the loss of accuracy caused by the model's inaccurate prediction at the edges of the image block.
Referring to fig. 3, fig. 3 is a schematic diagram of the data processing flow of a neural network model according to an embodiment of the present application; the model may be the first neural network model, and the flow shown in fig. 3 can be used to process the plurality of image blocks to obtain the processing result of each image block.
In one possible implementation, processing the plurality of image blocks with the first neural network model to obtain the processing result of each image block includes performing the following for each image block: extracting features from a target image block with a deep convolutional neural network model to obtain a first feature map of the target image block, the target image block being any one of the plurality of image blocks; performing a convolution of the first feature map with a first convolution kernel of size 1 × 1 to obtain a first target feature map; performing a dilated convolution of the first feature map with a second convolution kernel of size n × n and dilation rate equal to a first preset value, n being an integer greater than 1, to obtain a second target feature map; performing a dilated convolution of the first feature map with a third convolution kernel of size n × n and dilation rate equal to a second preset value to obtain a third target feature map; performing a dilated convolution of the first feature map with a fourth convolution kernel of size n × n and dilation rate equal to a third preset value to obtain a fourth target feature map; pooling the first feature map to obtain a fifth target feature map, the first through fifth target feature maps being the same size; fusing the first through fifth target feature maps to obtain a second feature map; performing a convolution of the second feature map with a fifth convolution kernel of size 1 × 1 to obtain a third feature map; upsampling the third feature map to obtain a fourth feature map; performing a convolution of the first feature map with a sixth convolution kernel of size 1 × 1 to obtain a fifth feature map, the fourth and fifth feature maps being the same size; fusing the fourth and fifth feature maps to obtain a sixth feature map; performing a convolution of the sixth feature map with a seventh convolution kernel of size n × n to obtain a seventh feature map; and upsampling the seventh feature map to obtain the processing result of the target image block.
Specifically, the deep convolutional neural network (DCNN) in the model shown in fig. 3 may be a DeepLabV3+ network with an Xception-65, ResNet-101, or DenseNet-121 backbone. When processing an image block, several such models may be used together (i.e., voting): the processing results of the individual models are averaged, and the average is taken as the processing result of the image block; each model used for voting is trained on different data, which increases the diversity among the models. Here n may be 3, 5, 7, 9, and so on; the first dilation rate, i.e., the dilation rate of the second convolution kernel, is a first preset value, for example 6; the second dilation rate, i.e., that of the third convolution kernel, is a second preset value, for example 12; the third dilation rate, i.e., that of the fourth convolution kernel, is a third preset value, for example 18; and the upsampling stride is 4.
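A minimal PyTorch sketch of the branch structure just described, using n = 3 and the example dilation rates 6/12/18. The channel widths and the use of global average pooling for the pooling branch are assumptions, and the decoder stage (the sixth and seventh kernels with their upsampling steps) is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPHead(nn.Module):
    """One 1x1 convolution, three 3x3 dilated convolutions (rates 6/12/18),
    and a pooling branch over the first feature map, fused and projected by
    a 1x1 convolution."""
    def __init__(self, in_ch: int = 2048, mid_ch: int = 256):
        super().__init__()
        self.b0 = nn.Conv2d(in_ch, mid_ch, 1)                            # first kernel, 1x1
        self.b1 = nn.Conv2d(in_ch, mid_ch, 3, padding=6, dilation=6)     # second kernel, rate 6
        self.b2 = nn.Conv2d(in_ch, mid_ch, 3, padding=12, dilation=12)   # third kernel, rate 12
        self.b3 = nn.Conv2d(in_ch, mid_ch, 3, padding=18, dilation=18)   # fourth kernel, rate 18
        self.pool = nn.AdaptiveAvgPool2d(1)                              # pooling branch (assumed global)
        self.pool_proj = nn.Conv2d(in_ch, mid_ch, 1)
        self.fuse = nn.Conv2d(5 * mid_ch, mid_ch, 1)                     # fifth kernel, 1x1

    def forward(self, x):                                                # x: first feature map
        size = x.shape[-2:]
        p = F.interpolate(self.pool_proj(self.pool(x)), size=size,
                          mode='bilinear', align_corners=False)          # fifth target feature map
        feats = torch.cat([self.b0(x), self.b1(x), self.b2(x),
                           self.b3(x), p], dim=1)                        # fused second feature map
        return self.fuse(feats)                                          # third feature map
```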
In the embodiments of the application, convolution layers with different dilation rates perform convolutions over the first feature map, so features are extracted under different receptive fields without increasing the amount of computation; pooling the first feature map mainly serves to denoise and reduce computation, and also helps prevent overfitting to some extent; and fusing the first through fifth target feature maps improves recognition accuracy.
When obtaining the processing result of an image block, the block may first be flipped horizontally, flipped vertically, and flipped both horizontally and vertically, with each version input to the first neural network model; this yields processing results for the same image block under different input modes. The results are then averaged (after undoing each flip so the maps align), and the average is taken as the final processing result, which improves image processing accuracy.
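A sketch of this flip-averaging, assuming the model maps an (N, C, H, W) tensor to a same-sized map:

```python
import torch

def predict_with_flips(model, block):
    """Average the model's outputs over the original block and its
    horizontally, vertically, and doubly flipped copies."""
    views = [block,
             torch.flip(block, dims=[-1]),       # horizontal flip
             torch.flip(block, dims=[-2]),       # vertical flip
             torch.flip(block, dims=[-2, -1])]   # both
    outs = [model(v) for v in views]
    # Undo each flip before averaging so the maps are spatially aligned.
    outs[1] = torch.flip(outs[1], dims=[-1])
    outs[2] = torch.flip(outs[2], dims=[-2])
    outs[3] = torch.flip(outs[3], dims=[-2, -1])
    return sum(outs) / len(outs)
```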
Step 104: obtain the processing result of the first image from the processing results of all or some of the sub image blocks.
When predicting with a neural network model, only the central region of each prediction is kept and the inaccurately predicted edges are discarded; this is called expansion prediction. During convolution, a large amount of zero padding is performed to maintain resolution, which makes the model's edge predictions inaccurate; if non-overlapping sliding-window predictions were stitched together directly, the stitching seams would be obvious, producing the checkerboard effect shown in fig. 4. Expansion prediction eliminates this checkerboard effect.
In one possible implementation, the processing result of a sub image block is the target-object distribution map of that sub image block, and the processing result of the first image is the target-object distribution map of the first image; obtaining the processing result of the first image from the processing results of all or some of the sub image blocks includes: processing the target-object distribution map of each sub image block to obtain processed target-object distribution maps for all or some of the sub image blocks, the processing including hole filling and small-connected-domain removal, where a hole is a region containing no target object that is enclosed by the distribution region of a single target object in a sub image block, and a small connected domain is a scattered distribution region of a target object in a sub image block; and obtaining the target-object distribution map of the first image from the processed target-object distribution maps of all or some of the sub image blocks.
It should be understood that the embodiments of the present application can be applied to predicting the target-object distribution map of an image; in that case, the processing result of a sub image block is its target-object distribution map, the processing result of the first image is its target-object distribution map, and the processing result of an image block is its target-object distribution map. For example, when predicting the crop distribution of a region, the target objects may include various crops.
Specifically, as shown in fig. 5, the target-object distribution map of each sub image block is post-processed, the post-processing mainly including hole filling and small-connected-domain removal. It should be understood that, in the embodiments of the application, the target-object distribution map of an image block may first be post-processed and the central region of the processed map then cropped to obtain the processed target-object distribution map of the sub image block; or the central region of the image block's map may first be cropped to obtain the sub image block's map, which is then post-processed to obtain the processed map; the present application does not specifically limit this order.
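A sketch of the two post-processing operations on a per-class binary mask, using SciPy; the minimum-area threshold is an assumption, as the patent does not give one:

```python
import numpy as np
from scipy import ndimage

def postprocess_mask(mask: np.ndarray, min_area: int = 256) -> np.ndarray:
    """Fill holes and drop small connected domains in a binary mask for
    one target-object class."""
    filled = ndimage.binary_fill_holes(mask)   # fill holes enclosed by the class
    labeled, n = ndimage.label(filled)         # label connected components
    keep = np.zeros_like(filled)
    for i in range(1, n + 1):
        component = labeled == i
        if component.sum() >= min_area:        # remove scattered small regions
            keep |= component
    return keep
```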
In the embodiments of the application, the target-object distribution maps of the sub image blocks are post-processed by hole filling and small-connected-domain removal, and the processed maps are then stitched into the target-object distribution map of the first image, which improves the accuracy of the predicted target-object distribution.
The embodiments of the present application can be applied to counting the crop distribution of a region: for example, a remote-sensing image of the region is obtained by aerial photography and used as the first image, and the distribution map of the region's crops is then computed by the image processing method provided in the present application.
In the image processing method described in fig. 1, K/2 rings of zeros are padded around the first image to obtain the second image, the size of the first image being an integer multiple of the size of the sliding window; the second image is cut according to the sliding window and the step size K into a plurality of image blocks, each the same size as the sliding window; and the image blocks are processed with the first neural network model to obtain a processing result for each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks and each sub image block is the K × K region of its image block centered at the block's center, i.e., the central region of the corresponding image block. Because the first image can be assembled from all or some of the sub image blocks, the processing result of the first image can be assembled from the processing results of all or some of the sub image blocks. Since the input of the first neural network model is an image block and its output is the processing result of the corresponding sub image block, i.e., of the central region of that block, the accuracy problem caused by the network's inaccurate prediction at image edges is avoided, which helps improve image processing accuracy. It should be understood that the processing result of a sub image block may be the target-object distribution map predicted by the first neural network model for that sub image block, in which case the processing result of the first image is the target-object distribution map of the first image. Because the first image can be obtained by photographing the region in which the target objects are to be counted, the embodiments of the application can produce the region's target-object distribution map quickly, more efficiently than manual counting.
In one possible implementation, the first neural network model is obtained by: training a preset neural network on a first training set, whose sample image blocks carry labels, to obtain a second neural network model; predicting a second training set, whose sample image blocks carry no labels, with the second neural network model to obtain a third training set whose sample image blocks carry pseudo-labels; merging the first and third training sets into a fourth training set; and training the preset neural network on the fourth training set to obtain the first neural network model.
Specifically, the first neural network model is obtained by semi-supervised training: an unlabeled test set is predicted to generate pseudo-labels, the pseudo-labeled data are added to the training set, and the preset neural network is retrained; the retrained model is the first neural network model, whose prediction accuracy is thereby greatly improved.
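A sketch of this semi-supervised loop; `build_model`, `fit`, and `predict_labels` are assumed helper interfaces, not APIs from the patent:

```python
def train_semi_supervised(build_model, labeled_set, unlabeled_set):
    """Pseudo-label retraining as described above."""
    # Step 1: train on the labeled first training set -> second model.
    second_model = build_model()
    second_model.fit(labeled_set)
    # Step 2: predict the unlabeled second set to get the pseudo-labeled third set.
    pseudo_labeled = [(x, second_model.predict_labels(x)) for x in unlabeled_set]
    # Step 3: merge into the fourth training set and retrain from scratch.
    fourth_set = list(labeled_set) + pseudo_labeled
    first_model = build_model()
    first_model.fit(fourth_set)
    return first_model
```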
Snapshot ensembling may also be introduced during model training. Snapshot ensembling is a simple and general ensembling technique: with a cosine cyclic-annealing learning-rate schedule, several models that have converged to different local minima are saved, and performance is improved by fusing these snapshots of the same model. Snapshot ensembling also helps validate new schemes: deep-learning training results carry some randomness, and when verifying an improvement it is sometimes hard to tell whether a small gain in the online score comes from randomness or from the improvement itself; snapshot ensembling provides a stable way to verify a new scheme.
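A sketch of snapshot ensembling with PyTorch's cosine warm-restart schedule; the epoch and cycle counts are illustrative, and `train_one_epoch` is an assumed helper:

```python
import torch

def snapshot_train(model, optimizer, train_one_epoch, epochs=50, cycle=10):
    """Anneal the learning rate over each cycle, restart it at the cycle
    boundary, and save the weights at the end of every cycle (each a model
    converged near a local minimum) for later fusion."""
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=cycle)
    snapshots = []
    for epoch in range(epochs):
        train_one_epoch(model, optimizer)
        scheduler.step()
        if (epoch + 1) % cycle == 0:   # end of an annealing cycle
            snapshots.append({k: v.clone()
                              for k, v in model.state_dict().items()})
    return snapshots                   # averaged or voted over at inference
```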
In the embodiments of the application, the preset neural network is first trained on labeled sample image blocks to obtain the second neural network model; the second model then predicts the unlabeled sample image blocks to generate pseudo-labels; and the pseudo-labeled sample image blocks are added to the training set, on which the preset network is retrained to obtain the first neural network model. This improves the prediction accuracy of the first neural network model and thus helps improve image processing accuracy.
In one possible implementation, the sample image blocks are obtained by the following strategy: cutting a sample image with a sliding window of size 1024 × 1024 and step size 900, skipping the currently framed image block when the background-class area occupies more than 7/8 of the window, and reducing the step size to 512 to increase the sampling rate when the background-class area occupies less than 1/3 of the window; or cutting the sample image with a sliding window of size 1024 × 1024 and step size 512, skipping the currently framed image block when the background-class area occupies more than 1/3 of the window.
Image cutting is mainly designed around the following three considerations:
(1) Speed: GDAL (Geospatial Data Abstraction Library) is used for image cutting, with multi-process acceleration applied directly during reading; with an image block size of 1024, a single image can be cut within 5–6 minutes.
(2) Image block size: both 1024 and 512 cutting modes are available.
(3) Class balancing: regions in which the background class occupies more than 7/8 of the window are filtered out; when the background class occupies less than 1/3 of the window, the sliding window step is reduced to increase the sampling rate.
In the embodiment of the application, when obtaining sample image blocks for training, the sample image can be cut with a 1024 × 1024 sliding window and a step of 900 or 512. With a step of 900, when the background class region occupies more than 7/8 of the window, the currently framed image block contains little usable training data and is not cut; when the background class region occupies less than 1/3 of the window, the currently framed block contains abundant training data, and the step can be reduced, for example to 512, to increase the sampling rate. With a step of 512, when the background class region occupies more than 1/3 of the window, the currently framed block contains little usable training data and is not cut. In this way, the sample image blocks obtained contain as much effective data as possible, which helps train a first neural network model with high-precision prediction capability.
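One possible reading of this two-step strategy in Python is sketched below; the background class is assumed to be pixel value 0 in the label image, and the step is adjusted only along the scan direction — both are interpretive choices, not details from the patent.

```python
import numpy as np

def cut_sample_blocks(image, labels, window=1024, coarse_step=900, fine_step=512):
    """Cut training blocks, skipping mostly-background windows and raising
    the sampling rate where foreground is dense."""
    blocks = []
    H, W = labels.shape
    top = 0
    while top + window <= H:
        left = 0
        while left + window <= W:
            patch = labels[top:top + window, left:left + window]
            bg_ratio = np.mean(patch == 0)       # background class share
            step = coarse_step
            if bg_ratio <= 7 / 8:                # > 7/8 background: skip block
                blocks.append(image[top:top + window, left:left + window])
                if bg_ratio < 1 / 3:             # dense foreground:
                    step = fine_step             # increase the sampling rate
            left += step
        top += coarse_step
    return blocks
```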
Sample image blocks can be obtained with both the 1024 and 512 cutting modes and used to train different models, which increases the diversity among the models and facilitates model voting.
Taking the prediction of the target object distribution map of an image block as an example, please refer to fig. 6, which is a schematic flowchart of neural network model training according to an embodiment of the present application. As shown in fig. 6, the first neural network model is trained in three ways: (1) image cutting with strategy one yields 1024 × 1024 sample image blocks, which are used to train an Xception-65 + atrous spatial pyramid pooling (ASPP) network architecture, giving target object distribution map prediction result 1; (2) image cutting with strategy two yields 1024 × 1024 sample image blocks, which are used to train an Xception-65 + ASPP network architecture, giving prediction result 2; (3) image cutting with strategy two plus random cropping yields 512 × 512 sample image blocks, which are used to train a ResNet-101 + ASPP network architecture, giving prediction result 3. The results of the three ways are fused by one-hot coding and the argmax function to determine the best-performing training, i.e., to obtain the optimal target object distribution map prediction result, and the corresponding model is taken as the first neural network model.
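The fusion of the three prediction results by one-hot coding and argmax can be sketched as a per-pixel majority vote — a common reading of that phrase, since the exact fusion rule is not spelled out in the text:

```python
import numpy as np

def vote(pred_maps, num_classes):
    """One-hot encode each model's (H, W) class map, sum the encodings,
    and take argmax to get the per-pixel majority class."""
    votes = np.zeros(pred_maps[0].shape + (num_classes,), dtype=np.int32)
    for pred in pred_maps:
        votes += np.eye(num_classes, dtype=np.int32)[pred]   # one-hot encode
    return votes.argmax(axis=-1)                 # (H, W) fused prediction map

# e.g. fused = vote([result_1, result_2, result_3], num_classes=5)
```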
In one possible implementation, the label of a sample image block is obtained as follows: a transition zone is set at the edge of the sample image block and at the boundaries between classes, where a boundary between classes refers to the boundary between the distribution areas of different classes in the sample image block; the label probability of the non-transition-band regions in the sample image block is set to 1, and the label probability of the transition-band regions is set to between 0 and 1.
Taking the prediction of the target object distribution map of an image block as an example, the sample image in the embodiment of the present application may be a remote sensing image. As shown in fig. 7, the remote sensing images are aerial images of the same area: the upper half is the original image, and the lower half is the corresponding single-channel label image. The provided label is a single-channel image at 1:1 scale with the original, whose pixel values correspond to different annotation categories: flue-cured tobacco is 1, corn is 2, coix seed is 3, artificial building is 4, and the background class is 0. In this manner, the training set data can be visualized.
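Such a single-channel label image can, for instance, be rendered for inspection with a lookup table; the colours below are purely illustrative and not part of the patent.

```python
import numpy as np

# hypothetical display palette for the label values described above
PALETTE = np.array([
    [0, 0, 0],        # 0: background class
    [255, 215, 0],    # 1: flue-cured tobacco
    [0, 200, 0],      # 2: corn
    [160, 82, 45],    # 3: coix seed
    [200, 200, 200],  # 4: artificial building
], dtype=np.uint8)

def visualize_labels(label_map):
    """Map an (H, W) single-channel label image to an (H, W, 3) RGB image."""
    return PALETTE[label_map]
```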
In such an image, pixels at the image edge and at the boundaries between categories (i.e., the boundaries between different categories' distribution regions) carry hard labels (hard samples), while pixels inside a single category's distribution region carry soft labels (easy samples), as shown in fig. 8.
In particular, in the image segmentation task the classification result of each pixel depends largely on the surrounding pixels; on this basis, the hard labels among the samples can be identified. The following two types of data are mainly considered:
(1) Data at the image edge: during convolution, excessive zero padding at the image edge leads to missing information, which makes correct classification difficult and can produce grid artifacts.
(2) Data at the boundaries between different classes: inter-class boundaries are difficult to define and contain many annotation errors, and the annotated boundary points, whose gradients are unstable during training, are usually offset by only a few pixels. For the network, the input information is highly similar while the labels differ, which is a destabilizing factor in the training process.
Referring to fig. 9, fig. 9 is a schematic diagram of sample image visualization provided in an embodiment of the present application; from top to bottom it shows the original image, the model prediction result without dilation prediction, and a visualization of the model's prediction confidence for each pixel (confidence p < 0.8 rendered black, p ≥ 0.8 rendered white). Clearly, for image edge data it is difficult for the information-starved network to classify correctly; at inter-class boundaries, the network classifies with low confidence because the training gradients are unstable.
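The confidence visualization in fig. 9 amounts to a simple threshold, for example:

```python
import numpy as np

def confidence_mask(prob_map, threshold=0.8):
    """Render per-pixel confidence: p < 0.8 black, p >= 0.8 white."""
    return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)
```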
It should be understood that a model trained after label smoothing is more stable and generalizes better, so label smoothing can be applied to the sample image blocks when training the first neural network model. In knowledge distillation, a student model trained on the soft targets output by a teacher model generalizes better than a model trained directly on hard labels. A soft label reflects the true distribution of a sample more reasonably, whereas a hard label assigns the full probability to one class and zero to the rest, which is too absolute. During knowledge distillation, the teacher model in effect performs label smoothing over the soft and hard labels, outputting lower confidence for hard samples and higher confidence for easy samples, so the student model learns richer information.
Specifically, label smoothing is performed by setting a transition band at the image edge and at the boundaries between classes, as shown in fig. 10. The labels of pixels inside the transition band are regarded as hard labels and need to be smoothed; the degree of smoothing depends on the proportion of hard-label pixels among the total input pixels of each image block during training. The width w of the transition band is a hyper-parameter, set to 11 pixels.
When the model is trained, the soft labels and the hard labels use different cross entropy loss functions: the loss corresponding to the soft labels is given by formula (1), and the loss corresponding to the hard labels by formulas (2) and (3):

$$L_{\mathrm{soft}} = -\sum_{k=1}^{K} \tilde{y}_k \log \tilde{p}_k \qquad (1)$$

$$\hat{y}_k = (1-\alpha)\, y_k + \frac{\alpha}{K} \qquad (2)$$

$$L_{\mathrm{hard}} = -\sum_{k=1}^{K} \hat{y}_k \log \hat{p}_k \qquad (3)$$

where $K$ denotes the number of classification categories; $p_k$ denotes the result after softmax, i.e. the confidence; the parameter $\alpha$ controls the degree of label smoothing and is taken as the proportion of hard-label pixels in each input; $y_k$ denotes the manually annotated label; $\tilde{y}_k$ denotes a soft label; $\tilde{p}_k$ denotes the probability output by the model with a soft label as input; $\hat{y}_k$ denotes a hard label after smoothing; and $\hat{p}_k$ denotes the probability output by the model with a hard label as input.
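A sketch of formulas (1)–(3) in PyTorch follows, assuming per-pixel logits, an integer target map, and a boolean mask marking the transition-band (hard-label) pixels; one way to construct that mask is sketched after the next paragraph.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, hard_mask, num_classes, alpha):
    """Plain cross entropy for soft-label (easy) pixels -- formula (1);
    label-smoothed cross entropy for hard-label pixels -- formulas (2)/(3)."""
    log_p = F.log_softmax(logits, dim=1)                     # (N, K, H, W)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    smoothed = (1 - alpha) * one_hot + alpha / num_classes   # formula (2)
    ce_soft = -(one_hot * log_p).sum(dim=1)                  # formula (1)
    ce_hard = -(smoothed * log_p).sum(dim=1)                 # formula (3)
    return torch.where(hard_mask, ce_hard, ce_soft).mean()
```

Per the definition of α above, one would set `alpha = hard_mask.float().mean()` for each input.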
In the embodiment of the application, a transition zone is set at the edge of the sample image block and at the boundaries between classes; the label probability of the non-transition-band regions in the sample image block is set to 1, and the label probability of the transition-band regions is set to between 0 and 1. Because the neural network model has difficulty correctly classifying data at the image edge and at inter-class boundaries, setting the transition-band label probability to between 0 and 1, rather than to 0 or 1, helps improve the accuracy of the model's class prediction in the transition band.
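As a minimal sketch of how such a transition band (and hence the `hard_mask` used in the loss sketch above) could be constructed — a scipy-based morphological approach assumed here, not described in the patent:

```python
import numpy as np
from scipy import ndimage

def transition_band_mask(label_map, w=11):
    """Mark a band of width w around inter-class boundaries and the image
    edge; pixels in the band get smoothed labels, the rest keep probability 1."""
    # boundary pixels: a 3x3 neighbourhood containing more than one class
    boundary = (ndimage.maximum_filter(label_map, size=3) !=
                ndimage.minimum_filter(label_map, size=3))
    band = ndimage.binary_dilation(boundary, iterations=w // 2)
    band[:w, :] = True; band[-w:, :] = True    # image edge rows
    band[:, :w] = True; band[:, -w:] = True    # image edge columns
    return band
```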
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an image processing apparatus 1100 provided in an embodiment of the present application, the image processing apparatus 1100 is applied to an electronic device, and the image processing apparatus 1100 may include a filling unit 1101, a cutting unit 1102, and a processing unit 1103, where details of each unit are as follows:
a filling unit 1101, configured to fill K/2 circles of zeros around a first image to obtain a second image, where the size of the first image is an integer multiple of the size of a sliding window, and K is the sliding window step size;
a cutting unit 1102, configured to perform image cutting on the second image according to the sliding window and the sliding window step length to obtain a plurality of image blocks;
the processing unit 1103 is configured to process the plurality of image blocks by using a first neural network model to obtain a processing result of each of a plurality of sub-image blocks, where the plurality of image blocks correspond to the plurality of sub-image blocks in a one-to-one manner, and the sub-image blocks are area images of the image blocks that take the center of the image block as a center and have a size of K × K; and obtaining the processing result of the first image according to the processing result of each sub image block in the plurality of sub image blocks.
In a possible implementation manner, in terms of filling K/2 circles of zeros around the first image to obtain the second image, the filling unit 1101 is specifically configured to: if the size of the first image is not the integral multiple of the size of the sliding window, performing filling processing on the first image to obtain a filled first image, wherein the size of the filled first image is the integral multiple of the size of the sliding window; and filling K/2 circles of zeros around the filled first image to obtain the second image.
In a possible implementation manner, in terms of processing the plurality of image blocks by using the first neural network model to obtain a processing result of each of the plurality of sub-image blocks, the processing unit 1103 is specifically configured to: process the plurality of image blocks by using the first neural network model to obtain a processing result of each image block in the plurality of image blocks; and perform image cropping on the processing result of each image block in the plurality of image blocks to obtain the processing result of each sub image block in the plurality of sub image blocks.
In a possible implementation manner, in terms of performing image cropping on the processing result of each image block of the plurality of image blocks to obtain the processing result of each sub image block of the plurality of sub image blocks, the processing unit 1103 is specifically configured to: for the processing result of each image block in the plurality of image blocks, perform the following steps to obtain the processing result of each sub image block in the plurality of sub image blocks: with the center of the processing result of a target image block as the center, crop the K × K area image from the processing result of the target image block to obtain a target sub image block, where the target image block is any one of the plurality of image blocks, and the target sub image block is the sub image block corresponding to the target image block.
In a possible implementation manner, in terms of processing the plurality of image blocks by using the first neural network model to obtain a processing result of each of the plurality of image blocks, the processing unit 1103 is specifically configured to: for each image block in the plurality of image blocks, executing the following steps to obtain a processing result of each image block in the plurality of image blocks: performing feature extraction on a target image block by adopting a deep convolutional neural network model to obtain a first feature map of the target image block, wherein the target image block is any one of the plurality of image blocks; performing convolution operation according to the first feature map and a first convolution kernel to obtain a first target feature map, wherein the size of the first convolution kernel is 1 × 1; performing dilation convolution operation according to the first feature map and a second convolution kernel to obtain a second target feature map, wherein the size of the second convolution kernel is n × n, the dilation rate of the second convolution kernel is a first preset value, and n is an integer greater than 1; performing dilation convolution operation according to the first feature map and a third convolution kernel to obtain a third target feature map, wherein the size of the third convolution kernel is n × n, and the dilation rate of the third convolution kernel is a second preset value; performing dilation convolution operation according to the first feature map and a fourth convolution kernel to obtain a fourth target feature map, wherein the size of the fourth convolution kernel is n × n, and the dilation rate of the fourth convolution kernel is a third preset value; performing pooling operation on the first feature map to obtain a fifth target feature map; wherein the first target feature map, the second target feature map, the third target feature map, the fourth target feature map, and the fifth target feature map are the same size; fusing the first target feature map, the second target feature map, the third target feature map, the fourth target feature map and the fifth target feature map to obtain a second feature map; performing convolution operation according to the second feature map and a fifth convolution kernel to obtain a third feature map, wherein the size of the fifth convolution kernel is 1 × 1; performing upsampling processing on the third feature map to obtain a fourth feature map; performing convolution operation according to the first feature map and a sixth convolution kernel to obtain a fifth feature map, wherein the size of the sixth convolution kernel is 1 × 1, and the sizes of the fourth feature map and the fifth feature map are the same; fusing the fourth feature map and the fifth feature map to obtain a sixth feature map; performing convolution operation according to the sixth feature map and a seventh convolution kernel to obtain a seventh feature map, wherein the size of the seventh convolution kernel is n × n; and performing upsampling processing on the seventh feature map to obtain a processing result of the target image block.
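The feature-map flow just described matches a DeepLabV3+-style decoder with an ASPP head. The condensed PyTorch sketch below follows those steps; the channel widths (256/48), dilation rates (6/12/18), n = 3 and the final ×4 upsampling are typical DeepLab choices assumed here rather than values quoted from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPHead(nn.Module):
    def __init__(self, in_ch, low_ch, num_classes, n=3, rates=(6, 12, 18)):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 256, 1)                   # first conv kernel, 1x1
        self.b2 = nn.Conv2d(in_ch, 256, n, padding=rates[0], dilation=rates[0])
        self.b3 = nn.Conv2d(in_ch, 256, n, padding=rates[1], dilation=rates[1])
        self.b4 = nn.Conv2d(in_ch, 256, n, padding=rates[2], dilation=rates[2])
        self.pool = nn.AdaptiveAvgPool2d(1)                  # pooling branch
        self.pool_conv = nn.Conv2d(in_ch, 256, 1)
        self.project = nn.Conv2d(5 * 256, 256, 1)            # fifth conv kernel, 1x1
        self.low_proj = nn.Conv2d(low_ch, 48, 1)             # sixth conv kernel, 1x1
        self.fuse = nn.Conv2d(256 + 48, num_classes, n, padding=n // 2)  # seventh, n x n

    def forward(self, feat, low_feat):
        size = feat.shape[-2:]
        t1, t2, t3, t4 = self.b1(feat), self.b2(feat), self.b3(feat), self.b4(feat)
        t5 = F.interpolate(self.pool_conv(self.pool(feat)), size=size,
                           mode='bilinear', align_corners=False)
        x = self.project(torch.cat([t1, t2, t3, t4, t5], 1))  # second/third feature maps
        x = F.interpolate(x, size=low_feat.shape[-2:],        # fourth feature map
                          mode='bilinear', align_corners=False)
        low = self.low_proj(low_feat)                         # fifth feature map
        x = self.fuse(torch.cat([x, low], 1))                 # sixth/seventh feature maps
        return F.interpolate(x, scale_factor=4,               # final upsampling
                             mode='bilinear', align_corners=False)
```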
In a possible implementation manner, the processing result of the sub image block is a target object distribution map corresponding to the sub image block, and the processing result of the first image is a target object distribution map corresponding to the first image; in terms of obtaining the processing result of the first image according to the processing result of all or part of the sub image blocks, the processing unit 1103 is specifically configured to: processing the target object distribution map corresponding to each of the plurality of sub image blocks to obtain the processed target object distribution map corresponding to all or part of the plurality of sub image blocks, wherein the processing includes hole filling and small connected domain removal, the holes refer to regions without target objects surrounded by the distribution regions of the same target object in the sub image blocks, and the small connected domains refer to scattered distribution regions of the target objects in the sub image blocks; and obtaining a target object distribution map corresponding to the first image according to the target object distribution map corresponding to all or part of the processed sub image blocks.
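A sketch of this post-processing with scipy follows; the minimum-area threshold and the reassignment of removed regions to the background class are assumptions, not values from the patent.

```python
import numpy as np
from scipy import ndimage

def clean_distribution_map(class_map, num_classes, min_area=256):
    """Fill holes inside each target object's regions and remove small
    scattered connected domains."""
    out = class_map.copy()
    for c in range(1, num_classes):                # 0 assumed background
        mask = out == c
        filled = ndimage.binary_fill_holes(mask)   # hole filling
        out[filled & ~mask] = c
        labeled, n = ndimage.label(out == c)       # connected domains
        for i in range(1, n + 1):
            region = labeled == i
            if region.sum() < min_area:            # small connected domain
                out[region] = 0                    # drop to background
    return out
```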
In one possible implementation, the first neural network model is obtained by: training a preset neural network by adopting a first training set to obtain a second neural network model, wherein sample image blocks in the first training set are provided with labels; predicting a second training set by using the second neural network model to obtain a third training set, wherein sample image blocks in the second training set have no label, and sample image blocks in the third training set have a pseudo label; merging the first training set and the third training set to obtain a fourth training set; and training the preset neural network by adopting the fourth training set to obtain the first neural network model.
In one possible implementation, the sample image block is obtained by the following strategy: cutting a sample image by taking the size of the sliding window as 1024 × 1024 and the step length of the sliding window as 900, and when the proportion of the background class area in the sliding window to the sliding window is larger than 7/8, not cutting the currently framed image block of the sliding window; when the proportion of the background class area in the sliding window to the sliding window is smaller than 1/3, reducing the sliding window step length to 512 so as to increase the sampling rate; or cutting the sample image by taking the size of the sliding window as 1024 × 1024 and the step length of the sliding window as 512, and when the proportion of the background class area in the sliding window to the sliding window is larger than 1/3, not cutting the currently framed image block of the sliding window.
In one possible implementation, the label of the sample image block is obtained by: setting a transition zone at the edge of the sample image block and at the boundaries between classes, wherein a boundary between classes refers to the boundary between the distribution areas of different classes in the sample image block; setting the label probability of the non-transition-band regions in the sample image block to 1, and setting the label probability of the transition-band regions in the sample image block to between 0 and 1.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 1. Of course, the image processing apparatus 1100 provided in the embodiment of the present application includes, but is not limited to, the above unit modules, for example: the image processing apparatus 1100 may further include a storage unit 1104, and the storage unit 1104 may be used to store program codes and data of the image processing apparatus 1100.
In the image processing apparatus 1100 depicted in fig. 11, K/2 circles of zeros are filled around a first image whose size is an integer multiple of the size of the sliding window to obtain a second image. The second image is cut according to the sliding window and the sliding window step length K to obtain a plurality of image blocks, each of which has the same size as the sliding window. The plurality of image blocks are processed by a first neural network model to obtain a processing result of each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks, and each sub image block is the K × K area image centered on the center of its image block, i.e., the central area of the corresponding image block. Since the first image can be obtained by combining all or part of the sub image blocks, the processing result of the first image can be obtained by combining the processing results of all or part of the sub image blocks. In the embodiment of the application, the input of the first neural network model is an image block and its output is the processing result of the corresponding sub image block, i.e., the calculation result of the central area of each image block; this avoids the precision loss caused by the neural network model's inaccurate prediction at image edges and thus helps improve the precision of image processing. It should be understood that the processing result of a sub image block may be the target object distribution map predicted for it by the first neural network model, in which case the processing result of the first image is the target object distribution map corresponding to the first image. Because the first image can be obtained by photographing the region in which the target objects are to be counted, the embodiment of the application can quickly obtain the target object distribution map of that region, which is more efficient than manual statistics.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device 1210 according to an embodiment of the present disclosure, where the electronic device 1210 includes a processor 1211, a memory 1212, and a communication interface 1213, and the processor 1211, the memory 1212, and the communication interface 1213 are connected to each other through a bus 1214.
The memory 1212 includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable ROM (EPROM), or Compact Disc ROM (CD-ROM), and the memory 1212 is used to store the associated computer programs and data. The communication interface 1213 is used to receive and transmit data.
The processor 1211 may be one or more Central Processing Units (CPUs), and in the case where the processor 1211 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 1211 in the electronic device 1210 is configured to read the computer program code stored in the memory 1212, and perform the following steps: filling K/2 circles of zeros around the first image to obtain a second image, wherein the size of the first image is integral multiple of the size of a sliding window, and K is the step length of the sliding window; performing image cutting on the second image according to the sliding window and the sliding window step length to obtain a plurality of image blocks; processing the plurality of image blocks by adopting a first neural network model to obtain a processing result of each sub image block in a plurality of sub image blocks, wherein the plurality of image blocks correspond to the plurality of sub image blocks one by one, and the sub image blocks are regional images which take the center of the image block as the center and have the size of K multiplied by K in the image blocks; and obtaining the processing result of the first image according to the processing result of all or part of the sub image blocks in the plurality of sub image blocks.
It should be noted that implementation of each operation may also correspond to the corresponding description of the embodiment shown in fig. 1, and details are not described here again.
In the electronic device 1210 depicted in fig. 12, K/2 circles of zeros are filled around a first image to obtain a second image, the size of the first image being an integer multiple of the size of the sliding window. The second image is cut according to the sliding window and the sliding window step length K to obtain a plurality of image blocks, each of which has the same size as the sliding window. The plurality of image blocks are processed by a first neural network model to obtain a processing result of each of a plurality of sub image blocks, where the image blocks correspond one-to-one to the sub image blocks, and each sub image block is the K × K area image centered on the center of its image block, i.e., the central area of the corresponding image block. Since the first image can be obtained by combining all or part of the sub image blocks, the processing result of the first image can be obtained by combining the processing results of all or part of the sub image blocks. In the embodiment of the application, the input of the first neural network model is an image block and its output is the processing result of the corresponding sub image block, i.e., the calculation result of the central area of each image block; this avoids the precision loss caused by the neural network model's inaccurate prediction at image edges and thus helps improve the precision of image processing. It should be understood that the processing result of a sub image block may be the target object distribution map predicted for it by the first neural network model, in which case the processing result of the first image is the target object distribution map corresponding to the first image. Because the first image can be obtained by photographing the region in which the target objects are to be counted, the embodiment of the application can quickly obtain the target object distribution map of that region, which is more efficient than manual statistics.
The embodiment of the present application further provides a chip, the chip including at least one processor, a memory and an interface circuit, where the memory, the interface circuit and the at least one processor are interconnected by lines, and the at least one memory stores a computer program; when the computer program is executed by the processor, the method flow shown in fig. 1 is implemented.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the method flow shown in fig. 1 is implemented.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a computer, the method flow shown in fig. 1 is implemented.
It should be understood that the Processor mentioned in the embodiments of the present Application may be a Central Processing Unit (CPU), and may also be other general purpose processors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Field Programmable Gate Arrays (FPGA) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory referred to in the embodiments of the application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous link SDRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should also be understood that reference herein to first, second, third, fourth, and various numerical designations is made only for ease of description and should not be used to limit the scope of the present application.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of the processes should be obtained by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above functions, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An image processing method, comprising:
filling K/2 circles of zeros around the first image to obtain a second image, wherein the size of the first image is integral multiple of the size of a sliding window, and K is the step length of the sliding window;
performing image cutting on the second image according to the sliding window and the sliding window step length to obtain a plurality of image blocks;
processing the plurality of image blocks by adopting a first neural network model to obtain a processing result of each sub image block in a plurality of sub image blocks, wherein the plurality of image blocks correspond to the plurality of sub image blocks one by one, and the sub image blocks are regional images which take the center of the image block as the center and have the size of K multiplied by K in the image blocks;
and obtaining the processing result of the first image according to the processing result of all or part of the sub image blocks in the plurality of sub image blocks.
2. The method of claim 1, wherein the processing the plurality of image blocks using the first neural network model to obtain a processing result of each of a plurality of sub-image blocks comprises:
processing the plurality of image blocks by adopting a first neural network model to obtain a processing result of each image block in the plurality of image blocks;
and performing image cropping on the processing result of each image block in the plurality of image blocks to obtain the processing result of each sub image block in the plurality of sub image blocks.
3. The method of claim 2, wherein the processing the plurality of image blocks by using the first neural network model to obtain the processing result of each of the plurality of image blocks comprises:
for each image block in the plurality of image blocks, executing the following steps to obtain a processing result of each image block in the plurality of image blocks:
performing feature extraction on a target image block by adopting a deep convolutional neural network model to obtain a first feature map of the target image block, wherein the target image block is any one of the plurality of image blocks;
performing convolution operation according to the first feature map and a first convolution kernel to obtain a first target feature map, wherein the size of the first convolution kernel is 1 × 1;
performing dilation convolution operation according to the first feature map and a second convolution kernel to obtain a second target feature map, wherein the size of the second convolution kernel is n × n, the dilation rate of the second convolution kernel is a first preset value, and n is an integer greater than 1;
performing dilation convolution operation according to the first feature map and a third convolution kernel to obtain a third target feature map, wherein the size of the third convolution kernel is n × n, and the dilation rate of the third convolution kernel is a second preset value;
performing dilation convolution operation according to the first feature map and a fourth convolution kernel to obtain a fourth target feature map, wherein the size of the fourth convolution kernel is n × n, and the dilation rate of the fourth convolution kernel is a third preset value;
performing pooling operation on the first feature map to obtain a fifth target feature map; wherein the first target feature map, the second target feature map, the third target feature map, the fourth target feature map, and the fifth target feature map are the same size;
fusing the first target feature map, the second target feature map, the third target feature map, the fourth target feature map and the fifth target feature map to obtain a second feature map;
performing convolution operation according to the second feature map and a fifth convolution kernel to obtain a third feature map, wherein the size of the fifth convolution kernel is 1 × 1;
performing upsampling processing on the third feature map to obtain a fourth feature map;
performing convolution operation according to the first feature map and a sixth convolution kernel to obtain a fifth feature map, wherein the size of the sixth convolution kernel is 1 × 1, and the sizes of the fourth feature map and the fifth feature map are the same;
fusing the fourth feature map and the fifth feature map to obtain a sixth feature map;
performing convolution operation according to the sixth feature map and a seventh convolution kernel to obtain a seventh feature map, wherein the size of the seventh convolution kernel is n × n;
and performing upsampling processing on the seventh feature map to obtain a processing result of the target image block.
4. The method according to claim 1, wherein the processing result of the sub image block is a target object distribution map corresponding to the sub image block, and the processing result of the first image is a target object distribution map corresponding to the first image; the obtaining of the processing result of the first image according to the processing result of all or part of the sub image blocks in the plurality of sub image blocks includes:
processing the target object distribution map corresponding to each of the plurality of sub image blocks to obtain the processed target object distribution map corresponding to all or part of the plurality of sub image blocks, wherein the processing includes hole filling and small connected domain removal, the holes refer to regions without target objects surrounded by the distribution regions of the same target object in the sub image blocks, and the small connected domains refer to scattered distribution regions of the target objects in the sub image blocks;
and obtaining a target object distribution map corresponding to the first image according to the target object distribution map corresponding to all or part of the processed sub image blocks.
5. The method according to any one of claims 1-4, wherein the first neural network model is obtained by:
training a preset neural network by adopting a first training set to obtain a second neural network model, wherein sample image blocks in the first training set are provided with labels;
predicting a second training set by using the second neural network model to obtain a third training set, wherein sample image blocks in the second training set have no label, and sample image blocks in the third training set have a pseudo label;
merging the first training set and the third training set to obtain a fourth training set;
and training the preset neural network by adopting the fourth training set to obtain the first neural network model.
6. The method according to claim 5, wherein the sample image blocks are obtained by the following strategy:
cutting a sample image by taking the size of the sliding window as 1024 × 1024 and the step length of the sliding window as 900, and when the proportion of a background class area in the sliding window to the sliding window is larger than 7/8, not cutting the currently framed image block of the sliding window; when the proportion of the background class area in the sliding window to the sliding window is smaller than 1/3, reducing the sliding window step length to 512 so as to increase the sampling rate;
or cutting the sample image by taking the size of the sliding window as 1024 × 1024 and the step length of the sliding window as 512, and when the proportion of the background class area in the sliding window to the sliding window is larger than 1/3, not cutting the currently framed image block of the sliding window.
7. The method of claim 5, wherein the label of the sample image block is obtained by:
setting a transition zone at the edge of the sample image block and at the boundaries between classes, wherein a boundary between classes refers to the boundary between the distribution areas of different classes in the sample image block;
setting the label probability of a non-transition band region in the sample image block to be 1, and setting the label probability of a transition band region in the sample image block to be between 0 and 1.
8. An image processing apparatus characterized by comprising:
the device comprises a filling unit, a calculating unit and a calculating unit, wherein the filling unit is used for filling K/2 circles of zeros around a first image to obtain a second image, the size of the first image is integral multiple of the size of a sliding window, and K is the step length of the sliding window;
the cutting unit is used for carrying out image cutting on the second image according to the sliding window and the sliding window step length to obtain a plurality of image blocks;
the processing unit is used for processing the plurality of image blocks by adopting a first neural network model to obtain a processing result of each sub image block in the plurality of sub image blocks, wherein the plurality of image blocks correspond to the plurality of sub image blocks in a one-to-one manner, and the sub image blocks are regional images which take the center of the image block as the center and have the size of K multiplied by K in the image blocks; and obtaining the processing result of the first image according to the processing result of each sub image block in the plurality of sub image blocks.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-7.
CN202011600406.5A 2020-12-29 2020-12-29 Image processing method and related equipment Pending CN112669323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011600406.5A CN112669323A (en) 2020-12-29 2020-12-29 Image processing method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011600406.5A CN112669323A (en) 2020-12-29 2020-12-29 Image processing method and related equipment

Publications (1)

Publication Number Publication Date
CN112669323A true CN112669323A (en) 2021-04-16

Family

ID=75410452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011600406.5A Pending CN112669323A (en) 2020-12-29 2020-12-29 Image processing method and related equipment

Country Status (1)

Country Link
CN (1) CN112669323A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313215A (en) * 2021-07-30 2021-08-27 腾讯科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium
CN113515639A (en) * 2021-09-14 2021-10-19 华东交通大学 Noise data processing method and system based on belief learning and label smoothing
CN114511452A (en) * 2021-12-06 2022-05-17 中南大学 Remote sensing image retrieval method integrating multi-scale cavity convolution and triple attention
CN114511452B (en) * 2021-12-06 2024-03-19 中南大学 Remote sensing image retrieval method integrating multi-scale cavity convolution and triplet attention
CN114511757A (en) * 2022-01-27 2022-05-17 北京百度网讯科技有限公司 Method and apparatus for training image detection model
CN115035422A (en) * 2022-08-15 2022-09-09 杭州航天星寰空间技术有限公司 Data augmentation method and segmentation method for soil planting structure in remote sensing image area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination