WO2023272432A1

WO2023272432A1 - Image processing method and image processing apparatus

Info

Publication number: WO2023272432A1
Application number: PCT/CN2021/102742
Authority: WO
Inventors: 伍文龙; 洪国伟
Original assignee: 华为技术有限公司
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2023-01-05
Also published as: CN117501300A

Abstract

The present application relates to the field of artificial intelligence, and provides an image processing method and an image processing apparatus. The method comprises: acquiring a resource parameter of a neural network processing system and a model parameter of a first neural network; and according to the resource parameter and the model parameter, determining a first slicing size and a first filling size of an input image, and the number of splicing layers in the first neural network, wherein when the first neural network processes the input image according to the first slicing size, the first filling size and the splicing layers, the quality of an obtained output image is within a preset threshold range. In the solution, where the quality of an output image is made satisfy a preset threshold range, an appropriate slicing size, filling size and splicing layer are determined according to a resource parameter of a system, and a model parameter, such that the operation overheads and the memory consumption can be effectively reduced.

Description

Image processing method and image processing device

technical field

The present application relates to the field of artificial intelligence, and more specifically, relates to an image processing method and an image processing device.

Background technique

With the rapid development of artificial intelligence (AI) technology, neural networks (for example, deep neural networks) have made great achievements in the processing and analysis of various media signals such as images, videos, and voices in recent years. For image processing, how to reduce the amount of calculation and memory consumption becomes crucial under the high requirements of processing high-resolution images through deep neural network models.

In order to solve the above-mentioned problems of large amount of calculation and large memory consumption in the processing of high-resolution input images, the idea of slicing has emerged, that is, the input image is divided into multiple slices and each slice is processed separately, and then the processed results to splice. In order to ensure that the stitched image is as identical as possible to the output image obtained without slice processing, adjacent pixels need to be filled on the slice. However, the filling of slices in the prior art is prone to the problem of blocky artifacts appearing at joints, which leads to performance degradation such as visual quality and detection success rate. Filling will increase the amount of calculation to a certain extent, so the methods for dividing (including slicing and filling) images in the prior art either reduce the amount of calculation and memory consumption, but the distortion of the output image is too large; The distortion of the output image is small, but the calculation amount and memory consumption have not been effectively reduced.

Therefore, how to reduce the amount of calculation and memory consumption under the premise of ensuring the quality of the output image is an urgent technical problem to be solved.

Contents of the invention

Embodiments of the present application provide an image processing method and an image processing device, which can reduce calculation amount and memory consumption under the premise of ensuring output image quality.

In a first aspect, an image processing method is provided, the method includes: acquiring resource parameters of a neural network processing system and model parameters of a first neural network; determining the first input image of the first neural network according to the resource parameters and model parameters The slice size and the first filling size and the number of layers of the splicing layer in the first neural network, wherein the quality of the first output image is within a preset threshold range, and the first output image is obtained when the first neural network according to the first The slice size and the first padding size and the output image obtained when the stitching layer processes the input image.

In the technical solution of this application, on the premise that the quality of the output image meets the preset threshold range, the appropriate slice size and filling size and the splicing layer in the first neural network are determined according to the system resource parameters and model parameters. The number of layers can effectively reduce the calculation overhead and memory consumption, and there will be no excessive calculation or excessive distortion.

The resource parameter is used to represent the computing capability and storage capability of the neural network processing system. The first neural network refers to any neural network (that is, a neural network model) that can be invoked by the neural network processing system. It can also be understood that the first neural network is any neural network that can run in the neural network processing system.

The neural network processing system can be a processor, a chip, a hardware module, and the like.

Optionally, network architecture analysis may be performed on the first neural network, so as to obtain its model parameters. The model parameters of the first neural network are used to represent the amount of calculation and storage required by the first neural network, and the model parameters may include parameters such as its receptive field, operator, and the amount of calculation of each layer of neural network.

The stitching layer can be understood as the neural network layer that stitches the processing results into a whole feature map after processing slices in the first neural network, "the number of layers of the stitching layer in the first neural network" In other words, which layer (which layer) in the first neural network is the stitching layer. Therefore, it can be seen that assuming that the first neural network is composed of layers 0 to L-1, with a total of L layers, the number of layers of the stitching layer in the first neural network must be one of 0 to L-1. Value, L is a positive integer greater than 1.

In this embodiment of the present application, the quality can be represented by, for example, degree of distortion or image precision. The higher the quality, the more accurate the result will be when the output image is used for subsequent processing such as image recognition and image classification.

With reference to the first aspect, in some implementations of the first aspect, the first slice size and the first padding size and the number of layers of the stitching layer in the first neural network are such that the quality of the first output image is within a preset In the case where the value is within the threshold range, the first neural network requires the smallest amount of computation or storage. In this way, a better technical effect of reducing operation overhead and memory consumption can be further achieved.

In combination with the first aspect, in some implementations of the first aspect, the size of the first slice is smaller than or equal to the size of the input image, the number of layers of the stitching layer is smaller than or equal to the total number of layers of the first neural network, and the first padding The size is determined according to the preset threshold range and the receptive field of the splicing layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the computation capability or storage capability of the neural network system.

With reference to the first aspect, in some implementations of the first aspect, the above-mentioned method further includes: dividing the input image into multiple tiles according to the first slice size and the first padding size, and combining multiple tiles at the splicing layer The processing results corresponding to the tiles are spliced, and the first output image is obtained according to the spliced image, and the processing results corresponding to the multiple tiles are obtained by processing the multiple tiles by the first neural network.

With reference to the first aspect, in some implementations of the first aspect, the quality of the first output image is within a preset threshold range specifically: the ratio of the quality of the first output image to the quality of the second output image is within a preset Within the range, the second output image is the output image obtained when the first neural network directly processes the input image.

In a second aspect, an image processing device is provided, and the device includes a unit for executing the method in any one of the implementation manners of the first aspect above.

In a third aspect, an image processing device is provided, which includes: a memory for storing programs; a processor for executing the programs stored in the memory, and when the programs stored in the memory are executed, the processing The device is used to execute the method in any one of the implementation manners in the first aspect.

In a fourth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code for execution by a device, where the program code includes a method for executing any one of the implementation manners in the first aspect.

In a fifth aspect, a computer program product containing instructions is provided, and when the computer program product is run on a computer, it causes the computer to execute the method in any one of the implementation manners in the first aspect above.

In a sixth aspect, a chip is provided, the chip includes a processor and a data interface, the processor reads the instructions stored on the memory through the data interface, and executes any one of the implementations in the first aspect above. method.

Optionally, as an implementation manner, the chip may further include a memory, the memory stores instructions, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the The processor is configured to execute the method in any one of the implementation manners in the first aspect.

Description of drawings

FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.

FIG. 2 is a schematic flowchart of an execution process of an image processing method according to an embodiment of the present application.

FIG. 3 is a schematic diagram of the test effect of the image processing method of the embodiment of the present application.

FIG. 4 is a schematic block diagram of an image processing device according to an embodiment of the present application.

FIG. 5 is a schematic diagram of a hardware structure of an image processing device according to an embodiment of the present application.

detailed description

The technical solution in this application will be described below with reference to the accompanying drawings.

The embodiment of the present application involves a neural network. In order to better understand the method of the embodiment of the present application, the relevant terms and concepts of the neural network are firstly introduced below.

In the traditional solution, in order to reduce the amount of calculation and memory consumption, the input image is sliced, but the existing technology does not consider a better filling method, but only fills it with 0 for the convenience of calculation, resulting in a blocky appearance at the connection The problem of artifacts. In addition, filling will also bring a certain amount of calculation, but the prior art does not consider how to balance the amount of calculation and the degree of distortion. In short, the existing technology does not consider that different division methods will make the calculation amount, memory consumption and distortion different, resulting in that although the input image is divided, these methods of dividing the image either reduce the calculation amount and memory consumption , but the distortion of the output image is too large; or although the distortion of the output image is guaranteed to be small, the calculation amount and memory consumption have not been effectively reduced. That is to say, the solution in the prior art cannot effectively improve the image processing performance.

In view of the above problems, the embodiment of the present application proposes an image processing scheme, which determines the number of layers, slice size and filling size of the stitching layer according to the resource parameters of the neural network processing system and the parameters of the neural network, and ensures the accuracy of the output results of the neural network. The quality is within a preset threshold, thereby solving the above-mentioned problems caused by dividing the image.

In the embodiment of the present application, dividing an image may be understood as slicing the input image and filling the slice, so as to divide the input image into multiple tiles.

FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application. Each step shown in FIG. 1 will be introduced below.

101. Acquire resource parameters of a neural network processing system and model parameters of a first neural network.

102. According to the above resource parameters and model parameters, determine the first slice size and the first filling size of the input image of the first neural network, and the number of stitching layers in the first neural network.

Optionally, according to the above resource parameters and model parameters, determine the slice size and fill size and the alternative range of the number of layers of the stitching layer, and then select any slice size and fill size from the alternative range as the first slice size and the first padding size.

In order to obtain a better technical effect, it is also possible to determine the optimal values of the slice size and filling size and the number of layers of the stitching layer according to the above-mentioned resource parameters and model parameters, and then use the above-mentioned optimal values as the above-mentioned first slice size and The first infill size and the number of layers for the stitched layer.

In the process of determining the first slice size, the first padding size, and the number of layers of the stitching layer in the first neural network, it is also necessary to make the quality of the first output image within a preset threshold range, and the first output image It is obtained after the first neural network processes the above-mentioned input image according to the above-mentioned first slice size, the first filling size and the number of layers of the stitching layer. This is because, if you only consider reducing the amount of computation and memory consumption, regardless of the final output, it will result in that although the amount of computation is reduced a lot, the quality of the output image is very poor (such as severe distortion), and it cannot be used, so it will be lost. meaning of image processing.

In this embodiment of the present application, the quality can be represented by, for example, degree of distortion or image precision. The higher the quality, the more accurate the result will be when the output image is used for subsequent processing such as image recognition and image classification. The specific quality parameters used to measure the quality can be, for example, mAP, IoU, etc. hereinafter. In this way, it is possible to further select the slice size and padding size, as well as the number of layers of the splicing layer, which can reduce the amount of computation, memory consumption, etc., while ensuring that the quality of the output image is within an acceptable range. While improving the quality of the output image, the computation load and memory consumption are further reduced.

Optionally, the aforementioned slice size and filling size, as well as the number of concatenated layers, may also be determined in combination with the performance index of the first neural network. The performance index may be calculation amount, memory consumption, delay, etc., that is to say, parameters used to evaluate some performances of the first neural network itself. For ease of understanding, the embodiments of the present application mainly use calculation amount as an example for introduction.

In some implementations, for example, according to the above-mentioned resource parameters and model parameters, the slice size and filling size and the alternative range of the number of layers of the stitching layer can be determined, and then according to the performance index and the above-mentioned preset threshold range, determine the slice size and Optimal values for fill size and number of layers for stitched layers. Assuming that the performance index is the amount of calculation, it is equivalent to first determining the slice size and filling size and the number of layers of the splicing layer according to the above resource parameters and model parameters. The overall calculation amount for processing will not exceed the computing power of the neural network system, and the quality of the first output image is within a preset range, and then find the slice size and fill size that make the above calculation amount optimal from the alternative range, It is also possible to find the number of concatenated layers that optimizes the amount of computation described above.

Assume that the first neural network Π is a network of L layers, which are respectively the 0th layer to the L-1 layer, wherein the 0th layer obtains the input image, and the L-1 layer outputs the processed image; the on-chip memory of the neural network processing system The size is M words (words, 1 word is 32 bits (bit)). For the convenience of statistics, the accuracy of the storage required for the input image, operator coefficients, output image, operator output, etc. is 1 word; the size of the input image is (H, W, C), where H represents the height of the input image, W represents the width of the input image, and C represents the number of channels of the input image; the hyperparameter that makes the quality of the first output image within the preset threshold range is λ, when λ=1, represents the ratio of the quality of the above-mentioned first output image to the quality of the second output image, that is, the value range of λ is 0≤λ≤1, and when λ=1, represents the above-mentioned first output image The quality of is the same as the quality of the second output image, and for ease of understanding, it can be regarded as the user's tolerance for the degradation of the quality of the output image after image division.

Optionally, the first slice size, the first fill size, and the number of layers of the concatenated layer can be determined by minimizing the calculation amount F of Π. This process can be represented by the following formula (1).

in,

Indicates the optimal value of the number of layers of the stitching layer,

represents the optimal value for the height of the slice size,

Indicates the optimal value of the width of the slice size, F _Π indicates the amount of computation in the first neural network processing, (h _l , w _l , c _l ) respectively indicate the height, width and number of channels of the input feature map of the l-th layer , l is an integer greater than or equal to 0 and less than or equal to L-1. Equation (1) is the one that will minimize the amount of computation

They are respectively determined as the number of layers of the spliced layer, the height of the first slice size, and the width of the first slice size. Equation (1) also needs to satisfy the following constraints.

_0≤lc≤L -1;

^0≤ht≤H -1;

0≤w ^t ≤W-1.

That is to say, the previous L constraints are to ensure that the sum of the memory occupied by the input, output and operator coefficients of each layer of the first neural network (from layer 0 to layer L-1) cannot exceed M words restrictions; the latter three constraints are to ensure that the layer number range of the splicing layer lc is the layer number range of the first neural network (from the 0th layer to the L- _1th layer), and the slice height ^{ht is} smaller than the input image The height H of the slice (less than or equal to H-1), the width w ^t of the slice is less than the width W of the input image (less than or equal to W-1).

Use N _T to represent the number of slices obtained after dividing the input image according to the height h ^t and width w ^t of the slice size without padding, then

That is, the number of slices N _T is the upper integer value of the quotient obtained by dividing the size of the input image by the size of the slice without padding. And the height and width (h _l , w _l ) of the input feature map of layer l satisfy the following formula (2).

Among them, the input tiles of layer l _c are spliced together; r _l is the input receptive field size of layer l; (λr _l +h ^t ,λr _l +w ^t ) is the block size after filling, filling The size is p=λr _l /2, therefore, in formula (2),

is the padded tile size of layer 0, specifically, the padded size of

The size of the block obtained after padding; H _Π (m,h) indicates the height of the output feature map of the mth layer when the height of the input feature map is h, therefore, H _Π (l- 1, h _l-1 ) means that when the height of the input feature map is h _l-1 , the height of the output feature map at layer l-1; W _Π (m, w) means that when the width of the input feature map is w, The width of the output feature map of the m-th layer, therefore, H _Π (l-1,w _l-1 ) in the formula (2) means that when the width of the input feature map is w _l-1 , at the l-1 layer The width of the output feature map. suppose

Respectively represent the height and width of the l-th layer operator. If the operator is a convolution, the height and width of the operator are the height and width of the convolution kernel. If the operator is pooling, the height and width of the operator are both 0 for max pooling, and the height and width of the pooling for average pooling. If the operator is an activation function, then the height and width of the operator are 0, assuming the activation function can be implemented as a lookup table that does not involve any computation. In formula (2)

Indicates that when l=l _c , the value is

That is N _T ; when l≠l _c , the value is

That is, it is 1, that is to say, the value is N _T in the stitching layer, and the value is 1 in the non-splicing layer.

is the whole first neural network when the input of the splicing layer (layer _lc ) is N _T input tiles, and the size of the input tiles of the 0th layer after filling is (h ₀ , w ₀ ) The total amount of computation; f _Π (n,m,x,y) represents the amount of computation from the nth layer to the mth layer, where the size of the input feature map of the nth layer is (x, y), therefore, f _Π ( 0,l _c -1,h ₀ ,w ₀ )N _T represents the amount of computation from layer 0 to layer l _c -1, since the number of blocks is N _T , so the amount of computation is the amount of computation for each block Multiplied by the number of tiles, where the size of the input feature map of layer 0 is (h ₀ ,w ₀ ),

Indicates the amount of computation from the _lc layer to the L-1 layer. Since the _lc layer splices the processing results of multiple tiles to obtain a feature map, the amount of computation is related to the size of the spliced feature map, where The size of the input feature map of the _lc layer is

In the embodiment of this application, the optimal value of the padding size is

By adjusting the size of the hyperparameter λ, the adjustment of the preset threshold range of the above output image can be realized. Assuming that the above-mentioned preset threshold range is the degraded range of the image quality of the output image, the closer λ is to 1, the less the image quality degrades, and the closer λ is to 0, the more the image quality degrades. The so-called image quality degradation of the output image can be understood as that the quality of the first output image is worse than the quality of the second output image.

In determining the optimal values for the slice size and the number of layers of the concatenated layer, optimization can be performed by exhaustively searching the parameter space of ^ht , ^wt , _lc . For example one could start the search in increasing order of parameter magnitude, thus iterating from the smallest allowed value of the slice size to the largest allowed value, and lc _iterating from 0 (taking input) to L-1 (producing the final output). To speed up the optimization, if the following factors are considered, the parameter search space can be reduced without degrading the optimization performance. Assume {l _c =0, . . . , m}, {h ^t }, {w ^t } are explored in the search with all L constraints satisfied. First, assume that l=m is an activation function layer (such as Relu, Sigmoid function, etc.) or a pooling layer, where (h _m , w _m )=(h _m+1 , w _m+1 ). In both cases, for l _c =m+1 and l _c =m, the total computation of the first neural network and the satisfaction of the L constraints are the same, because M _m+1 (m)≤M _m (m) and M _m+1 (i)=M _m (i), i≠m, where, when l _c =l, M _l (m) is the storage requirement of the mth layer. Therefore, l _c =m+1, {h ^t } and {w ^t } can be skipped in the search. Second, for a given tile size (h ^t , w ^t ) and concatenated layer l _c , assume that the memory requirement of layer l (l≤l _c ) exceeds M. Since r ₀ ≤r ₁ ≤…≤r _L-1 , so for concatenated layer l _c '>l _c , tile size (h ^t '>h ^t , w ^t '>w ^t ), layer l (l≤ l _c ') also has more memory requirements than M. Therefore, the search for concatenated layers l _c '>l _c and tile size (h ^t '>h ^t , w ^t '>w ^t ) is unnecessary and can be terminated.

That is, when determining the slice size from the range of candidates, the activation function layer and the next layer in the pooling layer can be skipped.

103. Divide the input image into multiple tiles according to the first slice size and the first padding size.

That is to say, the input image is divided according to the slice size determined in step 102 to obtain multiple slices, and then the multiple slices are filled according to the determined filling size to obtain multiple tiles.

104. Splicing the processing results corresponding to the above multiple tiles, and obtaining a first output image according to the spliced image, the processing results corresponding to the above multiple tiles are obtained by processing the multiple tiles by the first neural network .

That is to say, by inputting the above-mentioned multiple tiles into the first neural network respectively, the processing results corresponding to the multiple tiles can be obtained. After splicing these processing results, a spliced processed image can be obtained. The spliced processing The image can be used as the input of the next layer of the first neural network, or can be used as the target processing result of the first neural network on the input image, that is, the above-mentioned first output image.

In the embodiment of the present application, assuming that the output image obtained by directly inputting the input image into the first neural network is called the second output image, the closer the quality of the first output image is to the quality of the second output image, the better.

Stitching needs to obtain the processing results of all the above-mentioned multiple tiles before it can be performed.

In the solution shown in Figure 1, on the premise that the quality of the output image meets the preset threshold range, the appropriate slice size, padding size, and number of stitching layers are determined according to the system resource parameters and model parameters, so as to effectively Integrating the quality of the output image and reducing the computing overhead and memory consumption effectively improves the performance of image processing without excessive computation or serious distortion. In addition, when determining the slice size and filling size and the number of layers of the splicing layer, select the value that can make the quality of the output image meet the preset threshold range under the premise that the calculation amount or storage amount of the first neural network is the smallest value, which can further achieve Better technical effect of reducing computing overhead and memory consumption.

FIG. 2 is a schematic flowchart of an execution process of an image processing method according to an embodiment of the present application. Figure 2 can be seen as an example of Figure 1.

201. Analyze the neural network model to obtain the receptive field of the neural network model, the operator, and the calculation amount of each layer of the neural network.

The network architecture analysis module can be used to analyze the neural network model to obtain the above-mentioned receptive fields, operators and the calculation amount of each layer of neural network.

This neural network model can be regarded as an example of the first neural network in FIG. 1 .

202. Obtain the memory size of the neural network processing system.

The memory size can be, for example, the storage space size of a fast on-chip memory, which can be used to evaluate the computing capability and storage capability of the neural network system. The above-mentioned on-chip memory may include, for example: L1, L2, L3, HBM, DDR, and the like. The memory size can be regarded as an example of the resource parameter described in FIG. 1 .

Step 201 and step 202 can be regarded as an example of step 101, and step 201 and step 202 may or may not be performed at the same time, and there is no restriction on the order of execution.

203. Determine the first slice size and the first padding size of the input image and the number of spliced layers.

For step 203, reference may be made to the introduction of step 102.

204(a). Divide the input image to obtain multiple tiles.

For the contents of obtaining multiple tiles in step 204, please refer to the introduction of step 103 completely.

204(b), the current processing layer N of the neural network model is set to 0, and the next block to be processed is selected from the multiple blocks obtained in step 204(a).

205. Input the block to be processed to the Nth layer of the neural network model for processing, and obtain a processing result of the block, which becomes the next block to be processed.

206. Judging whether the Nth layer is the last layer, if the judgment result is "yes", execute step 207; if the judgment result is "no", execute step 208.

207. Output the first output image.

That is to say, the first output image is the processing result of the last layer of the neural network model.

208 . Determine whether the Nth layer is a concatenated layer, and if the determination result is “Yes”, perform step 209 ; if the determination result is “No”, perform step 210 .

209. Judging whether splicing is possible, if the judging result is "yes", go to step 211; if the judging result is "no", go to step 204(b).

In step 209, the basis for judging whether splicing is possible is whether the Nth layer has processed all the multiple tiles.

210. The value of N is incremented by 1, and step 205 is executed.

That is, the input tiles are processed using the next neural network layer. The function of step 210 is to traverse all neural network layers.

211 . Merge the processing results corresponding to the plurality of tiles, and the stitched tile becomes the next tile to be processed, and execute step 210 .

Steps 204-211 can be regarded as an example of steps 103-104, and mainly implement the following process. First, the input image is divided into tiles with a padding of a selected size. Step 0, set the current processing layer N of the neural network model to 0, and at this time, the next block from the input image can also be loaded into the memory as data input. Step 1, after loading the kernel of the Nth layer into the memory, start to process the data of the Nth layer. Assuming that N is the last layer of the network, the output of the Nth layer is the final output of the neural network model on the input image processing results. Assuming that N is neither the last layer nor the splicing layer, the output of the Nth layer will become the input of the N+1th layer, so it is kept in memory, and N is accumulated by 1, and this step 1 is repeated. Assuming that N is not the last layer, but a splicing layer, if the Nth layer has processed multiple tiles, all the processing results of the Nth layer are spliced, and the spliced tiles are kept in the memory. And add N to 1, repeat step 1; otherwise, repeat step 0.

In order to facilitate the understanding of the technical effects of the solutions of the embodiments of the present application, a more specific test example will be used below to illustrate. The solution of the embodiment of the present application can be used in various types of neural networks such as convolutional neural networks, and the following tests take a residual network (ResNet) as an example. The residual network is a deep convolutional network proposed in 2015. Compared with the traditional convolutional neural network, the residual network is easier to optimize and can increase the accuracy by increasing the depth. The core of the residual network is to solve the side effects (degeneration problem) caused by increasing the depth, so that the network performance can be improved by simply increasing the network depth. The residual network generally contains many submodules with the same structure. Usually, the residual network is used to connect a number to indicate the number of repetitions of the submodules. For example, ResNet50 means that there are 50 submodules in the residual network.

In this example, the resolution of the input image is adjusted to 1600x1600, and the 11th layer (Conv2_9) of the trained ResNet50 is set as the stitching layer. The receptive field size of the 11th layer of ResNet50 is 51, so it can be inferred that when the value of the padding size is greater than or equal to 26 (calculated by 51/2), it can ensure that the stitched tiles do not appear blocky artifacts, Three slice sizes are determined: 96x96, 192x192, 384x384 (that is, the test value of the slice size), and the fill size test value is 0, 64. The test results are shown in Figure 3.

FIG. 3 is a schematic diagram of the test effect of the image processing method of the embodiment of the present application. In FIG. 3 , curve A, curve B, and curve C are the change curves of the corresponding quality parameters of slice sizes 384x384, 192x192, and 96x96, respectively. It can be seen from FIG. 3 that when the padding size is equal to 64, it is equivalent to the case of p=64 and λ=1, and the quality parameters of the output image are the same as those without slices. . When the padding size is reduced from 64 to 0, which is equivalent to the case of p=0 and λ=0, the quality parameters of the output images of the three slice sizes all decrease, but in the case of the same padding size, the smaller the slice size The larger the value, the higher the quality parameter of the output image. Compared with padding size=64, when padding size=0, for a slice of size 384x384, the quality drops by 0.6% (ie (0.354-0.352)/0.354), and the amount of computation reduces by about 27%. For a slice of size 192 x 192, the quality drops by 0.8% (ie (0.354-0.351)/0.354), and the amount of operations is reduced by about 44%; for a slice of size 96x96, the quality drops by 2.3% (ie (0.354- 0.346)/0.354), the calculation amount is reduced by about 64%.

It should be noted that, in the example shown in Figure 3, the above quality parameters use the common target detection evaluation index, mAP@IoU. mAP means mean average precision; IoU means intersection over union, mAP@IoU means the detection accuracy of the trained model on all categories of a specific IoU, and IoU means the generated candidate box ( Candidate bound) and the original mark box (ground truth bound) overlap rate or overlapping degree, that is, the ratio of their intersection and union, the ideal situation is complete overlap, that is, the ratio is 1. In the example shown in Figure 3, the target detection performance index of the neural network model for processing the input image (with a resolution of 1600x1600) is mAP@IoU=0.5:0.95, that is, when the IoU is in the range between 0.5 and 0.95 mAP.

Therefore, the above three kinds of slices can be exchanged for a large reduction in the amount of calculation at the cost of a very small quality reduction.

The image processing apparatus according to the embodiment of the present application will be introduced below with reference to FIG. 4 . The image processing device shown in FIG. 4 can be used to execute each step of the image processing method of the embodiment of the present application, and the image processing device can be a computer, a server and other devices with sufficient computing power to construct a neural network.

FIG. 4 is a schematic block diagram of an image processing device according to an embodiment of the present application. The apparatus 2000 shown in FIG. 4 includes an acquisition unit 2001 and a processing unit 2002 .

The apparatus 2000 may be used to execute the steps of the image processing method of the embodiment of the present application. For example, the acquiring unit 2001 may be used to execute step 101 in the method shown in FIG. 1 , and the processing unit 2002 may be used to execute steps 102 to 104 in the method shown in FIG. 1 . For another example, the acquiring unit 2001 may be used to execute steps 201 and 202 in the method shown in FIG. 2 , and the processing unit 2002 may be used to execute steps 203 to 211 in the method shown in FIG. 13 .

In the device 2000 shown in FIG. 4, the acquiring unit 2001 may be equivalent to the communication interface 3003 in the device 3000 shown in FIG. For the processor 3002 in the apparatus 3000 shown in FIG. 5 , the above resource parameters and model parameters can be obtained from the memory 3001 through the processor 3002 at this time.

In addition, the processing unit 2002 in the apparatus 2000 shown in FIG. 4 may be equivalent to the processor 3002 in the apparatus 3000 shown in FIG. 5 .

FIG. 5 is a schematic diagram of a hardware structure of an image processing device according to an embodiment of the present application. The device 3000 shown in FIG. 5 includes a memory 3001 , a processor 3002 , a communication interface 3003 and a bus 3004 . Wherein, the memory 3001 , the processor 3002 , and the communication interface 3003 are connected to each other through a bus 3004 .

The memory 3001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM). The memory 3001 may store programs, and when the programs stored in the memory 3001 are executed by the processor 3002, the processor 3002 and the communication interface 3003 are used to execute various steps of the image processing method of the embodiment of the present application.

The processor 3002 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU) or one or more The integrated circuit is used to execute related programs to realize the functions required by the units in the image processing device of the embodiment of the present application, or to execute various steps of the image processing method of the embodiment of the present application.

The processor 3002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 3002 or instructions in the form of software.

The above-mentioned processor 3002 may also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components. Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the image processing method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register. The storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001, and combines its hardware to complete the functions required by the units included in the image processing device of the embodiment of the present application, or execute the image processing method of the embodiment of the present application each step.

The communication interface 3003 implements communication between the apparatus 3000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, the control parameters corresponding to the inference results may be sent through the communication interface 3003 .

The bus 3004 may include a pathway for transferring information between various components of the device 3000 (eg, memory 3001 , processor 3002 , communication interface 3003 ).

It should be noted that although the above-mentioned device 3000 only shows memory, processor, and communication interface, those skilled in the art should understand that the device 3000 may also include other devices necessary for normal operation during specific implementation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 3000 may also include hardware devices for implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 3000 may also only include the components necessary to realize the embodiment of the present application, and does not necessarily include all the components shown in FIG. 5 .

The embodiment of the present application does not specifically limit the specific structure of the execution subject of the method provided in the embodiment of the present application, as long as the program that records the code of the method provided in the embodiment of the present application can be executed according to the method provided in the embodiment of the present application Just communicate. For example, the subject of execution of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module in the terminal device or network device that can call a program and execute the program.

Various aspects or features of the present application can be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein may encompass a computer program accessible from any computer readable device, carrier or media. For example, computer-readable media may include, but are not limited to, magnetic storage devices (such as hard disks, floppy disks, or tapes, etc.), optical disks (such as compact discs (compact disc, CD), digital versatile discs (digital versatile disc, DVD), etc. ), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), card, stick or key drive, etc.).

Various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.

It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, the memory (storage module) may be integrated in the processor.

It should also be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.

Those skilled in the art can appreciate that the units and steps of each example described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented by hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the protection scope of the present application.

Those skilled in the art can clearly understand that for the convenience and brevity of description, the specific working process of the above-described devices and units can refer to the corresponding process in the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

If the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of this application, or the part that contributes to the prior art, or the part of the technical solution, can be embodied in the form of computer software products, which are stored in a storage In the medium, the computer software product includes several instructions, which are used to make a computer device (which may be a personal computer, server, or network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium may include, but is not limited to: various media capable of storing program codes such as U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk.

The above is only a specific implementation of the application, but the scope of protection of the application is not limited thereto. Anyone familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the application. Should be covered within the protection scope of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Claims

An image processing method, characterized in that, comprising:

Acquire resource parameters of the neural network processing system and model parameters of the first neural network, the resource parameters are used to represent the computing capability and storage capacity of the neural network processing system, and the model parameters are used to represent the first neural network The amount of computation and storage required;

According to the resource parameters and the model parameters, determine the first slice size and the first filling size of the input image of the first neural network and the number of layers of the splicing layer in the first neural network, wherein the first The quality of an output image is within a preset threshold range, the first output image is when the first neural network pairs the first slice size and the first padding size with the stitching layer on the The output image obtained when the input image is processed.
The method of claim 1, wherein the first slice size and the first padding size and the number of stitching layers in the first neural network are such that the first output image When the quality is within the preset threshold range, the first neural network requires the smallest amount of computation or storage.
The method according to claim 1 or 2, wherein the size of the first slice is smaller than or equal to the size of the input image, and the number of layers of the stitching layer is smaller than or equal to that of the first neural network. The total number of layers, the first filling size is determined according to the preset threshold range and the receptive field of the splicing layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the specified The computing power or storage capacity of the neural network system.
The method according to any one of claims 1 to 3, further comprising:

dividing the input image into a plurality of tiles according to the first slice size and the first padding size;

The processing results corresponding to the plurality of tiles are stitched at the stitching layer, and the first output image is obtained according to the stitched image, and the processing results corresponding to the plurality of tiles are the first neural network obtained by processing the multiple tiles.
The method according to any one of claims 1 to 4, wherein the quality of the first output image within a preset threshold range is specifically: the difference between the quality of the first output image and the second output image The quality ratio of is within a preset range, and the second output image is an output image obtained when the first neural network directly processes the input image.
An image processing device, characterized in that it comprises:

An acquisition unit, configured to acquire resource parameters of the neural network processing system and model parameters of the first neural network, the resource parameters are used to represent the computing capability and storage capacity of the neural network processing system, and the model parameters are used to represent the Describe the amount of computation and storage required by the first neural network;

A processing unit, configured to determine the first slice size and the first padding size of the input image of the first neural network and the layer of the stitching layer in the first neural network according to the resource parameters and the model parameters number, wherein the quality of the first output image is within a preset threshold range, the first output image is obtained when the first neural network according to the first slice size, the first padding size and the An output image obtained when the splicing layer processes the input image.
The apparatus according to claim 6, wherein the first slice size and the first padding size and the stitching layer are such that the quality of the first output image is within a preset threshold range In some cases, the first neural network requires the least amount of computation or storage.
The device according to claim 6 or 7, wherein the size of the first slice is smaller than or equal to the size of the input image, and the number of layers of the stitching layer is smaller than or equal to that of the first neural network. The total number of layers, the first filling size is determined according to the preset threshold range and the receptive field of the splicing layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the specified The computing power or storage capacity of the neural network system.
The device according to any one of claims 6 to 8, wherein the processing unit is further configured to:

dividing the input image into a plurality of tiles according to the first slice size and the first padding size;

The processing results corresponding to the multiple tiles are spliced at the splicing layer, and the processing results corresponding to the multiple tiles are obtained by processing the multiple tiles by the first neural network.
The device according to any one of claims 6 to 9, wherein the quality of the first output image within a preset threshold range is specifically: the difference between the quality of the first output image and the second output image The quality ratio of is within a preset range, and the second output image is an output image obtained when the first neural network directly processes the input image.
A computer-readable storage medium, characterized in that the computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method according to any one of claims 1 to 5.
A chip, characterized in that the chip includes a processor and a data interface, and the processor reads instructions stored on the memory through the data interface to execute the method described in any one of claims 1 to 5. method.