WO2023272432A1 - Image processing method and image processing apparatus - Google Patents


Info

Publication number
WO2023272432A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
size
layer
output image
image
Prior art date
Application number
PCT/CN2021/102742
Other languages
French (fr)
Chinese (zh)
Inventor
伍文龙
洪国伟
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202180099529.1A (published as CN117501300A)
Priority to PCT/CN2021/102742 (published as WO2023272432A1)
Publication of WO2023272432A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image

Definitions

  • the present application relates to the field of artificial intelligence, and more specifically, relates to an image processing method and an image processing device.
  • Neural networks (for example, deep neural networks) are widely used in the field of artificial intelligence (AI).
  • Embodiments of the present application provide an image processing method and an image processing device, which can reduce calculation amount and memory consumption under the premise of ensuring output image quality.
  • In a first aspect, an image processing method is provided, including: acquiring resource parameters of a neural network processing system and model parameters of a first neural network; and determining, according to the resource parameters and the model parameters, a first slice size and a first padding size for the input image of the first neural network, and the layer number of the stitching layer in the first neural network, where the quality of a first output image is within a preset threshold range, the first output image being the output image obtained when the first neural network processes the input image according to the first slice size, the first padding size, and the stitching layer.
  • In this way, determining an appropriate slice size, padding size, and stitching-layer number according to the system resource parameters and model parameters can effectively reduce computation overhead and memory consumption, without causing excessive computation or excessive distortion.
  • the resource parameter is used to represent the computing capability and storage capability of the neural network processing system.
  • the first neural network refers to any neural network (that is, a neural network model) that can be invoked by the neural network processing system. It can also be understood that the first neural network is any neural network that can run in the neural network processing system.
  • the neural network processing system can be a processor, a chip, a hardware module, and the like.
  • network architecture analysis may be performed on the first neural network, so as to obtain its model parameters.
  • the model parameters of the first neural network are used to represent the amount of calculation and storage required by the first neural network, and the model parameters may include parameters such as its receptive field, operator, and the amount of calculation of each layer of neural network.
  • The stitching layer can be understood as the neural network layer in the first neural network that stitches the per-slice processing results back into a whole feature map; "the layer number of the stitching layer in the first neural network" indicates which layer of the first neural network the stitching layer is. It follows that, assuming the first neural network is composed of layers 0 to L-1 (L layers in total, where L is a positive integer greater than 1), the layer number of the stitching layer must be a value from 0 to L-1.
  • the quality can be represented by, for example, degree of distortion or image precision.
  • Optionally, the first slice size, the first padding size, and the layer number of the stitching layer are values for which the quality of the first output image is within the preset threshold range while the first neural network requires the smallest amount of computation or storage. In this way, a better technical effect of reducing operation overhead and memory consumption can be achieved.
  • Optionally, the first slice size is smaller than or equal to the size of the input image, the layer number of the stitching layer is smaller than or equal to the total number of layers of the first neural network, the first padding size is determined according to the preset threshold range and the receptive field of the stitching layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the computing capability or storage capability of the neural network processing system.
  • Optionally, the above method further includes: dividing the input image into multiple tiles according to the first slice size and the first padding size; stitching, at the stitching layer, the processing results corresponding to the multiple tiles; and obtaining the first output image from the stitched image, where the processing results corresponding to the multiple tiles are obtained by the first neural network processing the multiple tiles.
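  • As an illustrative sketch of this division-and-stitching step (the helper names and the NumPy-based representation are assumptions for illustration; the patent does not provide code), an input image can be split into padded tiles and the per-tile results reassembled as follows:

```python
import numpy as np

def split_into_tiles(image, slice_h, slice_w, pad):
    """Divide `image` (H, W, C) into padded tiles.

    Each tile is a slice of size (slice_h, slice_w) plus up to `pad`
    pixels of surrounding context taken from the image (clamped at the
    borders). Returns the tiles and bookkeeping needed for stitching.
    """
    H, W, _ = image.shape
    tiles, origins = [], []
    for y in range(0, H, slice_h):
        for x in range(0, W, slice_w):
            y0, y1 = max(0, y - pad), min(H, y + slice_h + pad)
            x0, x1 = max(0, x - pad), min(W, x + slice_w + pad)
            tiles.append(image[y0:y1, x0:x1, :])
            origins.append((y, x, y0, x0))
    return tiles, origins

def stitch(results, origins, out_h, out_w, slice_h, slice_w, channels):
    """Crop the padding off each per-tile result and reassemble the whole map."""
    out = np.zeros((out_h, out_w, channels), dtype=results[0].dtype)
    for res, (y, x, y0, x0) in zip(results, origins):
        dy, dx = y - y0, x - x0            # padding that survived border clamping
        h = min(slice_h, out_h - y)
        w = min(slice_w, out_w - x)
        out[y:y + h, x:x + w, :] = res[dy:dy + h, dx:dx + w, :]
    return out
```

With an identity "network" (per-tile results equal to the tiles themselves), stitching reproduces the original image, which is a convenient sanity check for the bookkeeping.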
  • Optionally, "the quality of the first output image is within a preset threshold range" specifically means: the ratio of the quality of the first output image to the quality of a second output image is within a preset range, where the second output image is the output image obtained when the first neural network processes the input image directly.
  • In a second aspect, an image processing device is provided, including units for executing the method in any one of the implementations of the first aspect.
  • In a third aspect, an image processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementations of the first aspect.
  • In a fourth aspect, a computer-readable medium is provided, storing program code for execution by a device, where the program code includes instructions for executing the method in any one of the implementations of the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, it causes the computer to execute the method in any one of the implementations of the first aspect.
  • In a sixth aspect, a chip is provided, including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory, and executes the method in any one of the implementations of the first aspect.
  • Optionally, the chip may further include a memory storing instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an execution process of an image processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the test effect of the image processing method of the embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a hardware structure of an image processing device according to an embodiment of the present application.
  • the embodiment of the present application involves a neural network.
  • the relevant terms and concepts of the neural network are firstly introduced below.
  • The existing technology does not consider that different division methods lead to different amounts of computation, memory consumption, and distortion. As a result, although the input image is divided, these division methods either reduce computation and memory consumption at the cost of excessive distortion of the output image, or keep the distortion small without effectively reducing computation and memory consumption. That is to say, the prior-art solutions cannot effectively improve image processing performance.
  • In view of this, the embodiment of the present application proposes an image processing scheme that determines the stitching-layer number, the slice size, and the padding size according to the resource parameters of the neural network processing system and the parameters of the neural network, while ensuring that the quality of the neural network's output results stays within a preset threshold, thereby solving the above problems caused by dividing the image.
  • dividing an image may be understood as slicing the input image and filling the slice, so as to divide the input image into multiple tiles.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application. Each step shown in FIG. 1 will be introduced below.
  • the resource parameter is used to represent the computing capability and storage capability of the neural network processing system.
  • the first neural network refers to any neural network (that is, a neural network model) that can be invoked by the neural network processing system. It can also be understood that the first neural network is any neural network that can run in the neural network processing system.
  • the neural network processing system can be a processor, a chip, a hardware module, and the like.
  • network architecture analysis may be performed on the first neural network, so as to obtain its model parameters.
  • the model parameters of the first neural network are used to represent the amount of calculation and storage required by the first neural network, and the model parameters may include parameters such as its receptive field, operator, and the amount of calculation of each layer of neural network.
  • The stitching layer can be understood as the neural network layer in the first neural network that stitches the per-slice processing results back into a whole feature map; "the layer number of the stitching layer in the first neural network" indicates which layer of the first neural network the stitching layer is. It follows that, assuming the first neural network is composed of layers 0 to L-1 (L layers in total, where L is a positive integer greater than 1), the layer number of the stitching layer must be a value from 0 to L-1.
  • Specifically, the resource parameters and model parameters determine a candidate range for the slice size, the padding size, and the stitching-layer number; any slice size and padding size may then be selected from the candidate range as the first slice size and the first padding size.
  • In the process of determining the first slice size, the first padding size, and the layer number of the stitching layer in the first neural network, the quality of the first output image must also be kept within the preset threshold range, where the first output image is obtained after the first neural network processes the above input image according to the first slice size, the first padding size, and the stitching-layer number. This is because, if only reducing computation and memory consumption were considered, regardless of the final output, the amount of computation might be greatly reduced but the output image quality would be too poor to use (for example, severely distorted), which would defeat the purpose of image processing.
  • the quality can be represented by, for example, degree of distortion or image precision.
  • The specific metrics used to measure quality can be, for example, mAP and IoU, which are described hereinafter.
  • Optionally, the aforementioned slice size, padding size, and stitching-layer number may also be determined in combination with a performance index of the first neural network.
  • the performance index may be calculation amount, memory consumption, delay, etc., that is to say, parameters used to evaluate some performances of the first neural network itself.
  • the embodiments of the present application mainly use calculation amount as an example for introduction.
  • Specifically, the candidate range of the slice size, the padding size, and the stitching-layer number can first be determined according to the above resource parameters and model parameters, such that the overall amount of computation does not exceed the computing capability of the neural network processing system and the quality of the first output image is within the preset range; then, within that candidate range, the slice size, the padding size, and the stitching-layer number that minimize the amount of computation are found.
  • For example, assume the first neural network Φ is a network of L layers, namely layer 0 to layer L-1, where layer 0 receives the input image and layer L-1 outputs the processed image; the on-chip memory of the neural network processing system has a size of M words (1 word = 32 bits); and the size of the input image is (H, W, C), where H is the height of the input image, W is its width, and C is its number of channels.
  • When α = 1, the quality of the first output image is the same as the quality of the second output image; for ease of understanding, α can be regarded as the user's tolerance for the degradation of output image quality after image division.
  • The first slice size, the first padding size, and the layer number of the stitching layer can be determined by minimizing the computation amount F of Φ.
  • This process can be represented by the following formula (1):

    (l_c, h_t, w_t) = argmin F(Φ), where F(Φ) = f_Φ(0, l_c-1, h_0, w_0)·N_T + f_Φ(l_c, L-1, h_{l_c}, w_{l_c})   (1)

  • Equation (1) determines the values of l_c, h_t, and w_t that minimize the amount of computation as the layer number of the stitching layer, the height of the first slice, and the width of the first slice, respectively. Equation (1) must also satisfy the following constraints.
  • The first L constraints ensure that, for each layer of the first neural network (layer 0 to layer L-1), the sum of the memory occupied by its input, its output, and its operator coefficients does not exceed M words. The last three constraints ensure that the stitching-layer number l_c lies within the layer range of the first neural network (layer 0 to layer L-1), that the slice height h_t is smaller than the height H of the input image (that is, less than or equal to H-1), and that the slice width w_t is smaller than the width W of the input image (that is, less than or equal to W-1).
  • N_T is the number of tiles, that is, the ceiling of the quotient obtained by dividing the size of the input image by the slice size without padding.
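  • With illustrative values, N_T can be computed directly:

```python
import math

def tile_count(H, W, h_t, w_t):
    """N_T: product of the ceilings of image size over unpadded slice size."""
    return math.ceil(H / h_t) * math.ceil(W / w_t)
```

For instance, a 1600x1600 image with 384x384 slices yields 5 x 5 = 25 tiles, and with 96x96 slices yields 17 x 17 = 289 tiles.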
  • The height and width (h_l, w_l) of the input feature map of layer l satisfy the following formula (2), where: the input tiles of layer l_c are stitched together; r_l is the input receptive-field size of layer l; and (α·r_l + h_t, α·r_l + w_t) is the tile size after padding. H_Φ(m, h) denotes the height of the output feature map of the m-th layer when the height of its input feature map is h; therefore, H_Φ(l-1, h_{l-1}) is the height of the output feature map of layer l-1 when its input height is h_{l-1}. Similarly, W_Φ(m, w) denotes the width of the output feature map of the m-th layer when the width of its input feature map is w; therefore, W_Φ(l-1, w_{l-1}) is the width of the output feature map of layer l-1 when its input width is w_{l-1}.
  • If the operator is a convolution, the operator height and width are the height and width of the convolution kernel. If the operator is pooling, the operator height and width are both 0 for max pooling, and are the height and width of the pooling window for average pooling. If the operator is an activation function, the operator height and width are 0, assuming the activation function can be implemented as a lookup table that does not involve any computation.
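  • The feature-map size propagation H_Φ/W_Φ can be sketched as below; the stride-1 "valid" shrinking rule used here is a simplifying assumption for illustration, not the patent's exact operator model:

```python
def out_size(h, w, layer):
    """Output feature-map size of one layer for an input of size (h, w).

    `layer` is (kind, kh, kw). Per the text, activation layers and max
    pooling contribute an effective operator size of 0, while average
    pooling and convolution contribute their window/kernel size. A
    stride-1, unpadded operator of effective size k shrinks a dimension
    by k - 1 (simplifying assumption).
    """
    kind, kh, kw = layer
    if kind in ("activation", "maxpool"):
        kh = kw = 0
    return h - max(kh - 1, 0), w - max(kw - 1, 0)

def input_sizes(layers, h0, w0):
    """Sizes (h_l, w_l) of each layer's input feature map, starting at (h0, w0)."""
    sizes = [(h0, w0)]
    for layer in layers[:-1]:
        sizes.append(out_size(*sizes[-1], layer))
    return sizes
```

Such a helper is what the memory and computation constraints above would consult to know each layer's feature-map dimensions for a given padded tile size.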
  • f_Φ(n, m, x, y) represents the amount of computation from the n-th layer to the m-th layer when the input feature map of the n-th layer has size (x, y). Therefore, f_Φ(0, l_c-1, h_0, w_0)·N_T represents the amount of computation from layer 0 to layer l_c-1: since the number of tiles is N_T, this is the per-tile amount of computation multiplied by the number of tiles, with the input feature map of layer 0 having size (h_0, w_0). f_Φ(l_c, L-1, h_{l_c}, w_{l_c}) represents the amount of computation from layer l_c to layer L-1, since layer l_c stitches the processing results of the multiple tiles into a whole feature map.
  • The optimal value of the padding size is determined accordingly by α and the receptive field of the stitching layer; by adjusting α, the preset threshold range for the output image can be adjusted. The preset threshold range is the allowed degradation range of the output image quality: the closer α is to 1, the less the image quality degrades; the closer α is to 0, the more it degrades. Here, "image quality degradation of the output image" means that the quality of the first output image is worse than the quality of the second output image.
  • Optimization can be performed by exhaustively searching the parameter space of h_t, w_t, and l_c. The search proceeds in increasing order of parameter magnitude: the slice size iterates from its smallest allowed value to its largest, and l_c iterates from 0 (taking the input) to L-1 (producing the final output).
  • The search can be pruned. If layer m is an activation-function layer (such as ReLU or Sigmoid), the candidate l_c = m+1 with the same {h_t} and {w_t} can be skipped in the search. Suppose the memory requirement of some layer l (l < l_c) exceeds M; since r_0 ≤ r_1 ≤ … ≤ r_{L-1}, for any stitching layer l_c' > l_c and tile size (h_t' > h_t, w_t' > w_t), layer l (l < l_c') also requires more memory than M. Therefore, searching stitching layers l_c' > l_c with tile sizes (h_t' > h_t, w_t' > w_t) is unnecessary, and the search can be terminated there. Similarly, the layer following an activation-function layer or a pooling layer can be skipped.
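  • The exhaustive search with memory pruning described above might be sketched as follows; the `cost`, `layer_memory`, and `quality_ok` callables are placeholders that a real implementation would derive from the model parameters and the quality threshold:

```python
import math

def search(L, H, W, cost, layer_memory, M, quality_ok, slice_min=32, step=32):
    """Exhaustive search for (l_c, h_t, w_t) minimizing total computation.

    cost(n, m, h, w): computation of layers n..m for an input of size (h, w)
                      (0 when m < n).
    layer_memory(l, h, w): words needed by layer l for a tile of size (h, w).
    quality_ok(l_c, h_t, w_t): True if output quality stays within the
                               preset threshold range.
    """
    best, best_cost = None, math.inf
    for h_t in range(slice_min, H + 1, step):
        for w_t in range(slice_min, W + 1, step):
            # Memory pruning: if some layer already exceeds M words for this
            # tile size, larger tiles only need more memory, so skip.
            if any(layer_memory(l, h_t, w_t) > M for l in range(L)):
                continue
            n_tiles = math.ceil(H / h_t) * math.ceil(W / w_t)
            for l_c in range(L):
                if not quality_ok(l_c, h_t, w_t):
                    continue
                # Per-tile cost up to the stitching layer, whole-map cost after.
                total = cost(0, l_c - 1, h_t, w_t) * n_tiles + cost(l_c, L - 1, H, W)
                if total < best_cost:
                    best, best_cost = (l_c, h_t, w_t), total
    return best, best_cost
```

The objective mirrors the two-term cost in formula (1): the pre-stitching term is multiplied by the tile count, and the post-stitching term is evaluated once on the stitched feature map.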
  • The input image is divided according to the slice size determined in step 102 to obtain multiple slices, and the multiple slices are then padded according to the determined padding size to obtain multiple tiles.
  • The processing results corresponding to the above multiple tiles are obtained by the first neural network processing the multiple tiles.
  • After the processing results corresponding to the multiple tiles are obtained and stitched, a stitched processed image is obtained. The stitched processed image can serve as the input of the next layer of the first neural network, or as the target processing result of the first neural network on the input image, that is, the above-mentioned first output image.
  • The second output image is the output image obtained by inputting the input image directly into the first neural network.
  • Stitching can be performed only after the processing results of all of the above multiple tiles have been obtained.
  • In the embodiment of the present application, an appropriate slice size, padding size, and stitching-layer number are determined according to the system resource parameters and model parameters, which balances output image quality against computing overhead and memory consumption, effectively improving image processing performance without excessive computation or serious distortion. Furthermore, selecting, for the slice size, the padding size, and the stitching-layer number, the values that keep the output image quality within the preset threshold range while minimizing the computation or storage amount of the first neural network can further reduce computing overhead and memory consumption.
  • FIG. 2 is a schematic flowchart of an execution process of an image processing method according to an embodiment of the present application.
  • Figure 2 can be seen as an example of Figure 1.
  • the network architecture analysis module can be used to analyze the neural network model to obtain the above-mentioned receptive fields, operators and the calculation amount of each layer of neural network.
  • This neural network model can be regarded as an example of the first neural network in FIG. 1 .
  • the memory size can be, for example, the storage space size of a fast on-chip memory, which can be used to evaluate the computing capability and storage capability of the neural network system.
  • the above-mentioned on-chip memory may include, for example: L1, L2, L3, HBM, DDR, and the like.
  • the memory size can be regarded as an example of the resource parameter described in FIG. 1 .
  • Step 201 and step 202 can be regarded as an example of step 101, and step 201 and step 202 may or may not be performed at the same time, and there is no restriction on the order of execution.
  • step 203 reference may be made to the introduction of step 102.
  • For the contents of obtaining multiple tiles in step 204, refer to the introduction of step 103.
  • the current processing layer N of the neural network model is set to 0, and the next block to be processed is selected from the multiple blocks obtained in step 204(a).
  • the first output image is the processing result of the last layer of the neural network model.
  • step 209 the basis for judging whether splicing is possible is whether the Nth layer has processed all the multiple tiles.
  • step 205 is executed.
  • step 210 is to traverse all neural network layers.
  • Steps 204-211 can be regarded as an example of steps 103-104, and mainly implement the following process.
  • the input image is divided into tiles with a padding of a selected size.
  • In Step 0, the current processing layer N of the neural network model is set to 0; at this time, the next tile from the input image can also be loaded into memory as the data input.
  • In Step 1, after the kernel of the N-th layer is loaded into memory, processing of the N-th layer's data begins. If N is the last layer of the network, the output of the N-th layer is the final processing result of the neural network model on the input image.
  • If N is neither the last layer nor the stitching layer, the output of the N-th layer becomes the input of the (N+1)-th layer, so it is kept in memory, N is incremented by 1, and Step 1 is repeated.
  • If N is not the last layer but is the stitching layer, the processing results of all tiles are stitched before processing continues at layer N+1.
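  • The loop over layers and tiles in the steps above can be sketched as follows; `layers`, `l_c`, and `stitch_fn` are placeholders for the model's per-layer functions, the stitching-layer number, and the stitching operation:

```python
def run_tiled(tiles, layers, l_c, stitch_fn):
    """Process each tile through layers 0..l_c-1, stitch at layer l_c,
    then run the stitched feature map through the remaining layers."""
    # Phase 1: per-tile processing up to (but not including) the stitching layer.
    partial = []
    for tile in tiles:
        x = tile
        for layer in layers[:l_c]:
            x = layer(x)          # output of layer N becomes input of layer N+1
        partial.append(x)
    # Phase 2: stitching is possible only once every tile has been processed.
    x = stitch_fn(partial)
    for layer in layers[l_c:]:
        x = layer(x)
    return x                      # output of the last layer: the first output image
```

With l_c = 0 the whole input is processed without division; with l_c = L the network runs entirely tile by tile and only the final outputs are stitched.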
  • the solution of the embodiment of the present application can be used in various types of neural networks such as convolutional neural networks, and the following tests take a residual network (ResNet) as an example.
  • the residual network is a deep convolutional network proposed in 2015. Compared with the traditional convolutional neural network, the residual network is easier to optimize and can increase the accuracy by increasing the depth.
  • the core of the residual network is to solve the side effects (degeneration problem) caused by increasing the depth, so that the network performance can be improved by simply increasing the network depth.
  • The residual network generally contains many submodules with the same structure, and a residual network is usually denoted by appending a number indicating its depth; for example, ResNet50 denotes a residual network with 50 weight layers.
  • In the test, the resolution of the input image is adjusted to 1600x1600, and the 11th layer (Conv2_9) of a trained ResNet50 is set as the stitching layer.
  • The receptive-field size of the 11th layer of ResNet50 is 51, so it can be inferred that a padding size greater than or equal to 26 (calculated as ⌈51/2⌉) ensures that the stitched tiles exhibit no blocky artifacts.
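  • This ⌈r/2⌉ inference can be expressed directly (a trivial helper, included only to make the arithmetic explicit):

```python
import math

def min_padding(receptive_field):
    """Smallest per-side padding that avoids blocky artifacts after
    stitching, per the receptive-field argument above."""
    return math.ceil(receptive_field / 2)
```

min_padding(51) gives 26, matching the value inferred for the 11th layer of ResNet50.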
  • Three slice sizes are tested: 96x96, 192x192, and 384x384; the padding sizes tested are 0 and 64. The test results are shown in FIG. 3.
  • FIG. 3 is a schematic diagram of the test effect of the image processing method of the embodiment of the present application.
  • For a slice of size 384x384, the quality drops by about 0.8% (that is, (0.354-0.351)/0.354) and the amount of computation is reduced by about 44%; for a slice of size 96x96, the quality drops by about 2.3% (that is, (0.354-0.346)/0.354) and the amount of computation is reduced by about 64%.
  • Here, mAP means mean average precision; IoU means intersection over union; and mAP@IoU means the detection accuracy of the trained model over all categories at a specific IoU. IoU is the overlap rate between the generated candidate bound and the ground-truth bound, that is, the ratio of their intersection to their union; in the ideal case of complete overlap, the ratio is 1.
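  • As a concrete illustration of IoU (the (x1, y1, x2, y2) corner format is an assumption for this sketch):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0
```

Identical boxes give 1.0, disjoint boxes give 0.0, and partial overlap gives a value strictly between the two.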
  • the image processing apparatus will be introduced below with reference to FIG. 4 .
  • the image processing device shown in FIG. 4 can be used to execute each step of the image processing method of the embodiment of the present application, and the image processing device can be a computer, a server and other devices with sufficient computing power to construct a neural network.
  • FIG. 4 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • the apparatus 2000 shown in FIG. 4 includes an acquisition unit 2001 and a processing unit 2002 .
  • the apparatus 2000 may be used to execute the steps of the image processing method of the embodiment of the present application.
  • the acquiring unit 2001 may be used to execute step 101 in the method shown in FIG. 1
  • the processing unit 2002 may be used to execute steps 102 to 104 in the method shown in FIG. 1 .
  • the acquiring unit 2001 may be used to execute steps 201 and 202 in the method shown in FIG. 2
  • The processing unit 2002 may be used to execute steps 203 to 211 in the method shown in FIG. 2.
  • The acquiring unit 2001 may be equivalent to the communication interface 3003 in the device 3000 shown in FIG. 5.
  • the above resource parameters and model parameters can be obtained from the memory 3001 through the processor 3002 at this time.
  • processing unit 2002 in the apparatus 2000 shown in FIG. 4 may be equivalent to the processor 3002 in the apparatus 3000 shown in FIG. 5 .
  • FIG. 5 is a schematic diagram of a hardware structure of an image processing device according to an embodiment of the present application.
  • the device 3000 shown in FIG. 5 includes a memory 3001 , a processor 3002 , a communication interface 3003 and a bus 3004 .
  • the memory 3001 , the processor 3002 , and the communication interface 3003 are connected to each other through a bus 3004 .
  • the memory 3001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 3001 may store programs, and when the programs stored in the memory 3001 are executed by the processor 3002, the processor 3002 and the communication interface 3003 are used to execute various steps of the image processing method of the embodiment of the present application.
  • The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to realize the functions required by the units of the image processing device of the embodiment of the present application, or to execute the steps of the image processing method of the embodiment of the present application.
  • the processor 3002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 3002 or instructions in the form of software.
  • The above-mentioned processor 3002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the image processing method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • The storage medium is located in the memory 3001; the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, completes the functions required by the units included in the image processing device of the embodiment of the present application, or executes the steps of the image processing method of the embodiment of the present application.
  • the communication interface 3003 implements communication between the apparatus 3000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver.
  • the control parameters corresponding to the inference results may be sent through the communication interface 3003 .
  • the bus 3004 may include a pathway for transferring information between various components of the device 3000 (eg, memory 3001 , processor 3002 , communication interface 3003 ).
  • the device 3000 may also include other devices necessary for normal operation during specific implementation.
  • the apparatus 3000 may also include hardware devices for implementing other additional functions.
  • the apparatus 3000 may also only include the components necessary to realize the embodiment of the present application, and does not necessarily include all the components shown in FIG. 5 .
  • the embodiment of the present application does not specifically limit the specific structure of the execution subject of the method provided in the embodiment of the present application, as long as it can run a program recording the code of the method provided in the embodiment of the present application so as to perform that method.
  • the subject of execution of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module in the terminal device or network device that can call a program and execute the program.
  • computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or tapes), optical discs (e.g., compact discs (CD) and digital versatile discs (DVD)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • Various storage media described herein can represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • based on this understanding, the essence of the technical solution of this application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a computer software product, which is stored in a storage medium.
  • the computer software product includes several instructions, which are used to make a computer device (which may be a personal computer, server, or network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of artificial intelligence, and provides an image processing method and an image processing apparatus. The method comprises: acquiring a resource parameter of a neural network processing system and a model parameter of a first neural network; and determining, according to the resource parameter and the model parameter, a first slicing size and a first filling size of an input image, and the layer number of a splicing layer in the first neural network, wherein when the first neural network processes the input image according to the first slicing size, the first filling size, and the splicing layer, the quality of the obtained output image is within a preset threshold range. In the solution, on the premise that the quality of the output image satisfies the preset threshold range, an appropriate slicing size, filling size, and splicing layer are determined according to the resource parameter of the system and the model parameter, such that operation overhead and memory consumption can be effectively reduced.

Description

Image Processing Method and Image Processing Apparatus

Technical Field

The present application relates to the field of artificial intelligence, and more specifically, to an image processing method and an image processing apparatus.

Background Art

With the rapid development of artificial intelligence (AI) technology, neural networks (for example, deep neural networks) have made great achievements in recent years in the processing and analysis of various media signals such as images, videos, and voice. For image processing, under the demanding requirement of processing high-resolution images with deep neural network models, how to reduce the amount of computation and the memory consumption becomes crucial.

To solve the above problems of heavy computation and large memory consumption when processing high-resolution input images, the idea of slicing emerged: the input image is divided into multiple slices, each slice is processed separately, and the processing results are then spliced together. To ensure that the spliced image is as close as possible to the output image obtained without slicing, each slice needs to be padded with adjacent pixels. However, slice padding in the prior art is prone to blocky artifacts at the joints, which degrades performance such as visual quality and detection success rate. Since padding increases the amount of computation to some extent, the prior-art methods of dividing an image (including slicing and padding) either reduce the amount of computation and memory consumption but cause excessive distortion of the output image, or keep the distortion of the output image small but fail to effectively reduce the amount of computation and memory consumption.

Therefore, how to reduce the amount of computation and memory consumption while ensuring output image quality is an urgent technical problem to be solved.
Summary of the Invention

Embodiments of the present application provide an image processing method and an image processing apparatus, which can reduce the amount of computation and memory consumption while ensuring output image quality.

In a first aspect, an image processing method is provided. The method includes: acquiring a resource parameter of a neural network processing system and a model parameter of a first neural network; and determining, according to the resource parameter and the model parameter, a first slice size and a first padding size for an input image of the first neural network, and the layer number of a splicing layer in the first neural network, where the quality of a first output image is within a preset threshold range, and the first output image is the output image obtained when the first neural network processes the input image according to the first slice size, the first padding size, and the splicing layer.

In the technical solution of the present application, on the premise that the quality of the output image satisfies the preset threshold range, an appropriate slice size, padding size, and layer number of the splicing layer in the first neural network are determined according to the system resource parameter and the model parameter, so that operation overhead and memory consumption can be effectively reduced, without the computation becoming excessive or the distortion becoming too severe.
The resource parameter is used to represent the computing capability and storage capability of the neural network processing system. The first neural network refers to any neural network (that is, any neural network model) that can be invoked by the neural network processing system; in other words, the first neural network is any neural network that can run in the neural network processing system.

The neural network processing system may be a processor, a chip, a hardware module, or the like.

Optionally, network architecture analysis may be performed on the first neural network to obtain its model parameters. The model parameters of the first neural network are used to represent the amount of computation and storage required by the first neural network, and may include parameters such as its receptive field, its operators, and the amount of computation of each layer.

The splicing layer can be understood as the neural network layer in the first neural network that, after the individual slices have been processed, splices the processing results into one whole feature map. "The layer number of the splicing layer in the first neural network" means, in other words, which layer of the first neural network is the splicing layer. Accordingly, assuming the first neural network consists of layers 0 to L-1, i.e., L layers in total, the layer number of the splicing layer must be some value from 0 to L-1, where L is a positive integer greater than 1.

In the embodiments of the present application, quality may be represented by, for example, distortion or image precision; the higher the quality, the more accurate the results when the output image is used for subsequent processing such as image recognition or image classification.
With reference to the first aspect, in some implementations of the first aspect, the first slice size, the first padding size, and the layer number of the splicing layer in the first neural network are the values that minimize the amount of computation or storage required by the first neural network while keeping the quality of the first output image within the preset threshold range. This further reduces operation overhead and memory consumption.

With reference to the first aspect, in some implementations of the first aspect, the first slice size is smaller than or equal to the size of the input image, the layer number of the splicing layer is smaller than or equal to the total number of layers of the first neural network, the first padding size is determined according to the preset threshold range and the receptive field of the splicing layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the computing capability or storage capability of the neural network processing system.

With reference to the first aspect, in some implementations of the first aspect, the method further includes: dividing the input image into multiple tiles according to the first slice size and the first padding size; splicing, at the splicing layer, the processing results corresponding to the multiple tiles; and obtaining the first output image according to the spliced image, where the processing results corresponding to the multiple tiles are obtained by the first neural network processing the multiple tiles.

With reference to the first aspect, in some implementations of the first aspect, that the quality of the first output image is within a preset threshold range specifically means: the ratio of the quality of the first output image to the quality of a second output image is within a preset range, where the second output image is the output image obtained when the first neural network processes the input image directly.
In a second aspect, an image processing apparatus is provided, and the apparatus includes units for executing the method in any one of the implementations of the first aspect.

In a third aspect, an image processing apparatus is provided, and the apparatus includes: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementations of the first aspect.

In a fourth aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any one of the implementations of the first aspect.

In a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to execute the method in any one of the implementations of the first aspect.

In a sixth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to execute the method in any one of the implementations of the first aspect.

Optionally, as an implementation, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect.
Description of Drawings

FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.

FIG. 2 is a schematic flowchart of an execution process of an image processing method according to an embodiment of the present application.

FIG. 3 is a schematic diagram of a test effect of an image processing method according to an embodiment of the present application.

FIG. 4 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application.

FIG. 5 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
Detailed Description

The technical solutions of the present application are described below with reference to the accompanying drawings.

The embodiments of the present application involve neural networks. For a better understanding of the methods of the embodiments of the present application, relevant terms and concepts of neural networks are first introduced below.

In traditional solutions, the input image is sliced to reduce the amount of computation and memory consumption, but the prior art does not consider a better padding method and simply pads with zeros for the convenience of computation, which causes blocky artifacts at the joints. In addition, padding itself brings a certain amount of computation, and the prior art does not consider how to balance the amount of computation against distortion. In short, the prior art does not take into account that different division methods lead to different amounts of computation, memory consumption, and distortion. As a result, although the input image is divided, these division methods either reduce the amount of computation and memory consumption but cause excessive distortion of the output image, or guarantee small distortion of the output image without effectively reducing the amount of computation and memory consumption. That is to say, the prior-art solutions cannot effectively improve image processing performance.

In view of the above problems, embodiments of the present application propose an image processing scheme that determines the layer number of the splicing layer, the slice size, and the padding size according to the resource parameters of the neural network processing system and the parameters of the neural network, while ensuring that the quality of the output of the neural network is within a preset threshold range, thereby solving the above problems caused by dividing the image.

In the embodiments of the present application, dividing an image can be understood as slicing the input image and padding the slices, so as to divide the input image into multiple tiles.
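The slice-pad-process-splice flow described above can be sketched as follows. This is a minimal illustration, not part of the filing: it assumes NumPy, an identity per-tile model, and hypothetical helper names, and uses edge replication at the image border to stand in for "padding with adjacent pixels" (zero padding is exactly what the scheme avoids).

```python
import numpy as np

def split_into_tiles(img, tile_h, tile_w, pad):
    """Split an (H, W, C) image into tiles, each padded with `pad`
    neighboring pixels on every side (edge-replicated at the border)."""
    H, W, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    tiles = []
    for y in range(0, H, tile_h):
        for x in range(0, W, tile_w):
            # In padded coordinates the tile starts at (y, x) and carries
            # `pad` extra context pixels on each side.
            tiles.append(padded[y:y + tile_h + 2 * pad,
                                x:x + tile_w + 2 * pad, :])
    return tiles

def stitch_tiles(tiles, H, W, tile_h, tile_w, pad):
    """Crop the padding off each (processed) tile and reassemble the map."""
    out = np.zeros((H, W, tiles[0].shape[2]), dtype=tiles[0].dtype)
    i = 0
    for y in range(0, H, tile_h):
        for x in range(0, W, tile_w):
            core = tiles[i][pad:pad + tile_h, pad:pad + tile_w, :]
            out[y:y + tile_h, x:x + tile_w, :] = core[:H - y, :W - x, :]
            i += 1
    return out
```

With an identity per-tile model, splitting followed by stitching reproduces the input exactly; for a real network, the `pad` pixels of context are what keep the spliced result close to the unsliced output at tile joints.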
FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application. The steps shown in FIG. 1 are described below.

101. Acquire a resource parameter of a neural network processing system and a model parameter of a first neural network.

The resource parameter is used to represent the computing capability and storage capability of the neural network processing system. The first neural network refers to any neural network (that is, any neural network model) that can be invoked by the neural network processing system; in other words, the first neural network is any neural network that can run in the neural network processing system.

The neural network processing system may be a processor, a chip, a hardware module, or the like.

Optionally, network architecture analysis may be performed on the first neural network to obtain its model parameters. The model parameters of the first neural network are used to represent the amount of computation and storage required by the first neural network, and may include parameters such as its receptive field, its operators, and the amount of computation of each layer.

102. Determine, according to the above resource parameter and model parameter, a first slice size and a first padding size for the input image of the first neural network, and the layer number of the splicing layer in the first neural network.
The splicing layer can be understood as the neural network layer in the first neural network that, after the individual slices have been processed, splices the processing results into one whole feature map. "The layer number of the splicing layer in the first neural network" means, in other words, which layer of the first neural network is the splicing layer. Accordingly, assuming the first neural network consists of layers 0 to L-1, i.e., L layers in total, the layer number of the splicing layer must be some value from 0 to L-1, where L is a positive integer greater than 1.

Optionally, a candidate range for the slice size, the padding size, and the layer number of the splicing layer may be determined according to the above resource parameter and model parameter, and then any slice size and padding size within the candidate range may be selected as the first slice size and the first padding size.

To obtain a better technical effect, optimal values of the slice size, the padding size, and the layer number of the splicing layer may instead be determined according to the above resource parameter and model parameter, and these optimal values may be used as the first slice size, the first padding size, and the layer number of the splicing layer.

In the process of determining the first slice size, the first padding size, and the layer number of the splicing layer in the first neural network, it is also necessary to keep the quality of the first output image within the preset threshold range, where the first output image is obtained after the first neural network processes the input image according to the first slice size, the first padding size, and the layer number of the splicing layer. This is because, if only reducing the amount of computation and memory consumption were considered, regardless of the final output, then even though the amount of computation dropped considerably, the quality of the output image could be so poor (for example, severely distorted) that the image is unusable, defeating the purpose of the image processing.
In the embodiments of the present application, quality may be represented by, for example, distortion or image precision; the higher the quality, the more accurate the results when the output image is used for subsequent processing such as image recognition or image classification. Specific quality parameters for measuring quality may be, for example, the mAP and IoU metrics used below. In this way, while the quality of the output image is guaranteed to be within an acceptable range, a slice size, padding size, and layer number of the splicing layer that yield a lower amount of computation and lower memory consumption can be further selected; that is, the amount of computation and memory consumption are further reduced while the quality of the output image is guaranteed.
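Of the quality metrics mentioned here, IoU is straightforward to compute. The following is a generic sketch of intersection-over-union for axis-aligned boxes given as (x1, y1, x2, y2); it illustrates the standard metric and is not a definition taken from the filing:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes are disjoint along either axis.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For detection tasks, comparing the IoU of boxes predicted on the spliced output against those predicted on the unsliced output is one way to check that the quality ratio stays within the preset range.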
Optionally, the above slice size, padding size, and layer number of the splicing layer may also be determined in combination with a performance index of the first neural network. The performance index may be the amount of computation, memory consumption, latency, or the like; that is, a parameter used to evaluate some performance of the first neural network itself. For ease of understanding, the embodiments of the present application mainly use the amount of computation as an example.

In some implementations, for example, a candidate range for the slice size, the padding size, and the layer number of the splicing layer may first be determined according to the above resource parameter and model parameter, and then the optimal values of the slice size, the padding size, and the layer number of the splicing layer may be determined according to the performance index and the above preset threshold range. Assuming the performance index is the amount of computation, this is equivalent to first determining, from the resource parameter and model parameter, a candidate range within which the overall amount of computation for the first neural network to process the divided image does not exceed the computing capability of the neural network processing system and the quality of the first output image is within the preset range, and then finding, within that candidate range, the slice size and padding size, as well as the layer number of the splicing layer, that make the above amount of computation optimal.
Assume that the first neural network Π is an L-layer network, consisting of layer 0 to layer L-1, where layer 0 takes the input image and layer L-1 outputs the processed image. The on-chip memory size of the neural network processing system is M words (1 word is 32 bits); for ease of accounting, the storage required for the input image, operator coefficients, output image, operator outputs, and so on is counted at a precision of 1 word. The size of the input image is (H, W, C), where H is the height of the input image, W is its width, and C is its number of channels. The hyperparameter that keeps the quality of the first output image within the preset threshold range is λ, which represents the ratio of the quality of the first output image to the quality of the second output image; thus the value range of λ is 0 ≤ λ ≤ 1, and λ = 1 means the quality of the first output image equals the quality of the second output image. For ease of understanding, λ can be regarded as the user's tolerance for degradation of output image quality after image division.

Optionally, the first slice size, the first padding size, and the layer number of the splicing layer can be determined by minimizing the amount of computation F of Π. This process can be expressed by the following formula (1).
(l_c*, h_t*, w_t*) = argmin_{(l_c, h_t, w_t)} F_Π({(h_l, w_l, c_l)}_{l=0,...,L-1})    (1)

where l_c* denotes the optimal layer number of the splicing layer, h_t* denotes the optimal slice height, w_t* denotes the optimal slice width, F_Π denotes the amount of computation of the first neural network during processing, and (h_l, w_l, c_l) denote the height, width, and number of channels of the input feature map of layer l, l being an integer with 0 ≤ l ≤ L-1. Formula (1) determines the values (l_c*, h_t*, w_t*) that minimize the amount of computation as the layer number of the splicing layer, the height of the first slice size, and the width of the first slice size, respectively. Formula (1) must also satisfy the following constraints:

mem_l ≤ M, for l = 0, 1, ..., L-1;

0 ≤ l_c ≤ L-1;

0 ≤ h_t ≤ H-1;

0 ≤ w_t ≤ W-1,

where mem_l denotes the total memory occupied by the input, the output, and the operator coefficients of layer l.
That is to say, the first L constraints ensure that, for each layer of the first neural network (from layer 0 to layer L-1), the total memory occupied by its input, output, and operator coefficients does not exceed the limit of M words; the last three constraints ensure that the layer number l_c of the splicing layer falls within the layer range of the first neural network (from layer 0 to layer L-1), that the slice height h_t is smaller than the input image height H (i.e., at most H-1), and that the slice width w_t is smaller than the input image width W (i.e., at most W-1).
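The constrained minimization of formula (1) can be realized as a brute-force search over the candidate configurations, as sketched below. The sketch makes simplifying assumptions: `layer_mem` and `total_flops` are hypothetical callbacks standing in for the per-layer memory accounting and the computation model F_Π obtained from the network analysis, and the step sizes used to enumerate h_t and w_t are purely illustrative.

```python
import math
from itertools import product

def search_slicing_config(L, H, W, M, layer_mem, total_flops,
                          h_step=32, w_step=32):
    """Exhaustive search for (l_c*, h_t*, w_t*) in the spirit of formula (1).

    layer_mem(l, l_c, h_t, w_t) -> words needed by layer l (input + output
                                   + operator coefficients) under this config
    total_flops(l_c, h_t, w_t)  -> overall computation of the network
    """
    best, best_cost = None, math.inf
    for l_c, h_t, w_t in product(range(L),
                                 range(h_step, H, h_step),
                                 range(w_step, W, w_step)):
        # Per-layer memory constraint: every layer must fit in M words.
        if any(layer_mem(l, l_c, h_t, w_t) > M for l in range(L)):
            continue
        cost = total_flops(l_c, h_t, w_t)
        if cost < best_cost:
            best, best_cost = (l_c, h_t, w_t), cost
    return best, best_cost
```

Since L and the number of candidate slice sizes are small in practice, exhaustive enumeration is typically affordable; the quality constraint on the first output image would be enforced by restricting the enumerated candidates in the same way as the memory check.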
Let $N_T$ denote the number of slices obtained after dividing the input image by the height $h_t$ and width $w_t$ of the unpadded slice size; then

$$N_T = \left\lceil \frac{H}{h_t} \right\rceil \left\lceil \frac{W}{w_t} \right\rceil.$$

That is, the number of slices $N_T$ is the rounded-up quotient of the input image size divided by the unpadded slice size. The height and width $(h_l, w_l)$ of the input feature map of layer $l$ satisfy formula (2) below.
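The tile count $N_T$ just defined can be sketched as follows; taking the rounded-up quotient per dimension is an assumption consistent with the formula above.

```python
import math

def num_tiles(H, W, h_t, w_t):
    """Number of unpadded slices N_T: the H x W input image is divided
    by the unpadded slice size h_t x w_t, rounding up in each dimension
    so that partial border slices are counted."""
    return math.ceil(H / h_t) * math.ceil(W / w_t)
```

For the 1600x1600 test image used later with 96x96 slices, this gives 17 x 17 = 289 tiles.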
$$(h_0, w_0) = (\lambda r_0 + h_t,\ \lambda r_0 + w_t),$$
$$(h_l, w_l) = \left( \left\lceil \frac{H}{h_t} \right\rceil^{\delta(l,\,l_c)} H_\Pi(l-1, h_{l-1}),\ \left\lceil \frac{W}{w_t} \right\rceil^{\delta(l,\,l_c)} W_\Pi(l-1, w_{l-1}) \right), \quad 1 \le l \le L-1, \qquad (2)$$

where $\delta(l, l_c) = 1$ if $l = l_c$ and $\delta(l, l_c) = 0$ otherwise, so that the combined stitching factor across both dimensions equals $N_T$ at the stitching layer and 1 at all other layers.
Here, the input tiles of layer $l_c$ are stitched together; $r_l$ is the receptive field size of the input of layer $l$; $(\lambda r_l + h_t, \lambda r_l + w_t)$ is the tile size after padding, where the padding size is $p = \lambda r_l/2$. Therefore, in formula (2), $(\lambda r_0 + h_t, \lambda r_0 + w_t)$ is the padded tile size at layer 0; specifically, it is the tile size obtained after applying a padding of size $p = \lambda r_0/2$. $H_\Pi(m, h)$ denotes the height of the output feature map of layer $m$ when the height of its input feature map is $h$; therefore, $H_\Pi(l-1, h_{l-1})$ in formula (2) is the height of the output feature map of layer $l-1$ when the height of its input feature map is $h_{l-1}$. Likewise, $W_\Pi(m, w)$ denotes the width of the output feature map of layer $m$ when the width of its input feature map is $w$; therefore, $W_\Pi(l-1, w_{l-1})$ in formula (2) is the width of the output feature map of layer $l-1$ when the width of its input feature map is $w_{l-1}$.

Suppose $(k_h^l, k_w^l)$ denote the height and width of the operator of layer $l$. If the operator is a convolution, the height and width of the operator are the height and width of the convolution kernel. If the operator is a pooling operation, the height and width of the operator are both 0 for max pooling, and are the height and width of the pooling window for average pooling. If the operator is an activation function, then, assuming the activation function can be implemented as a lookup table that involves no computation, the height and width of the operator are 0.

The indicator factor in formula (2) takes the value $N_T$ when $l = l_c$ and the value 1 when $l \ne l_c$; that is, it equals $N_T$ at the stitching layer and 1 at non-stitching layers.
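As a hedged sketch of the size function $H_\Pi$ (and, by symmetry, $W_\Pi$), one layer can be assumed to be summarized by a kernel size, stride and internal padding; this summary is a hypothetical layer description, not the embodiment's data structure.

```python
def H_pi(h, k, s=1, pad=0):
    """Output height of a layer whose input height is h, for an operator
    of height k, stride s and internal padding pad (standard
    convolution/pooling size arithmetic)."""
    return (h + 2 * pad - k) // s + 1
```

$W_\Pi$ has the same form with widths substituted for heights; composing `H_pi` layer by layer propagates $h_l$ through the network exactly as formula (2) requires.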
$F_\Pi$ is the total amount of computation of the entire first neural network when the input of the stitching layer (layer $l_c$) consists of the $N_T$ input tiles and the padded input tile size at layer 0 is $(h_0, w_0)$. $f_\Pi(n, m, x, y)$ denotes the amount of computation from layer $n$ to layer $m$ when the size of the input feature map of layer $n$ is $(x, y)$. Therefore, $f_\Pi(0, l_c-1, h_0, w_0)\, N_T$ denotes the amount of computation from layer 0 to layer $l_c-1$: since the number of tiles is $N_T$, the amount of computation equals the per-tile computation multiplied by the number of tiles, where the size of the input feature map of layer 0 is $(h_0, w_0)$. The term $f_\Pi(l_c, L-1, h_{l_c}, w_{l_c})$ denotes the amount of computation from layer $l_c$ to layer $L-1$: since layer $l_c$ stitches the processing results of the tiles into one feature map, this amount of computation depends on the size of the stitched feature map, where the size of the input feature map of layer $l_c$ is $(h_{l_c}, w_{l_c})$. The total computation is therefore

$$F_\Pi = f_\Pi(0, l_c-1, h_0, w_0)\, N_T + f_\Pi(l_c, L-1, h_{l_c}, w_{l_c}).$$
In the embodiments of the present application, the optimal value of the padding size is

$$p^* = \frac{\lambda\, r_{l_c^*}}{2}.$$
By adjusting the hyperparameter $\lambda$, the preset threshold range for the above output image can be adjusted. If the preset threshold range is the allowed degradation range of the output image quality, then the closer $\lambda$ is to 1, the less the image quality degrades, and the closer $\lambda$ is to 0, the more the image quality degrades. Degradation of the output image quality here means that the quality of the first output image is worse than the quality of the second output image.
In determining the optimal values of the slice size and the number of layers of the stitching layer, the optimization can be performed by exhaustively searching the parameter space of $h_t$, $w_t$, $l_c$. For example, the search can proceed in increasing order of parameter magnitude: the slice size iterates from its smallest allowed value to its largest, and $l_c$ iterates from 0 (taking the input) to $L-1$ (producing the final output). To speed up the optimization, the parameter search space can be reduced without degrading optimization performance by considering the following factors. Suppose that $\{l_c = 0, \ldots, m\}$, $\{h_t\}$, $\{w_t\}$ have been explored in the search with all $L$ constraints satisfied. First, suppose layer $l = m$ is an activation function layer (e.g., ReLU, Sigmoid) or a pooling layer, where $(h_m, w_m) = (h_{m+1}, w_{m+1})$. In both cases, for $l_c = m+1$ and $l_c = m$, the total computation of the first neural network and the satisfaction of the $L$ constraints are identical, because $M_{m+1}(m) \le M_m(m)$ and $M_{m+1}(i) = M_m(i)$ for $i \ne m$, where $M_{l_c}(i)$ is the storage requirement of layer $i$ when the stitching layer is $l_c$. Therefore $l_c = m+1$ with $\{h_t\}$ and $\{w_t\}$ can be skipped in the search. Second, for a given tile size $(h_t, w_t)$ and stitching layer $l_c$, suppose the memory requirement of some layer $l$ ($l \le l_c$) exceeds $M$. Since $r_0 \le r_1 \le \cdots \le r_{L-1}$, for any stitching layer $l_c' > l_c$ and tile size $(h_t' > h_t, w_t' > w_t)$, the memory requirement of layer $l$ ($l \le l_c'$) also exceeds $M$. Therefore searching stitching layers $l_c' > l_c$ with tile sizes $(h_t' > h_t, w_t' > w_t)$ is unnecessary, and the search can be terminated there.

That is to say, during the search over the candidate range, the layer immediately following an activation function layer or a pooling layer can be skipped as a stitching-layer candidate.
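The exhaustive search with the first pruning rule can be sketched as follows; the `feasible` and `cost` callbacks stand in for the memory-constraint check and for $F_\Pi$, and both are assumptions rather than the embodiment's API. The second pruning rule (early termination for larger tiles and later stitching layers) is omitted here for brevity.

```python
def search_optimal(slice_heights, slice_widths, L, skip_layers,
                   feasible, cost):
    """Exhaustive search over (h_t, w_t, l_c) in increasing order of
    parameter magnitude. `skip_layers` is the set of l_c candidates
    that can be skipped (layers immediately after activation or pooling
    layers); `feasible` checks the L memory constraints and `cost`
    evaluates F_pi."""
    best, best_cost = None, float('inf')
    for h_t in sorted(slice_heights):
        for w_t in sorted(slice_widths):
            for l_c in range(L):
                if l_c in skip_layers:
                    continue              # first pruning rule
                if not feasible(h_t, w_t, l_c):
                    continue              # memory constraint violated
                c = cost(h_t, w_t, l_c)
                if c < best_cost:
                    best, best_cost = (h_t, w_t, l_c), c
    return best, best_cost
```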
103. Divide the input image into a plurality of tiles according to the first slice size and the first padding size.

That is to say, the input image is divided according to the slice size determined in step 102 to obtain a plurality of slices, and the plurality of slices are then padded according to the determined padding size to obtain a plurality of tiles.
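Step 103 can be sketched as follows; clamping the padding at the image border is one possible padding policy, chosen here as an assumption since the embodiment does not fix one.

```python
def split_into_tiles(image, h_t, w_t, p):
    """Divide `image` (a 2-D list, H x W) into slices of size
    h_t x w_t, then extend each slice by a padding of p pixels on every
    side, clamped at the image border, yielding the tiles."""
    H, W = len(image), len(image[0])
    tiles = []
    for top in range(0, H, h_t):
        for left in range(0, W, w_t):
            r0, r1 = max(0, top - p), min(H, top + h_t + p)
            c0, c1 = max(0, left - p), min(W, left + w_t + p)
            tiles.append([row[c0:c1] for row in image[r0:r1]])
    return tiles
```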
104. Stitch the processing results corresponding to the above plurality of tiles, and obtain the first output image according to the stitched image, where the processing results corresponding to the plurality of tiles are obtained by the first neural network processing the plurality of tiles.

That is to say, inputting the above plurality of tiles into the first neural network separately yields the processing results corresponding to the tiles; after these processing results are stitched together, a stitched processed image is obtained. The stitched processed image can serve either as the input of the next layer of the first neural network, or as the first neural network's target processing result for the input image, i.e., the above first output image.

In the embodiments of the present application, if the output image obtained by inputting the input image directly into the first neural network is called the second output image, then the closer the quality of the first output image is to the quality of the second output image, the better.

Stitching is performed only after the processing results of all of the above plurality of tiles have been obtained.
In the solution shown in FIG. 1 , on the premise that the quality of the output image falls within the preset threshold range, an appropriate slice size, padding size and number of layers of the stitching layer are determined according to the system resource parameters and the model parameters. This effectively balances output image quality against computation overhead and memory consumption; that is, it effectively improves image processing performance without excessive computation or overly severe distortion. Furthermore, when determining the slice size, the padding size and the number of layers of the stitching layer, the values chosen are those that minimize the computation or storage of the first neural network while keeping the output image quality within the preset threshold range, which further reduces computation overhead and memory consumption.
FIG. 2 is a schematic flowchart of an execution process of the image processing method of an embodiment of the present application. FIG. 2 can be regarded as an example of FIG. 1 .
201. Analyze the neural network model to obtain the receptive field of the neural network model, its operators, and the amount of computation of each layer of the neural network.

A network architecture analysis module can be used to analyze the neural network model to obtain the above receptive field, operators and per-layer amount of computation.

This neural network model can be regarded as an example of the first neural network in FIG. 1 .
202. Obtain the memory size of the neural network processing system.

The memory size may be, for example, the storage capacity of fast on-chip memory, and can be used to evaluate the computing capability and storage capability of the neural network system. The above on-chip memory may include, for example, L1, L2, L3, HBM, DDR, and the like. The memory size can be regarded as an example of the resource parameters described in FIG. 1 .

Steps 201 and 202 can be regarded as an example of step 101. They may or may not be executed at the same time, and there is no restriction on their order of execution.
203. Determine the first slice size and the first padding size of the input image, and the number of layers of the stitching layer.

For step 203, reference may be made entirely to the description of step 102.

204(a). Divide the input image to obtain a plurality of tiles.

For obtaining the plurality of tiles in step 204(a), reference may be made entirely to the description of step 103.

204(b). Set the current processing layer N of the neural network model to 0, and select the next tile to be processed from the plurality of tiles obtained in step 204(a).
205. Input the tile to be processed into layer N of the neural network model for processing to obtain the processing result of the tile, which becomes the next tile to be processed.

206. Determine whether layer N is the last layer. If the determination result is "yes", execute step 207; if the determination result is "no", execute step 208.

207. Output the first output image.

That is to say, the first output image is the processing result of the last layer of the neural network model.
208. Determine whether layer N is the stitching layer. If the determination result is "yes", execute step 209; if the determination result is "no", execute step 210.

209. Determine whether stitching can be performed. If the determination result is "yes", execute step 211; if the determination result is "no", execute step 204(b).

The basis for the determination in step 209 is whether layer N has processed all of the plurality of tiles.

210. Increment the value of N by 1, and execute step 205.

That is to say, the next neural network layer is used to process the input tile. The function of step 210 is to traverse all the neural network layers.

211. Stitch the processing results respectively corresponding to the plurality of tiles; the stitched tile becomes the next tile to be processed. Execute step 210.
Steps 204-211 can be regarded as an example of steps 103-104 and mainly implement the following process. First, the input image is divided into tiles with padding of the selected size. Step 0: set the current processing layer N of the neural network model to 0; at this point the next tile from the input image can also be loaded into memory as the data input. Step 1: after the kernel of layer N is loaded into memory, data processing of layer N begins. If N is the last layer of the network, the output of layer N is the final output of the neural network model's processing of the input image. If N is neither the last layer nor the stitching layer, the output of layer N becomes the input of layer N+1 and is therefore kept in memory; N is incremented by 1 and step 1 is repeated. If N is not the last layer but is the stitching layer, then: if layer N has already processed all the tiles, all the processing results of layer N are stitched, the stitched result is kept in memory, N is incremented by 1 and step 1 is repeated; otherwise, step 0 is repeated.
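The control flow of steps 204(b)-211 can be sketched as follows, assuming each layer is a callable and `stitch` merges the per-tile results; both are hypothetical stand-ins for the neural network model, not the embodiment's implementation.

```python
def run_tiled_inference(tiles, layers, l_c, stitch):
    """Run layers 0..l_c-1 on each tile independently (steps 204(b)/205),
    stitch all partial results once every tile has been processed
    (steps 209/211), then run layers l_c..L-1 once on the stitched
    feature map (step 210) to produce the first output image (step 207)."""
    partials = []
    for tile in tiles:                  # per-tile pass up to the stitching layer
        x = tile
        for layer in layers[:l_c]:
            x = layer(x)
        partials.append(x)
    x = stitch(partials)                # stitching layer: all tiles are ready
    for layer in layers[l_c:]:          # remaining layers on the stitched map
        x = layer(x)
    return x
```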
To facilitate understanding of the technical effects of the solutions of the embodiments of the present application, a more specific test example is described below. The solutions of the embodiments of the present application can be applied to various types of neural networks, such as convolutional neural networks; the following test takes a residual network (ResNet) as an example. The residual network is a deep convolutional network proposed in 2015. Compared with a traditional convolutional neural network, a residual network is easier to optimize and can improve accuracy through considerably increased depth. The core of the residual network is that it resolves the side effect (the degradation problem) brought by increasing depth, so that network performance can be improved simply by increasing network depth. A residual network generally contains many submodules of identical structure, and a number is usually appended to the name to indicate the network's depth; for example, ResNet50 denotes a 50-layer residual network.

In this example, the resolution of the input image is adjusted to 1600x1600, and the 11th layer (Conv2_9) of a trained ResNet50 is set as the stitching layer. The receptive field size of the 11th layer of ResNet50 is 51, so it can be inferred that when the padding size is greater than or equal to 26 (obtained from 51/2), the stitched tiles are guaranteed to be free of blocking artifacts. Three slice sizes were determined: 96x96, 192x192 and 384x384 (i.e., the test values of the slice size), and the test values of the padding size were 0 and 64. The test results are shown in FIG. 3 .
FIG. 3 is a schematic diagram of the test effect of the image processing method of an embodiment of the present application. In FIG. 3 , curve A, curve B and curve C are the quality-parameter curves corresponding to slice sizes 384x384, 192x192 and 96x96, respectively. As can be seen from FIG. 3 , when the padding size equals 64 (corresponding to the case p=64, λ=1), the quality parameter of the output image is the same as without slicing. As the padding size is reduced from 64 to 0 (corresponding to the case p=0, λ=0), the quality parameters of the output images for all three slice sizes drop; for a given padding size, the larger the slice size, the higher the quality parameter of the output image. Compared with padding size 64, at padding size 0: for the 384x384 slice, quality drops by 0.6% (i.e., (0.354-0.352)/0.354) and the amount of computation is reduced by about 27%; for the 192x192 slice, quality drops by 0.8% (i.e., (0.354-0.351)/0.354) and the amount of computation is reduced by about 44%; for the 96x96 slice, quality drops by 2.3% (i.e., (0.354-0.346)/0.354) and the amount of computation is reduced by about 64%.
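The relative quality drops quoted above can be checked directly from the mAP values read off FIG. 3 :

```python
baseline = 0.354  # mAP@IoU at padding size 64 (same as without slicing)

def quality_drop_pct(map_at_zero_padding):
    """Relative quality drop, in percent, when padding is reduced to 0."""
    return (baseline - map_at_zero_padding) / baseline * 100

drops = {384: quality_drop_pct(0.352),   # ~0.6 % for the 384x384 slice
         192: quality_drop_pct(0.351),   # ~0.8 % for the 192x192 slice
         96:  quality_drop_pct(0.346)}   # ~2.3 % for the 96x96 slice
```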
It should be noted that, in the example shown in FIG. 3 , the above quality parameter is a common evaluation metric for object detection, mAP@IoU. mAP denotes mean average precision; IoU denotes intersection over union. mAP@IoU represents the detection accuracy of the trained model over all categories at a specific IoU, and IoU represents the overlap ratio between the generated candidate bound and the ground truth bound, that is, the ratio of their intersection to their union; the ideal case is complete overlap, i.e., a ratio of 1. In the example shown in FIG. 3 , the object detection performance metric of the neural network model processing the input image (with a resolution of 1600x1600) is mAP@IoU=0.5:0.95, i.e., the mAP when the IoU threshold ranges between 0.5 and 0.95.

Therefore, each of the above three slice sizes can trade a very small quality drop for a substantially larger reduction in the amount of computation.
The image processing apparatus of the embodiments of the present application is first described below with reference to FIG. 4 . The image processing apparatus shown in FIG. 4 can be used to execute the steps of the image processing method of the embodiments of the present application; the image processing apparatus may be a computer, a server, or another apparatus whose computing power is sufficient for constructing a neural network.

FIG. 4 is a schematic block diagram of an image processing apparatus of an embodiment of the present application. The apparatus 2000 shown in FIG. 4 includes an acquisition unit 2001 and a processing unit 2002.

The apparatus 2000 can be used to execute the steps of the image processing method of the embodiments of the present application. For example, the acquisition unit 2001 can be used to execute step 101 in the method shown in FIG. 1 , and the processing unit 2002 can be used to execute steps 102 to 104 in the method shown in FIG. 1 . For another example, the acquisition unit 2001 can be used to execute steps 201 and 202 in the method shown in FIG. 2 , and the processing unit 2002 can be used to execute steps 203 to 211 in the method shown in FIG. 2 .
In the apparatus 2000 shown in FIG. 4 , the acquisition unit 2001 may correspond to the communication interface 3003 in the apparatus 3000 shown in FIG. 5 , through which the above resource parameters and model parameters can be acquired; alternatively, the acquisition unit 2001 may correspond to the processor 3002 in the apparatus 3000 shown in FIG. 5 , in which case the above resource parameters and model parameters can be acquired from the memory 3001 through the processor 3002.

In addition, the processing unit 2002 in the apparatus 2000 shown in FIG. 4 may correspond to the processor 3002 in the apparatus 3000 shown in FIG. 5 .

FIG. 5 is a schematic diagram of a hardware structure of an image processing apparatus of an embodiment of the present application. The apparatus 3000 shown in FIG. 5 includes a memory 3001, a processor 3002, a communication interface 3003 and a bus 3004. The memory 3001, the processor 3002 and the communication interface 3003 are communicatively connected to one another through the bus 3004.
The memory 3001 may be a read-only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 3001 may store a program; when the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 and the communication interface 3003 are used to execute the steps of the image processing method of the embodiments of the present application.

The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, used to execute related programs so as to realize the functions to be executed by the units in the image processing apparatus of the embodiments of the present application, or to execute the steps of the image processing method of the embodiments of the present application.

The processor 3002 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the image processing method of the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 3002 or by instructions in the form of software.

The processor 3002 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the image processing method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001; the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, completes the functions to be executed by the units included in the image processing apparatus of the embodiments of the present application, or executes the steps of the image processing method of the embodiments of the present application.
The communication interface 3003 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the apparatus 3000 and other devices or a communication network. For example, control parameters corresponding to an inference result may be sent through the communication interface 3003.

The bus 3004 may include a path for transferring information between the components of the apparatus 3000 (for example, the memory 3001, the processor 3002 and the communication interface 3003).

It should be noted that although the apparatus 3000 above shows only a memory, a processor and a communication interface, those skilled in the art should understand that, in a specific implementation process, the apparatus 3000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 3000 may also include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 3000 may also include only the devices necessary for implementing the embodiments of the present application, and need not include all the devices shown in FIG. 5 .

The embodiments of the present application do not specifically limit the specific structure of the execution body of the methods provided by the embodiments of the present application, as long as communication can be performed according to the methods provided by the embodiments of the present application by running a program recording the code of those methods. For example, the execution body of the methods provided by the embodiments of the present application may be a terminal device or a network device, or a functional module in a terminal device or network device that can call and execute a program.
本申请的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本文中使用的术语“制品”可以涵盖可从任何计算机可读器件、载体或介质访问的计算机程序。例如,计算机可读介质可以包括但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,压缩盘(compact disc,CD)、数字通用盘(digital versatile disc,DVD)等),智能卡和闪存器件(例如,可擦写可编程只读存储器(erasable programmable read-only memory,EPROM)、卡、棒或钥匙驱动器等)。Various aspects or features of the present application can be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein may encompass a computer program accessible from any computer readable device, carrier or media. For example, computer-readable media may include, but are not limited to, magnetic storage devices (such as hard disks, floppy disks, or tapes, etc.), optical disks (such as compact discs (compact disc, CD), digital versatile discs (digital versatile disc, DVD), etc. ), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), card, stick or key drive, etc.).
本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可以包括但不限于:无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。Various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated into the processor.
It should also be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
A person of ordinary skill in the art may appreciate that the units and steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the protection scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person familiar with the technical field can readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (12)

  1. An image processing method, comprising:
    obtaining resource parameters of a neural network processing system and model parameters of a first neural network, wherein the resource parameters represent the computing capability and storage capability of the neural network processing system, and the model parameters represent the amounts of computation and storage required by the first neural network; and
    determining, according to the resource parameters and the model parameters, a first slice size and a first padding size for an input image of the first neural network and a layer number of a stitching layer in the first neural network, wherein the quality of a first output image is within a preset threshold range, and the first output image is the output image obtained when the first neural network processes the input image according to the first slice size, the first padding size, and the stitching layer.
  2. The method according to claim 1, wherein the first slice size, the first padding size, and the layer number of the stitching layer in the first neural network are the values that minimize the amount of computation or storage required by the first neural network while keeping the quality of the first output image within the preset threshold range.
  3. The method according to claim 1 or 2, wherein the size of the first slice is less than or equal to the size of the input image, the layer number of the stitching layer is less than or equal to the total number of layers of the first neural network, the first padding size is determined according to the preset threshold range and the receptive field of the stitching layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the computing capability or storage capability of the neural network processing system.
  4. The method according to any one of claims 1 to 3, further comprising:
    dividing the input image into a plurality of tiles according to the first slice size and the first padding size; and
    stitching, at the stitching layer, the processing results corresponding to the plurality of tiles, and obtaining the first output image according to the stitched image, wherein the processing results corresponding to the plurality of tiles are obtained by the first neural network processing the plurality of tiles.
  5. The method according to any one of claims 1 to 4, wherein the quality of the first output image being within a preset threshold range specifically means that the ratio of the quality of the first output image to the quality of a second output image is within a preset range, and the second output image is the output image obtained when the first neural network processes the input image directly.
  6. An image processing apparatus, comprising:
    an obtaining unit, configured to obtain resource parameters of a neural network processing system and model parameters of a first neural network, wherein the resource parameters represent the computing capability and storage capability of the neural network processing system, and the model parameters represent the amounts of computation and storage required by the first neural network; and
    a processing unit, configured to determine, according to the resource parameters and the model parameters, a first slice size and a first padding size for an input image of the first neural network and a layer number of a stitching layer in the first neural network, wherein the quality of a first output image is within a preset threshold range, and the first output image is the output image obtained when the first neural network processes the input image according to the first slice size, the first padding size, and the stitching layer.
  7. The apparatus according to claim 6, wherein the first slice size, the first padding size, and the stitching layer are the values that minimize the amount of computation or storage required by the first neural network while keeping the quality of the first output image within the preset threshold range.
  8. The apparatus according to claim 6 or 7, wherein the size of the first slice is less than or equal to the size of the input image, the layer number of the stitching layer is less than or equal to the total number of layers of the first neural network, the first padding size is determined according to the preset threshold range and the receptive field of the stitching layer, and the amount of computation or storage required by each layer of the first neural network does not exceed the computing capability or storage capability of the neural network processing system.
  9. The apparatus according to any one of claims 6 to 8, wherein the processing unit is further configured to:
    divide the input image into a plurality of tiles according to the first slice size and the first padding size; and
    stitch, at the stitching layer, the processing results corresponding to the plurality of tiles, wherein the processing results corresponding to the plurality of tiles are obtained by the first neural network processing the plurality of tiles.
  10. The apparatus according to any one of claims 6 to 9, wherein the quality of the first output image being within a preset threshold range specifically means that the ratio of the quality of the first output image to the quality of a second output image is within a preset range, and the second output image is the output image obtained when the first neural network processes the input image directly.
  11. A computer-readable storage medium, wherein the computer-readable storage medium stores program code for execution by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 5.
  12. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1 to 5.
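Claims 3 and 8 determine the first padding size from the receptive field of the layers up to the stitching layer. As an illustrative aside only, the recurrence below is the standard receptive-field calculation for a convolutional stack; it is not code from the application, and the function name and (kernel, stride) layer format are assumptions.

```python
def receptive_field_radius(layers):
    """One-sided receptive-field radius of a stack of (kernel, stride)
    layers: how many halo pixels a tile needs on each side so pixels in
    its core see the same context as in whole-image inference."""
    radius, jump = 0, 1
    for kernel, stride in layers:
        radius += (kernel - 1) // 2 * jump  # half-kernel reach at this depth
        jump *= stride                      # spacing between adjacent outputs
    return radius

# Three 3x3 stride-1 convolutions reach 3 pixels beyond the core per side.
assert receptive_field_radius([(3, 1), (3, 1), (3, 1)]) == 3
```

A padding size at least this large keeps the core of each tile identical to whole-image processing, which is one way the preset quality threshold of claim 1 could be satisfied.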
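The tiling-and-stitching procedure of claims 4 and 9 — divide the input image into tiles with a halo of padding, process each tile, then crop the halos and stitch the cores — can be sketched as follows. This is a minimal NumPy illustration under assumed helper names; real per-tile processing would run the first neural network, whereas here each tile is passed through unchanged so the stitched result exactly reproduces the input.

```python
import numpy as np

def split_into_tiles(image, slice_size, pad_size):
    """Cut a 2-D image into slice_size x slice_size core regions, each
    carried with a pad_size halo (clipped at the image border)."""
    h, w = image.shape
    tiles = []
    for top in range(0, h, slice_size):
        for left in range(0, w, slice_size):
            t0, l0 = max(top - pad_size, 0), max(left - pad_size, 0)
            t1 = min(top + slice_size + pad_size, h)
            l1 = min(left + slice_size + pad_size, w)
            # Remember the core region so the halo can be cropped later.
            core = (top, left, min(top + slice_size, h), min(left + slice_size, w))
            tiles.append((image[t0:t1, l0:l1], (t0, l0), core))
    return tiles

def stitch_tiles(tiles, shape):
    """Crop each tile's halo and paste its core region back in place."""
    out = np.zeros(shape, dtype=tiles[0][0].dtype)
    for tile, (t0, l0), (top, left, bottom, right) in tiles:
        # In the claimed method each tile would first be processed by the
        # first neural network; here the tile is used as-is.
        out[top:bottom, left:right] = tile[top - t0:bottom - t0, left - l0:right - l0]
    return out

image = np.arange(64, dtype=np.float32).reshape(8, 8)
tiles = split_into_tiles(image, slice_size=4, pad_size=1)
assert np.array_equal(stitch_tiles(tiles, image.shape), image)
```

The halo width would be chosen from the receptive field per claim 3, and the slice size from the per-layer compute and memory budget per claim 1.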
PCT/CN2021/102742 2021-06-28 2021-06-28 Image processing method and image processing apparatus WO2023272432A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180099529.1A CN117501300A (en) 2021-06-28 2021-06-28 Image processing method and image processing apparatus
PCT/CN2021/102742 WO2023272432A1 (en) 2021-06-28 2021-06-28 Image processing method and image processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102742 WO2023272432A1 (en) 2021-06-28 2021-06-28 Image processing method and image processing apparatus

Publications (1)

Publication Number Publication Date
WO2023272432A1 (en)

Family

ID=84690947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102742 WO2023272432A1 (en) 2021-06-28 2021-06-28 Image processing method and image processing apparatus

Country Status (2)

Country Link
CN (1) CN117501300A (en)
WO (1) WO2023272432A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168066A (en) * 2023-04-25 2023-05-26 河海大学 Building three-dimensional point cloud registration preprocessing method based on data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6084981A (en) * 1994-11-29 2000-07-04 Hitachi Medical Corporation Image processing apparatus for performing image converting process by neural network
WO2020135601A1 (en) * 2018-12-29 2020-07-02 北京市商汤科技开发有限公司 Image processing method and device, vehicle-mounted operation platform, electronic device and system
WO2020241337A1 (en) * 2019-05-24 2020-12-03 株式会社日立製作所 Image processing device
CN112215854A (en) * 2020-10-19 2021-01-12 珠海金山网络游戏科技有限公司 Image processing method and device
CN112257759A (en) * 2020-09-27 2021-01-22 华为技术有限公司 Image processing method and device
CN113034358A (en) * 2019-12-09 2021-06-25 华为技术有限公司 Super-resolution image processing method and related device


Also Published As

Publication number Publication date
CN117501300A (en) 2024-02-02

Similar Documents

Publication Publication Date Title
US9916531B1 (en) Accumulator constrained quantization of convolutional neural networks
KR102631381B1 (en) Convolutional neural network processing method and apparatus
US11887005B2 (en) Content adaptive attention model for neural network-based image and video encoders
US9082160B2 (en) Image processing method, image compression device and mobile terminal
US20180253635A1 (en) Neural network devices and methods of operating the same
US11537857B2 (en) Pooling processing method and system applied to convolutional neural network
US20200279358A1 (en) Method, device, and system for testing an image
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN105979265A (en) Image compression method and apparatus
WO2023272432A1 (en) Image processing method and image processing apparatus
KR20210024126A (en) Feature map magnification method, apparatus, device and computer-readable recording medium
WO2021147276A1 (en) Data processing method and apparatus, and chip, electronic device and storage medium
CN114897151A (en) Access optimization method and device, electronic equipment and storage medium
CN117376645A (en) Video preloading method, device, computer equipment and storage medium
CN116630302A (en) Cell image segmentation method and device and electronic equipment
KR20210136476A (en) Compressing device and method using parameters of a quad-tree method
JP7033507B2 (en) Neural network processor, neural network processing method, and program
CN112990440B (en) Data quantization method for neural network model, readable medium and electronic device
CN114253956A (en) Edge caching method and device and electronic equipment
US11019366B2 (en) Image compression and decompression using triangulation
US10277912B2 (en) Methods and apparatus for storing data related to video decoding
WO2022007586A1 (en) Data processing method and apparatus, and related device
WO2021237870A1 (en) Data encoding method, data decoding method, data processing method, encoder, decoder, system, movable platform, and computer-readable medium
KR102504007B1 (en) Context vector extracting module generating context vector from partitioned image and operating method thereof
CN113068043B (en) PNG image compression method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21947396; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 202180099529.1; Country of ref document: CN)
NENP Non-entry into the national phase (Ref country code: DE)