CN112581366B - Portable image super-resolution system and system construction method - Google Patents


Info

Publication number
CN112581366B
Authority
CN
China
Prior art keywords
dpu
resolution
neural network
image
reasoning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011376766.1A
Other languages
Chinese (zh)
Other versions
CN112581366A (en)
Inventor
刘明亮
王晓航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang University
Original Assignee
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang University filed Critical Heilongjiang University
Priority to CN202011376766.1A priority Critical patent/CN112581366B/en
Publication of CN112581366A publication Critical patent/CN112581366A/en
Application granted granted Critical
Publication of CN112581366B publication Critical patent/CN112581366B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks

Abstract

The invention discloses a portable image super-resolution system and a system construction method. The PL hardware layer is responsible for constructing a DPU IP core, which performs neural network inference and accelerates the inference process; the PS embedded Linux system layer is responsible for reading and storing pictures, scheduling DPU tasks, performing the sub-pixel convolution operation on the neural network output, and communicating with the upper computer; the client application layer faces the user, who through simple operations specifies the pictures to be super-resolved, obtains the output high-resolution images, and modifies parameters. The invention deploys an image super-resolution convolutional neural network on the ZYNQ embedded platform, successfully accelerates network inference, and obtains good output results.

Description

Portable image super-resolution system and system construction method
Technical Field
The invention belongs to the field of image processing, and in particular relates to a portable image super-resolution system and a system construction method.
Background
The image super-resolution (SR) problem is a classical problem in computer vision; its aim is to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. Various SR methods have been widely used in industry, security and medicine, and also show great promise in social entertainment. The problem has therefore attracted the attention of many excellent scholars in computer vision, and many excellent image super-resolution algorithms have been proposed.
Early SR methods were mainly based on image interpolation, such as nearest-neighbor, bilinear and bicubic interpolation. Image interpolation algorithms generate a high-resolution image by inserting new pixels into the low-resolution image, where each new pixel is obtained as a weighted average of neighboring pixel values of the low-resolution image. Some more effective methods perform super-resolution using statistical image priors, which recover more image detail but also require more prior knowledge.
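As background only (not part of the claimed method), a minimal sketch of the classical interpolation-based upscaling mentioned above, using OpenCV; the file name and the 3x scale factor are illustrative assumptions.

```python
import cv2  # OpenCV (cv2) is assumed to be available

# Hypothetical input file and 3x scale factor, for illustration only.
lr = cv2.imread("input_lr.png")              # low-resolution image, HxWx3
h, w = lr.shape[:2]

nearest  = cv2.resize(lr, (w * 3, h * 3), interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(lr, (w * 3, h * 3), interpolation=cv2.INTER_LINEAR)
bicubic  = cv2.resize(lr, (w * 3, h * 3), interpolation=cv2.INTER_CUBIC)

cv2.imwrite("bicubic_x3.png", bicubic)       # baseline pseudo-HR image
```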
In recent years, many machine-learning-based super-resolution methods have been proposed. SRCNN was the first to successfully introduce neural network technology into the super-resolution problem; it uses a lightweight network structure yet achieves higher output quality than the most advanced methods of the time. FSRCNN uses 1x1 convolutions to expand and shrink the number of feature maps and, unlike SRCNN, does not feed an interpolated pseudo-high-resolution image into the network; instead it enlarges the image with a deconvolution layer, greatly reducing the time cost of training and inference. ESPCN proposes sub-pixel convolution as a new up-sampling method, which further reduces the time cost of training and inference and produces higher-quality output.
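A minimal numpy sketch of the sub-pixel convolution (pixel shuffle) rearrangement that ESPCN introduced and that this system performs on the PS side; the tensor shapes, channel ordering and scale factor r are illustrative assumptions.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, C*r*r) tensor into (H*r, W*r, C)."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)          # split the channel dim into (r, r, C) blocks
    x = x.transpose(0, 2, 1, 3, 4)        # interleave: (H, r, W, r, C)
    return x.reshape(h * r, w * r, c)

# Example: a 360x640 feature map with 3*3*3 = 27 channels becomes a 1080x1920x3 image.
lr_features = np.random.rand(360, 640, 27).astype(np.float32)
hr = pixel_shuffle(lr_features, r=3)
print(hr.shape)  # (1080, 1920, 3)
```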
Although existing machine-learning-based methods achieve excellent results, the following defects remain: (1) the parameter count and computation of traditional network models are huge, so the time and power they consume are far from ideal, which also limits their application on embedded platforms; (2) to obtain better output quality, traditional networks generally tend to increase network depth and width, which makes them difficult to train and deploy; (3) the network training stage requires high data precision and therefore uses the float32 or float64 data format, yet using high-precision data in the inference stage yields only a very weak improvement in output quality.
Disclosure of Invention
The invention provides a portable image super-resolution system and a system construction method, which realize the deployment of a convolutional neural network of image super-resolution on a ZYNQ embedded platform, successfully accelerate network reasoning and obtain a good output result.
The invention is realized by the following technical scheme:
a portable image super-resolution system comprises a PL hardware layer, a PS embedded Linux system layer and a client application layer, wherein the PL hardware layer is responsible for constructing a DPU IP core, is used for reasoning of a neural network and accelerates the reasoning process; the PS embedded Linux system layer is responsible for reading and storing pictures, scheduling tasks of the DPUs, performing sub-pixel convolution operation on neural network output and communicating with an upper computer; the client application layer faces a user, and the user simply operates the pictures needing super resolution, acquires the output high-resolution images and modifies parameters.
A system construction method of a portable image super-resolution system comprises a PL (personal information Unit) end logic construction step, an embedded Linux customization and transplantation step, a convolutional neural network model design training and deployment step, a lower computer control construction step and a DPU network accelerated reasoning construction step.
Further, the step of constructing the PL end logic is specifically,
step S2.1: integration and connection of the DPU IP core; the DPU IP core adopts the low-RAM-usage mode, the DSP slice usage is set to high, and depthwise convolution is not used; the working frequency of the DSP slices must be fixed at twice the working frequency of the DPU;
step S2.2: configuring the ZYNQ core according to the development board schematic diagram.
Further, the step of customizing and transplanting the embedded Linux specifically comprises
step S3.1: customizing the U-Boot of the embedded Linux, i.e. directly using the SD card boot mode;
step S3.2: customizing the Rootfs of the embedded Linux on the basis of step S3.1; the boot path specified in the previous step is made to contain the files required for system startup and operation, and the root file system of the installation environment is configured accordingly;
step S3.3: customizing the kernel of the embedded Linux on the basis of step S3.2; modifying the device tree file of the system;
step S3.4: compiling and transplanting the embedded Linux on the basis of step S3.3; the build function of Petalinux is run to compile; after compilation a U-Boot file is generated; the generated U-Boot file and the .ub file are copied to the FAT32 partition of the SD card, and the rootfs archive is decompressed to the EXT4 partition of the SD card.
Further, the step of designing, training and deploying the convolutional neural network model is specifically,
step S4.1: constructing the convolutional neural network model using two techniques, 5 residual blocks and sub-pixel convolution;
step S4.2: training the convolutional neural network model of step S4.1; training uses 100 pictures of 1920x1080x3, comprising 50 landscape photographs and 50 paintings, with training data and validation data split 9:1; the pictures are downscaled to 640x360x3 to obtain the training inputs of the convolutional neural network model, these inputs are fed into the model constructed in step S4.1, the original 1920x1080x3 images are used as the labels of the network, and the finally trained convolutional neural network model is saved;
step S4.3: deploying the convolutional neural network model trained in step S4.2 to realize the image super-resolution function.
Further, the control of the lower computer is constructed specifically as follows: the addresses of all pictures in the specified directory are acquired and the task amount is reported; two processes then handle the tasks according to the task amount;
process one reads its picture, then performs the DPU inference of process one and obtains the result, applies the sub-pixel convolution to the result of process one to form the super-resolved image of process one, stores the super-resolved image of process one and reports the time consumed by the task;
process two reads its picture, then performs the DPU inference of process two and obtains the result, applies the sub-pixel convolution to the result of process two to form the super-resolved image of process two, stores the super-resolved image of process two and reports the time consumed by the task.
Further, the step of constructing DPU network accelerated inference specifically includes the following steps:
step S6.1: opening the DPU device;
step S6.2: loading the network model with the DPU device;
step S6.3: reading the image address list;
step S6.4: feeding the low-resolution image of step S6.3 into the DPU;
step S6.5: starting the DPU task;
step S6.6: acquiring the output tensor of step S6.5;
step S6.7: processing the output tensor of step S6.6 with the sub-pixel convolution at the PS end;
step S6.8: judging whether the image address list has reached its upper limit; if so, performing step S6.9, and if not, performing step S6.3;
step S6.9: end.
The invention has the beneficial effects that:
Under the condition of maintaining the quality of the output pictures, fast image super-resolution processing is achieved through DPU-accelerated network inference; meanwhile, deploying the algorithm on an embedded platform greatly reduces power consumption compared with deploying it on a PC platform, and gives the system portability.
Drawings
FIG. 1 is a diagram of the program architecture of the system of the present invention.
Fig. 2 is a functional distribution diagram of the system of the present invention.
Fig. 3 is an architecture diagram of the hardware system of the present invention.
FIG. 4 is a schematic diagram of xc7z020 according to the present invention.
FIG. 5 is a schematic diagram of the JTAG download circuit of the present invention.
Fig. 6 is a schematic diagram of a USB interface circuit of the present invention.
FIG. 7 is a schematic diagram of a UART interface circuit according to the present invention.
Fig. 8 is a schematic diagram of the QSPI FLASH circuit of the present invention.
Fig. 9 is a schematic diagram of a gigabit port circuit of the present invention.
FIG. 10 is a schematic diagram of the SD card circuit of the present invention.
Fig. 11 is a schematic diagram of a power supply circuit of the present invention, in which, (a) a 0V power supply circuit, (b) an 8V power supply circuit, (c) a 5V power supply circuit, and (d) a 3V power supply circuit.
FIG. 12 is a diagram illustrating the configuration of the DPU IP core parameters according to the present invention.
FIG. 13 is a schematic diagram of the DPU IP core operating frequency configuration of the present invention.
FIG. 14 is a PL-terminal global wiring diagram of the present invention.
FIG. 15 is a diagram of the DPU bus address assignment of the present invention.
FIG. 16 is a schematic diagram of a ZYNQ core configuration of the present invention.
Fig. 17 is a schematic diagram of the configuration of the Petalinux start-up mode according to the present invention.
FIG. 18 is a schematic diagram of disabling the automatic generation of boot arguments according to the present invention.
FIG. 19 is a schematic diagram of the Linux boot path and CMA space being manually configured according to the present invention.
FIG. 20 is a schematic diagram of an OpenCV environment cross-compilation configuration in accordance with the present invention.
FIG. 21 is a schematic diagram of a Python environment cross-compilation configuration of the present invention.
FIG. 22 is a schematic diagram of a device tree file configuration according to the present invention.
Fig. 23 is a schematic diagram of the residual block structure of the present invention.
FIG. 24 is a schematic diagram of the convolutional neural network structure of the present invention.
FIG. 25 is a flow chart of a control method of the present invention.
FIG. 26 is a flow chart of the DPU accelerated neural network inference method of the present invention.
Fig. 27 is a natural image comparison of the present invention, in which (a) is the original image and (b) is the super-resolution result.
Fig. 28 is an artificial drawing comparison of the present invention, in which (a) is the original drawing and (b) is the super-resolution result.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A portable image super-resolution system comprises a PL hardware layer, a PS embedded Linux system layer and a client application layer. The PL hardware layer is responsible for constructing a DPU IP core, which performs neural network inference and accelerates the inference process; the PS embedded Linux system layer is responsible for reading and storing pictures, scheduling DPU tasks, performing the sub-pixel convolution operation on the neural network output, and communicating with the upper computer; the client application layer faces the user, who through simple operations specifies the pictures to be super-resolved, obtains the output high-resolution images, and modifies parameters.
A system construction method of a portable image super-resolution system comprises a PL (programmable logic) end logic construction step, an embedded Linux customization and transplantation step, a convolutional neural network model design, training and deployment step, a lower computer control construction step, and a DPU network accelerated inference construction step.
Further, the step of constructing the PL end logic is specifically,
step S2.1: integration and connection of the DPU IP core; the DPU IP core adopts the low-RAM-usage mode, the DSP slice usage is set to high, and depthwise convolution is not used; the working frequency of the DSP slices must be fixed at twice the working frequency of the DPU;
step S2.2: configuring the ZYNQ core according to the development board schematic diagram.
Further, the step of customizing and transplanting the embedded Linux specifically comprises
step S3.1: customizing the U-Boot of the embedded Linux, i.e. directly using the SD card boot mode;
step S3.2: customizing the Rootfs of the embedded Linux on the basis of step S3.1; the boot path specified in the previous step is made to contain the files required for system startup and operation, and the root file system of the installation environment is configured accordingly;
step S3.3: customizing the kernel of the embedded Linux on the basis of step S3.2; modifying the device tree file of the system;
step S3.4: compiling and transplanting the embedded Linux on the basis of step S3.3; the build function of Petalinux is run to compile; after compilation a U-Boot file is generated; the generated U-Boot file and the .ub file are copied to the FAT32 partition of the SD card, and the rootfs archive is decompressed to the EXT4 partition of the SD card.
Further, the step of designing, training and deploying the convolutional neural network model is specifically,
step S4.1: constructing the convolutional neural network model using two techniques, 5 residual blocks and sub-pixel convolution (a model sketch under illustrative assumptions is given after these steps);
step S4.2: training the convolutional neural network model of step S4.1; training uses 100 pictures of 1920x1080x3, comprising 50 landscape photographs and 50 paintings, with training data and validation data split 9:1; the pictures are downscaled to 640x360x3 to obtain the training inputs of the convolutional neural network model, these inputs are fed into the model constructed in step S4.1, the original 1920x1080x3 images are used as the labels of the network, and the finally trained convolutional neural network model is saved;
step S4.3: deploying the convolutional neural network model trained in step S4.2 to realize the image super-resolution function.
Further, step S4.3 specifically includes performing model freezing, model quantization and model compilation on the convolutional neural network model trained in step S4.2.
Furthermore, model freezing specifically means that the definition of the model computation graph and the model weights are merged into the same file, which facilitates deployment of the model.
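A minimal TensorFlow 1.x-style sketch of such a freezing step, merging the graph definition and the trained weights into one .pb file; the checkpoint path and output node name are illustrative assumptions.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

CKPT = "./train/sr_model.ckpt"        # assumed checkpoint produced in step S4.2
OUTPUT_NODE = "conv_out"              # assumed name of the network output node

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(CKPT + ".meta")
    saver.restore(sess, CKPT)
    # Replace variables with constants so graph definition and weights live in one file.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, [OUTPUT_NODE])
    with tf.gfile.GFile("frozen_sr_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())
```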
Further, model quantization means that the convolutional neural network model with float32 parameters undergoes an int8 quantization operation; the quantized model is calibrated with a quantization data set (so that the adverse effect of the precision loss caused by quantization is reduced to a minimum), and the model can be conveniently quantized by calling, through a script, the decent quantization tool provided with the DNNDK toolkit.
Further, model compilation means that the quantized model is compiled into a model that the DPU can read. The DNNC tool provided with the DNNDK toolkit can be called through a script to compile the model into an ELF file, which is then packaged into a .o file to facilitate calling it from the DNNDK Python API.
The following is an analysis of results rather than a method step. PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) are the evaluation indexes used to assess the quality of the pictures output by the neural network: the higher they are, the better the output quality; and the smaller the parameter count and training time, the better.
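A minimal numpy sketch of the PSNR computation used as an evaluation index (SSIM is more involved and is typically taken from an image-processing library such as scikit-image); the 8-bit data range is an assumption.

```python
import numpy as np

def psnr(reference: np.ndarray, output: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference image and a network output."""
    mse = np.mean((reference.astype(np.float64) - output.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((data_range ** 2) / mse)

# Example: two 8-bit images (in practice, the original HR image and the SR result).
a = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
print(f"PSNR = {psnr(a, b):.2f} dB")
```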
Due to the use of the sub-pixel convolution technique, the images flowing through the network are low-resolution images; the final effects are that the speed of the network inference stage is greatly improved, the memory consumption of the network inference stage is reduced, and the final network output is improved.
If the task is 4x super-resolution, the low-resolution image size is 100 x 100 and the original high-resolution image size is 400 x 400. If sub-pixel convolution is not used, the network input is a picture pre-magnified by the bicubic interpolation algorithm, i.e. 400 x 400; if sub-pixel convolution is used, the input does not need to be pre-magnified, i.e. it is 100 x 100. This results in the following (an arithmetic sketch follows this list):
The speed of the network inference stage is greatly improved.
By calculation, compared with not using sub-pixel convolution and instead using an ordinary convolution outputting 3 channels, the computation amount is reduced by about 160 times, so the inference time decreases proportionally.
The network inference stage consumes less memory.
By calculation, compared with not using sub-pixel convolution and instead using an ordinary convolution outputting 3 channels, the memory consumption is reduced by about 16 times.
The final network output is improved.
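A minimal arithmetic sketch of the memory point above, using the 4x example: keeping the feature maps at 100 x 100 instead of 400 x 400 shrinks every intermediate activation by the square of the scale factor. The channel count of 64 is an illustrative assumption.

```python
# Per-layer activation size for one feature-map stack (float32, 64 channels assumed).
scale = 4
channels = 64
bytes_per_value = 4  # float32

lr_activation = 100 * 100 * channels * bytes_per_value                       # with sub-pixel convolution
hr_activation = (100 * scale) * (100 * scale) * channels * bytes_per_value   # pre-magnified input

print(lr_activation, hr_activation, hr_activation / lr_activation)
# 2560000 40960000 16.0 -> each intermediate layer is ~16x smaller in memory
```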
Example 2
The experimental conditions are as follows: network A (sub-pixel convolution) and network B (ordinary convolution), using the same training set, trained for the same number of iterations, and evaluated on the same test set, give the following results:
            PSNR     SSIM
Network A   32.10    0.8958
Network B   31.23    0.8901
Compared with network B, network A improves the PSNR (peak signal-to-noise ratio) by 2.79% and the SSIM (structural similarity) by 0.64%, so the final inference result of the network is improved.
Due to the use of the residual learning technique, the vanishing-gradient phenomenon of deep networks in the training stage is alleviated; the final effect is that the convergence speed of the network in the training stage is greatly improved.
example 3
The experimental conditions are as follows: network A (using residual learning) and network B (not using residual learning); other conditions are the same.
the results of the experiment are shown in FIG. 3
As is apparent from fig. 3, the network a using the residual learning technique (represented by a green line) has a significantly improved network convergence rate compared to the network B not using residual learning (represented by an orange line), which means that in the model training phase, the network a using the residual learning technique can obtain a better result with a smaller number of training iterations, and the time and labor cost consumed by training the network are reduced.
Due to the lightweight design of the network structure, the network model is smaller and more compact; the final effects are that the network inference speed is increased and memory consumption is reduced, at the cost of a slight degradation in output quality.
The experimental environment is as follows: network A uses 5 residual blocks with 64 convolution kernels per convolutional layer; network B uses 10 residual blocks with 128 convolution kernels per convolutional layer; other conditions are the same.
            PSNR     SSIM     Parameters   Training time
Network A   32.10    0.8958   101179       2657 s
Network B   32.81    0.9039   755931       8684 s
As can be seen from the above table, although the 10-residual-block design improves the quality of the network output, the improvement comes with a large increase in parameters: at the cost of reducing PSNR and SSIM by only 2.16% and 0.89% respectively, the parameter count and training time are reduced by 86.61% and 69.40% respectively. A large parameter count and long training time are very disadvantageous for deploying the network model on an embedded platform, since they substantially increase model inference time and memory consumption, and the higher demands on the hardware platform raise its cost. Multiple tests show that the structure given in the invention strikes a good balance between performance and quality.
Example 4
Due to the use of the model quantization technique, which converts the network weights from the Float32 type to the Int8 type, the final effects are that the speed of the network inference stage is greatly improved, the memory consumption of the network inference stage is reduced, and the network inference output indexes show no obvious decrease.
The speed of the network reasoning phase is greatly improved.
The data type before quantization is Float32 (32-bit floating point), whose computation consumes a large amount of DSP slice resources in the FPGA; because these resources are very limited, a batch of data has to be divided into many sub-batches and computed separately. Computation with Int8 (8-bit integer) consumes far fewer resources, so a batch of data only needs to be divided into a few sub-batches, and the computation time is greatly reduced.
The network reasoning phase consumes less memory.
Since the weights are changed to Int8, the feature maps computed in the network are also of type Int8; compared with the Float32 type, using the Int8 type directly reduces memory consumption by a factor of 4.
The decrease in the network inference output indexes is not obvious.
The experimental conditions are as follows: network A uses the quantization technique and network B does not.
            PSNR     SSIM
Network A   32.10    0.8958
Network B   32.06    0.8957
From the data, the evaluation indexes of the quantized network's output images decrease slightly, but the amplitude is very small; the quality reduction caused by this decrease can hardly be observed by the human eye, while the inference speed is greatly improved and memory consumption is greatly reduced. Quantization is therefore very important for deploying the neural network model on a portable platform with low computing power and low power consumption.
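A minimal numpy sketch of the symmetric int8 fake-quantization idea behind this example (quantize weights to int8 with a per-tensor scale, then dequantize and measure the error); the actual DNNDK decent tool uses its own calibration procedure, so this is an illustration, not that tool's algorithm.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Illustrative float32 weight tensor (e.g. a 3x3x64x64 convolution kernel).
w = np.random.randn(3, 3, 64, 64).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale

print("memory ratio:", w.nbytes / q.nbytes)              # 4.0 -> Float32 vs Int8
print("max abs error:", float(np.abs(w - w_hat).max()))  # small compared with |w|
```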
Further, the control of the lower computer is constructed specifically as follows: the addresses of all pictures in the specified directory are acquired and the task amount is reported; two processes then handle the tasks simultaneously according to the task amount (a scheduling sketch is given after this description);
process one reads its picture, then performs the DPU inference of process one and obtains the result, applies the sub-pixel convolution to the result of process one to form the super-resolved image of process one, stores the super-resolved image of process one and reports the time consumed by the task;
process two reads its picture, then performs the DPU inference of process two and obtains the result, applies the sub-pixel convolution to the result of process two to form the super-resolved image of process two, stores the super-resolved image of process two and reports the time consumed by the task.
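A minimal sketch of the two-process scheduling just described, using Python's multiprocessing; the directory names and the process_one_image helper are illustrative assumptions standing in for the DPU inference and the PS-side sub-pixel convolution.

```python
import os
import time
from multiprocessing import Process

IMAGE_DIR = "./images"        # assumed input directory
RESULT_DIR = "./result"       # assumed output directory

def process_one_image(path: str) -> None:
    """Placeholder for: read picture -> DPU inference -> sub-pixel convolution -> save."""
    # The real implementation would call the DPU runtime here (see the inference sketch below).
    pass

def worker(name: str, paths: list) -> None:
    for p in paths:
        start = time.time()
        process_one_image(p)
        print(f"{name}: {os.path.basename(p)} done in {time.time() - start:.2f}s")

if __name__ == "__main__":
    paths = sorted(os.path.join(IMAGE_DIR, f) for f in os.listdir(IMAGE_DIR))
    print(f"task amount: {len(paths)} pictures")
    # Split the task list between two processes that run simultaneously.
    p1 = Process(target=worker, args=("process one", paths[0::2]))
    p2 = Process(target=worker, args=("process two", paths[1::2]))
    p1.start(); p2.start()
    p1.join(); p2.join()
```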
Further, the step of constructing DPU network accelerated inference specifically includes the following steps (a code sketch of this flow is given after these steps):
step S6.1: opening the DPU device;
step S6.2: loading the network model with the DPU device;
step S6.3: reading the image address list;
step S6.4: feeding the low-resolution image of step S6.3 into the DPU;
step S6.5: starting the DPU task;
step S6.6: acquiring the output tensor of step S6.5;
step S6.7: processing the output tensor of step S6.6 with the sub-pixel convolution at the PS end;
step S6.8: judging whether the image address list has reached its upper limit; if so, performing step S6.9, and if not, performing step S6.3;
step S6.9: end.
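A minimal sketch of the S6.1-S6.9 loop written against the DNNDK Python bindings (module n2cube). The import path, function names (dpuOpen, dpuLoadKernel, etc.), the kernel name sr_kernel, the node names conv_in/conv_out and the output size are assumptions for illustration and must be checked against the DNNDK version actually installed on the board; pixel_shuffle is the function from the earlier sub-pixel convolution sketch.

```python
import cv2
import numpy as np
from dnndk import n2cube   # assumed DNNDK Python binding; import path may differ by version

KERNEL = "sr_kernel"                         # assumed kernel name produced by the DNNDK compiler
IN_NODE, OUT_NODE = "conv_in", "conv_out"    # assumed boundary node names

def run_super_resolution(image_paths, scale=3):
    n2cube.dpuOpen()                                   # S6.1: open the DPU device
    kernel = n2cube.dpuLoadKernel(KERNEL)              # S6.2: load the network model
    task = n2cube.dpuCreateTask(kernel, 0)
    for path in image_paths:                           # S6.3: walk the image address list
        lr = cv2.imread(path).astype(np.float32)       # S6.4: feed the LR image to the DPU
        n2cube.dpuSetInputTensorInHWCFP32(task, IN_NODE, lr, lr.size)
        n2cube.dpuRunTask(task)                        # S6.5: start the DPU task
        out_size = lr.shape[0] * lr.shape[1] * 3 * scale * scale
        out = n2cube.dpuGetOutputTensorInHWCFP32(task, OUT_NODE, out_size)  # S6.6
        out = np.array(out).reshape(lr.shape[0], lr.shape[1], 3 * scale * scale)
        hr = pixel_shuffle(out, scale)                 # S6.7: sub-pixel convolution on the PS
        cv2.imwrite(path.replace("images", "result"), hr.clip(0, 255).astype(np.uint8))
    n2cube.dpuDestroyTask(task)                        # S6.8/S6.9: list exhausted, clean up
    n2cube.dpuDestroyKernel(kernel)
    n2cube.dpuClose()
```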
Example 5
The test is carried out in debug mode, which makes it convenient for the program to output information and to verify it. The design and a computer are connected to the same router, and the upper computer logs into the embedded Linux of the development board via SSH to execute the program.
From the output, the program's running process and the network inference time can be observed. After the program runs, the super-resolution images output by the program can be found in the result directory. Comparing (a) and (b) of fig. 27 and (a) and (b) of fig. 28 shows that the super-resolution tasks for both the natural image and the artificial drawing are completed well: the resolution of the images is successfully enlarged by the specified magnification, and the quality of the output images is greatly improved compared with the input images.
The test task is super-resolving a single 640x360x3 image to 1920x1080x3. The network inference speed comparison is shown in Table 5-1 below.
TABLE 5-1 network inference speed comparison
In terms of time, as can be seen from Table 5-1, the network inference time is about 2.2 s, the time for a single process to handle one picture is about 44.8 s, and thanks to the two-process design the effective processing time per picture is about 22.43 s. The power consumption of the GPU and of the CPU were both measured with the AIDA64 software and do not include the power consumption of other parts of those systems; the power consumption of this design is estimated at 3.7 W in Vivado, and the maximum power of the platform's power supply is 10 W, so even including the other parts of the platform the total power consumption does not exceed 10 W.

Claims (4)

1. A system construction method of a portable image super-resolution system, characterized in that the system comprises a PL hardware layer, a PS embedded Linux system layer and a client application layer, wherein the PL hardware layer is responsible for constructing a DPU IP core, which performs neural network inference and accelerates the inference process; the PS embedded Linux system layer is responsible for reading and storing pictures, scheduling DPU tasks, performing the sub-pixel convolution operation on the neural network output and communicating with the upper computer; the client application layer faces the user, who through simple operations specifies the pictures to be super-resolved, obtains the output high-resolution images and modifies parameters;
the system construction method comprises the steps of constructing the PL end logic, customizing and transplanting the embedded Linux, designing, training and deploying a convolutional neural network model, constructing the control of the lower computer, and constructing DPU network accelerated inference;
the step of designing, training and deploying the convolutional neural network model is specifically:
step S4.1: constructing the convolutional neural network model using two techniques, 5 residual blocks and sub-pixel convolution;
step S4.2: training the convolutional neural network model of step S4.1; training uses 100 pictures of 1920x1080x3, comprising 50 landscape photographs and 50 paintings, with training data and validation data split 9:1; the pictures are downscaled to 640x360x3 to obtain the training inputs of the convolutional neural network model, these inputs are fed into the model constructed in step S4.1, the original 1920x1080x3 images are used as the labels of the network, and the finally trained convolutional neural network model is saved;
step S4.3: deploying the convolutional neural network model trained in step S4.2 to realize the image super-resolution function;
the step of constructing DPU network accelerated inference specifically comprises the following steps:
step S6.1: opening the DPU device;
step S6.2: loading the network model with the DPU device;
step S6.3: reading the image address list;
step S6.4: sending the low-resolution image of step S6.3 into the DPU;
step S6.5: starting the DPU task;
step S6.6: acquiring the output tensor of step S6.5;
step S6.7: processing the output tensor of step S6.6 with the sub-pixel convolution at the PS end;
step S6.8: judging whether the image address list has reached its upper limit; if so, performing step S6.9, and if not, performing step S6.3;
step S6.9: end.
2. The system construction method according to claim 1, wherein the step of constructing the PL end logic is specifically:
step S2.1: integration and connection of the DPU IP core; the DPU IP core adopts the low-RAM-usage mode, the DSP slice usage is set to high, and depthwise convolution is not used; the working frequency of the DSP slices must be fixed at twice the working frequency of the DPU;
step S2.2: configuring the ZYNQ core according to the development board schematic diagram.
3. The system construction method according to claim 1, wherein the step of customizing and transplanting the embedded Linux is specifically:
step S3.1: customizing the U-Boot of the embedded Linux, i.e. directly using the SD card boot mode;
step S3.2: customizing the Rootfs of the embedded Linux on the basis of step S3.1; the boot path specified in the previous step is made to contain the files required for system startup and operation, and the root file system of the installation environment is configured accordingly;
step S3.3: customizing the kernel of the embedded Linux on the basis of step S3.2; modifying the device tree file of the system;
step S3.4: compiling and transplanting the embedded Linux on the basis of step S3.3; the build function of Petalinux is run to compile; after compilation a U-Boot file is generated; the generated U-Boot file and the .ub file are copied to the FAT32 partition of the SD card, and the rootfs archive is decompressed to the EXT4 partition of the SD card.
4. The system construction method according to claim 1, wherein the control of the lower computer is constructed specifically as follows: the addresses of all pictures in the specified directory are acquired and the task amount is reported; two processes then handle the tasks simultaneously according to the task amount;
process one reads its picture, then performs the DPU inference of process one and obtains the result, applies the sub-pixel convolution to the result of process one to form the super-resolved image of process one, stores the super-resolved image of process one and reports the time consumed by the task;
process two reads its picture, then performs the DPU inference of process two and obtains the result, applies the sub-pixel convolution to the result of process two to form the super-resolved image of process two, stores the super-resolved image of process two and reports the time consumed by the task.
CN202011376766.1A 2020-11-30 2020-11-30 Portable image super-resolution system and system construction method Active CN112581366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011376766.1A CN112581366B (en) 2020-11-30 2020-11-30 Portable image super-resolution system and system construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011376766.1A CN112581366B (en) 2020-11-30 2020-11-30 Portable image super-resolution system and system construction method

Publications (2)

Publication Number Publication Date
CN112581366A CN112581366A (en) 2021-03-30
CN112581366B true CN112581366B (en) 2022-05-20

Family

ID=75128080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011376766.1A Active CN112581366B (en) 2020-11-30 2020-11-30 Portable image super-resolution system and system construction method

Country Status (1)

Country Link
CN (1) CN112581366B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2823409A4 (en) * 2012-03-04 2015-12-02 Adam Jeffries Data systems processing
CN105205782B (en) * 2015-09-06 2019-08-16 京东方科技集团股份有限公司 Supersolution is as method and system, server, user equipment and its method
CN107679621B (en) * 2017-04-19 2020-12-08 赛灵思公司 Artificial neural network processing device
US10885607B2 (en) * 2017-06-01 2021-01-05 Qualcomm Incorporated Storage for foveated rendering
US10951875B2 (en) * 2018-07-03 2021-03-16 Raxium, Inc. Display processing circuitry
WO2020183059A1 (en) * 2019-03-14 2020-09-17 Nokia Technologies Oy An apparatus, a method and a computer program for training a neural network
CN111242314B (en) * 2020-01-08 2023-03-21 中国信息通信研究院 Deep learning accelerator benchmark test method and device
CN111325327B (en) * 2020-03-06 2022-03-08 四川九洲电器集团有限责任公司 Universal convolution neural network operation architecture based on embedded platform and use method
CN111754403B (en) * 2020-06-15 2022-08-12 南京邮电大学 Image super-resolution reconstruction method based on residual learning
CN111787321A (en) * 2020-07-06 2020-10-16 济南浪潮高新科技投资发展有限公司 Image compression and decompression method and system for edge end based on deep learning

Also Published As

Publication number Publication date
CN112581366A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
US7765500B2 (en) Automated generation of theoretical performance analysis based upon workload and design configuration
US8351654B2 (en) Image processing using geodesic forests
CN110097609B (en) Sample domain-based refined embroidery texture migration method
CN109544662B (en) Method and system for coloring cartoon style draft based on SRUnet
CN108711182A (en) Render processing method, device and mobile terminal device
CN115409755B (en) Map processing method and device, storage medium and electronic equipment
CN111813686B (en) Game testing method and device, testing terminal and storage medium
JP2005518032A (en) Spatial optimization texture map
US11189060B2 (en) Generating procedural materials from digital images
US10922852B2 (en) Oil painting stroke simulation using neural network
WO2022262660A1 (en) Pruning and quantization compression method and system for super-resolution network, and medium
KR20200132682A (en) Image optimization method, apparatus, device and storage medium
CN113781308A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN111369430A (en) Mobile terminal portrait intelligent background replacement method based on mobile deep learning engine
US20230033319A1 (en) Method, apparatus and device for processing shadow texture, computer-readable storage medium, and program product
CN112364744A (en) TensorRT-based accelerated deep learning image recognition method, device and medium
CN112581366B (en) Portable image super-resolution system and system construction method
CN109993701A (en) A method of the depth map super-resolution rebuilding based on pyramid structure
Silva et al. Efficient algorithm for convolutional dictionary learning via accelerated proximal gradient consensus
Mlakar et al. Subdivision‐specialized linear algebra kernels for static and dynamic mesh connectivity on the gpu
Liu et al. A fast and accurate super-resolution network using progressive residual learning
Vecchio et al. Matfuse: Controllable material generation with diffusion models
CN111274145A (en) Relationship structure chart generation method and device, computer equipment and storage medium
KR20220130498A (en) Method and apparatus for image outpainting based on deep-neural network
JP7208314B1 (en) LEARNING DEVICE, LEARNING METHOD AND LEARNING PROGRAM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant