CN105869117A - GPU acceleration method for deep learning super-resolution technology - Google Patents
GPU acceleration method for deep learning super-resolution technology
- Publication number
- CN105869117A (application CN201610184129.1A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- convolution
- super-resolution
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Abstract
The invention discloses a GPU acceleration method for deep learning super-resolution technology. The method parallelizes every step of a super-resolution technique based on deep learning and convolutional neural networks and runs it on a GPU. Parallelization here means partitioning the convolutions of the technique into millions of mutually independent micro-tasks that can be executed concurrently in any order, fully exploiting the GPU's massive parallel computing capability. Further, the method exploits the characteristics of GPU memory by caching convolution-kernel data and input image data in shared memory and registers, substantially accelerating the convolution computation. The method fuses the convolution and nonlinear layers, and selects the optimal method for each convolution size. The invention thereby accelerates a high-quality super-resolution method to meet the speed requirements of video processing, without causing any loss of image quality.
Description
Technical field
The present invention relates to the field of image super-resolution and to GPU acceleration methods, and in particular to a GPU acceleration method for deep learning super-resolution technology.
Background technology
Image super-resolution converts a low-resolution image into a high-resolution one; it is widely used in image post-processing and video non-linear editing. Early super-resolution methods (such as bicubic) are usually based on simple interpolation: they are fast, reliable, and easy to integrate into chips, but the high-resolution images they produce are of poor quality, exhibiting noticeable artifacts such as ringing, aliasing, and blur. Such low-quality super-resolution methods can hardly meet current high-quality video requirements. State-of-the-art super-resolution methods can generate high-quality images, but their enormous computational cost makes them difficult to use in practical applications. Some GPU-accelerated super-resolution methods exist; these reach a sufficiently fast running speed, but sacrifice output quality.
Deep learning has made huge advances in recent years, markedly improving computer-vision recognition accuracy, and super-resolution techniques based on deep learning and convolutional neural networks have emerged accordingly. The convolutional-neural-network super-resolution method published at the 2014 European Conference on Computer Vision (Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang. Learning a Deep Convolutional Network for Image Super-Resolution, in Proceedings of European Conference on Computer Vision (ECCV), 2014, pp. 184-199; referred to as SRCNN) is one of the best-performing methods. With a carefully designed network of 3 convolutional layers and 2 ReLU (nonlinear) layers, massive training data, and meticulous fine-tuning of training parameters, SRCNN became the best-performing super-resolution method of its time. However, the method carries an enormous computational cost: executing it on a CPU takes 300 seconds per frame (1920*1080 to 3840*2160, single channel; all tests below use this resolution), and even with a GEMM-based GPU convolution acceleration method, each frame still takes close to 1 second, which cannot meet the needs of practical applications.
Summary of the invention
To meet the needs of practical applications, the present invention provides a GPU acceleration method for deep learning super-resolution technology, applied to super-resolution techniques based on deep learning and convolutional neural networks. The method parallelizes every step of such a super-resolution technique and runs it on a GPU. Parallelization in the present invention means partitioning the convolutions of the deep-learning, convolutional-neural-network super-resolution technique into parallel tasks: each convolution operation is divided into millions of mutually independent micro-tasks that can be executed in parallel in any order, so that the GPU's massive parallel computing capability is fully exploited.
Further, in said method: tasks are divided by convolution output pixel, the computation of each output pixel being assigned to one micro-task, so that the convolution can be executed in parallel at large scale; moreover, the data on which the micro-tasks of neighboring pixels depend are also adjacent, so memory accesses coalesce perfectly, fully utilizing the GPU's memory bus width and bandwidth.
Further, in said method: shared memory is used as a cache for the convolution-kernel parameters, reducing global-memory I/O and accelerating the convolution. Specifically, each concurrent thread block first reads the kernel parameters into its shared memory, and each thread then fetches the kernel parameters it needs from that shared memory. This reduces the global-memory throughput the GPU needs to read the kernel parameters, greatly accelerating the execution of the convolution.
Further, in said method: shared memory or registers are used as a cache for the input image, again reducing global-memory I/O and accelerating the convolution. Specifically, the input-image region on which a concurrent thread block depends is identified first; the thread block then reads that region into its shared memory, and each thread fetches the input image data it needs from that shared memory. Alternatively, when the input image data required by each thread is small enough, the required data is read once into the thread's registers before computing. This reduces the global-memory throughput the GPU needs to read the input image, greatly accelerating the execution of the convolution.
Further, in said method: a deep-neural-network GPU acceleration technique is used that fuses the convolution operation with the nonlinear operation, reducing the global-memory throughput required by the convolution and nonlinear layers and thereby accelerating the whole pipeline. Specifically, the deep-neural-network GPU acceleration technique merges the processing of the nonlinear layer into the convolution computation: immediately after the convolution result is produced, the nonlinear operation is applied to it while it is still in a register, eliminating one round of global-memory I/O.
Further, in said method: a deep-convolutional-network GPU acceleration technique is used to choose the optimal optimization method for each convolution size. The deep-convolutional-network GPU acceleration technique means: for each convolutional-layer size, every optimization acceleration technique is tested, and the fastest is then selected so as to obtain the fastest overall running speed.
Compared with the prior art, the present invention has the following significant advantages:
The invention parallelizes and optimizes the convolutions, accelerating a high-quality super-resolution method to meet the speed requirements of video processing without causing any loss of image quality.
Further, tasks are divided by output pixel, realizing the parallelization of the convolution; every step of the deep-learning, convolutional-neural-network super-resolution technique is parallelized, so that the GPU's massive parallel computing capability can be exploited.
Further, exploiting the characteristics of GPU storage, convolution-kernel data and input image data are cached in shared memory and registers, greatly accelerating the convolution computation.
Further, convolution and nonlinear layers are fused, and the optimal optimization is chosen for each convolution size.
The invention fully exploits the GPU's hardware storage characteristics and dramatically accelerates the convolution computation, so that a high-quality super-resolution method can run at a speed that meets practical working requirements.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become apparent upon reading the following detailed description of non-limiting embodiments with reference to the drawings:
Fig. 1 is the flow chart of SRCNN;
Fig. 2 is a schematic diagram of the convolution parallelization in a preferred embodiment of the invention;
Fig. 3 is a schematic diagram of the improvement from caching convolution-kernel parameters in shared memory in a preferred embodiment;
Fig. 4 is a schematic diagram of the improvement from caching input-image tiles in shared memory in a preferred embodiment;
Fig. 5 is a schematic diagram of the improvement from fusing convolution and the nonlinear computation in a preferred embodiment.
Detailed description of the invention
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the inventive concept; these all fall within the protection scope of the invention.
Fig. 1 shows the flow chart of SRCNN. As one embodiment of the present invention, the super-resolution GPU acceleration technique of the invention targets SRCNN, whose pipeline (Fig. 1) comprises bicubic preprocessing (not shown), three convolutional layers, and two ReLU layers (after the first and second convolutions, respectively). The three convolutional layer sizes (as output channels * kernel width * kernel height * input channels) are: 64*9*9*1, 32*1*1*64, 1*5*5*32. Super-resolving one image from 1080p to 4K requires 66.6G floating-point multiply-add operations and 800 GBytes of memory I/O. Such an enormous amount of computation obviously cannot meet practical processing-time requirements when computed on a CPU. For this situation, the present invention uses the GPU: every step of the SRCNN pipeline is parallelized and implemented on the GPU, and GPU hardware characteristics are fully exploited for optimization and acceleration.
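The 66.6G figure follows from the layer shapes above: each output pixel of a layer costs (kernel width × kernel height × input channels) multiply-adds per output channel, summed over the three layers and multiplied by the number of 4K pixels (assuming, as in SRCNN, that every layer runs at the output resolution). A quick sanity check of that arithmetic, as a sketch with our own variable names:

```python
# Per-output-pixel multiply-adds for each SRCNN layer,
# given (out_channels, k_w, k_h, in_channels) from the text.
layers = [(64, 9, 9, 1), (32, 1, 1, 64), (1, 5, 5, 32)]

madds_per_pixel = sum(oc * kw * kh * ic for oc, kw, kh, ic in layers)
pixels_4k = 3840 * 2160            # every layer runs at the 4K output resolution

total_madds = madds_per_pixel * pixels_4k
print(madds_per_pixel)             # 8032 multiply-adds per output pixel
print(round(total_madds / 1e9, 1)) # ~66.6 G multiply-adds per frame
```

This reproduces the 66.6G multiply-add count quoted in the description.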
The present invention focuses on parallelizing, optimizing, and accelerating the convolutions, because the bicubic preprocessing has very low computational cost and is easy to implement on the GPU, the parallelization of the nonlinear ReLU layers is straightforward, and more than 95% of the runtime is spent in the convolutions.
To understand how the SRCNN method is adapted to the GPU and how the GPU parallel program is optimized, the GPU architecture is introduced first. Owing to physical limitations, processor clock frequencies could not be substantially improved over the past few years; the computer industry has instead raised computing capability by increasing the number of processor cores. Typical products are multi-core central processing units (CPUs) and graphics processors (GPUs) with numerous cores. A GPU has thousands of computing units and ultra-high-bandwidth memory; for example, the Nvidia GTX 980 Ti has 2816 CUDA cores and 336 GB/s of global-memory bandwidth. If a massive computing task is divided into tens of thousands or even millions of micro-tasks and handed to the GPU for processing, the GPU schedules these micro-tasks over the CUDA cores, which process them concurrently and efficiently, so that GPU execution speed can reach hundreds of times that of a CPU.
The GPU has a hierarchical storage mechanism; the present invention uses the GPU's global memory, shared memory, and registers. These three kinds of storage differ greatly in access bandwidth, latency, capacity, and addressable unit. Global memory is accessible to all threads and has very large capacity (several GB), but the lowest access bandwidth, and it often becomes the bottleneck of the whole pipeline. Shared memory is a programmer-controlled cache: the GPU's computing units are grouped into several thread blocks, each containing a number of threads and an independent shared memory that is accessible to all threads in that block, with very high access bandwidth and relatively low latency. Registers reside inside each thread and have the highest bandwidth and lowest latency but very small capacity; storing frequently reused data in registers greatly reduces memory-access overhead.
In the GPU-accelerated SRCNN technique of the present invention, the input image data is first transferred from host memory to GPU memory and bicubic preprocessing is performed; then the first convolution (conv1), ReLU, second convolution (conv2), ReLU, and third convolution (conv3) are executed in turn; finally the data is transferred from GPU memory back to host memory. Each convolutional layer is parallelized by dividing tasks per output pixel, so that the GPU's massive parallel computing capability can be exploited. To further accelerate the convolution computation, the invention caches convolution-kernel data in shared memory, caches input-image tile data in shared memory or registers, and fuses convolution with the nonlinear operation. Moreover, for each convolution size, the invention benchmarks the execution speed of the different convolution methods and chooses the fastest combination so that the whole pipeline runs fastest. The key technical details are as follows.
In a preferred embodiment, to parallelize the convolution, the invention divides the convolution task by output pixel into millions of micro-tasks, referred to as the direct GPU implementation of convolution, as shown in Fig. 2. A convolution computes the value of every output pixel, and the computation of each output pixel can be assigned as an independent micro-task to one GPU thread; the micro-tasks are mutually independent, can run concurrently, and need not communicate with one another. A further benefit of this division is that the input image data accessed by concurrently executing adjacent threads is also adjacent: for example, while thread (x, y) accesses I(a, b), thread (x+1, y) accesses I(a+1, b), so the GPU hardware automatically coalesces these access requests into a single access, fully utilizing the GPU's memory bus width and bandwidth. The rest of SRCNN (ReLU) is also parallelized, so the whole SRCNN pipeline can execute on the GPU, avoiding repeated data transfers between CPU and GPU. With this parallelization of the convolutions, SRCNN's execution speed improves from 300 seconds/frame (on CPU) to 1 second/frame.
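The per-output-pixel decomposition can be sketched in plain Python: the body of `micro_task` below is the work one GPU thread would perform, and the double loop in `convolve` stands in for the GPU launching one thread per output pixel. This is a CPU sketch with our own names, not the patent's CUDA code:

```python
def micro_task(inp, kernel, x, y):
    """Work of one GPU thread: compute one output pixel of a 2-D convolution.

    inp    : input image, inp[row][col]
    kernel : kernel[ky][kx]
    (x, y) : output-pixel coordinates
    """
    kh, kw = len(kernel), len(kernel[0])
    acc = 0.0
    for ky in range(kh):
        for kx in range(kw):
            acc += kernel[ky][kx] * inp[y + ky][x + kx]
    return acc

def convolve(inp, kernel):
    """Launch stand-in: iterate over all micro-tasks. On a GPU these run
    concurrently in any order, since no micro-task depends on another."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(inp) - kh + 1
    out_w = len(inp[0]) - kw + 1
    return [[micro_task(inp, kernel, x, y) for x in range(out_w)]
            for y in range(out_h)]

# 3x3 box filter over a 4x4 ramp image -> 2x2 output of neighborhood sums
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
box = [[1.0] * 3 for _ in range(3)]
print(convolve(img, box))
```

Because `micro_task(x, y)` and `micro_task(x+1, y)` touch adjacent input columns, the coalescing behavior described above falls out of this decomposition for free.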
Using the GPU's hierarchical storage mechanism, the invention caches convolution-kernel data and input image data in shared memory or registers, accelerating the convolution by 2 to 10 times.
In a preferred embodiment, the invention prefetches the convolution-kernel data into shared memory, eliminating redundant global-memory I/O for kernel data; this is referred to as the shared-kernel method, shown in Fig. 3. In the direct GPU implementation above, every thread reads the same kernel data, and this redundant repeated reading wastes a large amount of global-memory I/O. In the shared-kernel method, a thread block first prefetches the kernel data into shared memory; all threads in the block then fetch the kernel data they need from that shared memory. The shared memory thus acts as a cache for the kernel data, saving a large amount of global-memory I/O for reading kernel data.
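The saving from the shared-kernel method can be illustrated by counting global-memory reads of kernel parameters. In the direct implementation every thread fetches all K parameters from global memory; with a block-level shared cache, each block fetches them only once. A counting sketch (the block size of 256 is our illustrative choice, not a figure from the patent):

```python
def kernel_reads_direct(n_threads, k_params):
    # direct method: every thread fetches every kernel parameter from global memory
    return n_threads * k_params

def kernel_reads_shared(n_threads, k_params, block_size):
    # shared-kernel method: each thread block cooperatively loads the kernel
    # once into shared memory; threads then read it at no global-memory cost
    n_blocks = (n_threads + block_size - 1) // block_size
    return n_blocks * k_params

threads = 3840 * 2160          # one micro-task per 4K output pixel
k = 9 * 9                      # conv1 kernel footprint per channel pair
direct = kernel_reads_direct(threads, k)
shared = kernel_reads_shared(threads, k, block_size=256)
print(direct // shared)        # global reads drop by a factor of the block size
```

With 256 threads per block the kernel-parameter traffic drops 256-fold, which is why the shared memory behaves like a near-perfect cache here.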
In a preferred embodiment, the invention prefetches input-image tile data into shared memory or registers, eliminating redundant global-memory I/O for input data; these are referred to as the shared-patch method and the registered-pixel method, shown in Fig. 4. In a convolution whose kernel width or height is greater than 1, adjacent output pixels depend on input tiles that overlap each other. The direct implementation does not account for this overlap, so each thread reads the input data redundantly, again wasting global-memory I/O; the larger the kernel width and height, the more serious this waste becomes. In the shared-patch and registered-pixel methods of the invention, the input-tile region on which a thread block depends is identified first and read into shared memory or registers (registers are feasible only when the region is small enough to fit), and each thread then fetches the input data it needs from shared memory. Shared memory or registers thus act as a cache for the input data, saving a large amount of global-memory I/O for reading the input image.
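The redundancy eliminated here is easy to quantify: a block computing a `tw × th` tile of output pixels with a `kw × kh` kernel needs only the union of all the windows, a `(tw+kw-1) × (th+kh-1)` input footprint, whereas the direct method reads `kw × kh` inputs per output pixel. A sketch (the 16×16 tile size is our illustrative choice):

```python
def input_reads_direct(tw, th, kw, kh):
    # direct method: each output pixel independently reads its kw x kh window
    return tw * th * kw * kh

def input_reads_shared_patch(tw, th, kw, kh):
    # shared-patch method: the block loads the union of all windows once;
    # overlapping pixels are fetched from global memory a single time
    return (tw + kw - 1) * (th + kh - 1)

tw = th = 16                   # a 16x16 output tile per thread block
kw = kh = 9                    # SRCNN conv1 kernel size
direct = input_reads_direct(tw, th, kw, kh)
shared = input_reads_shared_patch(tw, th, kw, kh)
print(direct, shared, round(direct / shared, 1))
```

For the 9×9 first layer this is a 36× reduction in input reads per tile, and the factor grows with kernel size, matching the observation that the waste worsens for wide or tall kernels.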
In a preferred embodiment, the invention fuses convolution with the nonlinear layer, eliminating the nonlinear layer's I/O overhead, as shown in Fig. 5. Traditional acceleration of convolutional neural networks concentrates on the convolutions, both because the convolutions are the computation-time bottleneck and because the nonlinear layers are hard to accelerate. But once the convolutions are accelerated to a sufficiently fast degree, the time overhead of the nonlinear layers is no longer negligible. In a convolutional neural network, a nonlinear layer always follows a convolutional layer, and each of its output pixels depends only on the corresponding input pixel. The invention therefore merges convolution and the nonlinear layer into a single pass: immediately after computing a convolution output value, each thread applies the nonlinear operation to it and only then writes it back to global memory. This removes the overhead of the convolutional layer writing back to global memory and the nonlinear layer reading it again, which is equivalent to almost completely eliminating the nonlinear layer's computation time.
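Fusion means the ReLU is applied to the value while it is still in a register, before the single write-back. In a scalar sketch (our own names, 1-D for brevity), the unfused version materializes the convolution output and makes a second pass over it, while the fused version applies `max(0, ·)` inline; both produce identical results:

```python
def conv1d(signal, kernel):
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(n - k + 1)]

def conv_then_relu(signal, kernel):
    # unfused: conv output is written out, then re-read by a separate ReLU pass
    out = conv1d(signal, kernel)          # stands for a write-back to global memory
    return [max(0.0, v) for v in out]     # second pass: an extra read and write

def fused_conv_relu(signal, kernel):
    # fused: ReLU applied to the accumulator before the single write-back
    n, k = len(signal), len(kernel)
    return [max(0.0, sum(kernel[j] * signal[i + j] for j in range(k)))
            for i in range(n - k + 1)]

sig = [1.0, -2.0, 3.0, -4.0, 5.0]
ker = [1.0, 0.5]
assert conv_then_relu(sig, ker) == fused_conv_relu(sig, ker)
print(fused_conv_relu(sig, ker))
```

The fused form performs one write per output value instead of a write, a read, and another write, which is exactly the round of global-memory I/O the invention eliminates.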
In a preferred embodiment, the invention measures the running time of each optimization acceleration technique on each convolutional-layer size, then selects the fastest technique per layer so as to obtain the fastest overall pipeline; the running-time results are shown in Table 1. The cuDNN entries use the convolution algorithm library provided by Nvidia. The results show that when the first convolutional layer uses cuDNN, the second uses shared kernel parameters with register-cached input tiles, and the third uses shared kernel parameters with shared input tiles, the whole pipeline is fastest, finally reaching 0.15 seconds per frame, 2000 times the CPU speed.
Table 1. Running time of each convolutional layer under each optimization acceleration method.
Test setup for the table: an Nvidia GTX 980 Ti GPU and dual Intel E5-2697 v2 @ 2.7 GHz 12-core processors, testing 1920*1080 to 3840*2160 single-channel super-resolution.
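The per-layer selection behind Table 1 is a simple autotuning loop: time every candidate implementation on each layer shape and keep the fastest. A minimal sketch of that selection logic; the candidate names are taken from the text, but the timing numbers here are placeholders, not the patent's measurements:

```python
def pick_fastest(layer_shape, candidates, benchmark):
    """Return (name of fastest candidate, all timings) for one layer shape."""
    timings = {name: benchmark(name, layer_shape) for name in candidates}
    return min(timings, key=timings.get), timings

# Placeholder "benchmark": we look the time up from a hypothetical table;
# in practice this would launch the kernel on the GPU and time it.
fake_times = {
    (64, 9, 9, 1):  {"cudnn": 0.05, "shared_kernel": 0.08, "shared_patch": 0.07},
    (32, 1, 1, 64): {"cudnn": 0.04, "shared_kernel": 0.02, "shared_patch": 0.03},
}

def benchmark(impl_name, shape):
    return fake_times[shape][impl_name]

candidates = ("cudnn", "shared_kernel", "shared_patch")
best, _ = pick_fastest((64, 9, 9, 1), candidates, benchmark)
print(best)
```

Running the loop once per layer at build time and hard-wiring the winners gives the fastest overall pipeline without any runtime dispatch cost.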
In summary, the present invention accelerates a high-quality super-resolution method to meet the speed requirements of video processing, without causing any loss of image quality.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various variations or modifications within the scope of the claims, and these do not affect the substance of the invention.
Claims (10)
1. A GPU acceleration method for deep learning super-resolution technology, characterized in that: the method parallelizes every step of a super-resolution technique based on deep learning and convolutional neural networks, and runs it on a GPU; said parallelization partitions the convolutions of the deep-learning, convolutional-neural-network super-resolution technique into parallel tasks, dividing each convolution operation into millions of mutually independent micro-tasks that can be executed in parallel in any order, so that the GPU's massive parallel computing capability is exploited.
2. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that in said method: tasks are divided by convolution output pixel, the computation of each output pixel being assigned to one micro-task, so that the convolution can be executed in parallel at large scale; the data on which the micro-tasks of neighboring pixels depend are also adjacent, so memory accesses coalesce perfectly, fully utilizing the GPU's memory bus width and bandwidth.
3. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that in said method: shared memory is used as a cache for the convolution-kernel parameters, reducing global-memory I/O and accelerating the convolution.
4. The GPU acceleration method for deep learning super-resolution technology according to claim 3, characterized in that: said use of shared memory as a cache for the kernel parameters means: each concurrent thread block first reads the kernel parameters into its shared memory, and each thread then fetches the kernel parameters it needs from that shared memory.
5. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that in said method: shared memory or registers are used as a cache for the input image, reducing global-memory I/O and accelerating the convolution.
6. The GPU acceleration method for deep learning super-resolution technology according to claim 5, characterized in that: said use of shared memory or registers as a cache for the input image means: the input-image region on which a concurrent thread block depends is identified first; the thread block then reads that region into its shared memory, and each thread fetches the input image data it needs from that shared memory; alternatively, when the input image data required by each thread is small enough, the required data is read once into the thread's registers before computing.
7. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that in said method: a deep-neural-network GPU acceleration technique is used, fusing the convolution operation with the nonlinear operation, reducing the global-memory throughput required by the convolution and nonlinear layers, and thereby accelerating the whole pipeline.
8. The GPU acceleration method for deep learning super-resolution technology according to claim 7, characterized in that: said deep-neural-network GPU acceleration technique means: the processing of the nonlinear layer is merged into the convolution computation; immediately after the convolution completes, the nonlinear operation is applied in registers, eliminating one round of global-memory I/O.
9. The GPU acceleration method for deep learning super-resolution technology according to any one of claims 1-8, characterized in that in said method: a deep-convolutional-network GPU acceleration technique is used, choosing the optimal optimization method for each convolution size.
10. The GPU acceleration method for deep learning super-resolution technology according to claim 9, characterized in that: said deep-convolutional-network GPU acceleration technique means: for each convolutional-layer size, every optimization acceleration technique is tested, and the fastest is then selected so as to obtain the fastest overall running speed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610184129.1A CN105869117B (en) | 2016-03-28 | 2016-03-28 | GPU acceleration method for deep learning super-resolution technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610184129.1A CN105869117B (en) | 2016-03-28 | 2016-03-28 | GPU acceleration method for deep learning super-resolution technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105869117A true CN105869117A (en) | 2016-08-17 |
CN105869117B CN105869117B (en) | 2021-04-02 |
Family
ID=56626131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610184129.1A Expired - Fee Related CN105869117B (en) | 2016-03-28 | 2016-03-28 | GPU acceleration method for deep learning super-resolution technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105869117B (en) |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106447609A (en) * | 2016-08-30 | 2017-02-22 | 上海交通大学 | Image super-resolution method based on depth convolutional neural network |
CN106779057A (en) * | 2016-11-11 | 2017-05-31 | 北京旷视科技有限公司 | The method and device of the calculating binary neural network convolution based on GPU |
CN107085827A (en) * | 2017-04-27 | 2017-08-22 | 中国电子科技集团公司第二十八研究所 | The super-resolution image recovery method realized based on hardware platform |
CN107341127A (en) * | 2017-07-05 | 2017-11-10 | 西安电子科技大学 | Convolutional neural networks accelerated method based on OpenCL standards |
CN107515736A (en) * | 2017-07-01 | 2017-12-26 | 广州深域信息科技有限公司 | A kind of method for accelerating depth convolutional network calculating speed on embedded device |
WO2018068623A1 (en) * | 2016-10-14 | 2018-04-19 | 腾讯科技(深圳)有限公司 | Machine learning method and system |
CN108012156A (en) * | 2017-11-17 | 2018-05-08 | Video processing method and control platform |
CN108052891A (en) * | 2017-12-08 | 2018-05-18 | Facial contour parallel computation method and device |
CN108062532A (en) * | 2017-12-28 | 2018-05-22 | Deep learning face recognition network optimization method, device and storage medium |
CN108073548A (en) * | 2016-11-14 | 2018-05-25 | Convolution operation device and convolution operation method |
CN108268931A (en) * | 2016-12-30 | 2018-07-10 | Data processing method, device and system |
CN108268944A (en) * | 2016-12-31 | 2018-07-10 | Neural network unit with remodelable memory |
TWI634490B (en) * | 2016-11-14 | 2018-09-01 | 美商耐能股份有限公司 | Convolution operation device and convolution operation method |
CN108564524A (en) * | 2018-04-24 | 2018-09-21 | Convolution calculation optimization method for visual images |
WO2018196863A1 (en) * | 2017-04-28 | 2018-11-01 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium |
CN109165723A (en) * | 2018-08-03 | 2019-01-08 | Method and apparatus for processing data |
CN109409513A (en) * | 2018-10-10 | 2019-03-01 | Neural network-based task processing method and related device |
CN109740731A (en) * | 2018-12-15 | 2019-05-10 | Adaptive convolutional layer hardware accelerator design method |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | Data processing method and device, electronic device and readable storage medium |
CN109886407A (en) * | 2019-02-27 | 2019-06-14 | Data processing method, device, electronic equipment and computer readable storage medium |
CN110009644A (en) * | 2019-03-26 | 2019-07-12 | Method and apparatus for feature map row pixel segmentation |
CN110084361A (en) * | 2017-10-30 | 2019-08-02 | Arithmetic device and method |
CN110188863A (en) * | 2019-04-30 | 2019-08-30 | Convolution kernel of a convolutional neural network and compression algorithm thereof |
CN110321998A (en) * | 2018-03-31 | 2019-10-11 | Convolutional neural network implementation method and device, acceleration equipment and storage medium |
CN110399883A (en) * | 2019-06-28 | 2019-11-01 | Image feature extraction method, device, equipment and computer readable storage medium |
US10497258B1 (en) | 2018-09-10 | 2019-12-03 | Sony Corporation | Vehicle tracking and license plate recognition based on group of pictures (GOP) structure |
CN110633785A (en) * | 2018-06-21 | 2019-12-31 | 清华大学 | Method and system for calculating convolutional neural network |
WO2020177250A1 (en) * | 2019-03-06 | 2020-09-10 | 上海熠知电子科技有限公司 | Data reading system and method |
CN111914985A (en) * | 2019-05-10 | 2020-11-10 | 杭州海康威视数字技术股份有限公司 | Configuration method and device of deep learning network model and storage medium |
WO2021047118A1 (en) * | 2019-09-12 | 2021-03-18 | 浪潮电子信息产业股份有限公司 | Image processing method, device and system |
US20210118095A1 (en) * | 2019-10-17 | 2021-04-22 | Samsung Electronics Co., Ltd. | Image processing apparatus and method |
CN113286174A (en) * | 2021-05-21 | 2021-08-20 | 浙江商汤科技开发有限公司 | Video frame extraction method and device, electronic equipment and computer readable storage medium |
CN113806044A (en) * | 2021-08-31 | 2021-12-17 | 天津大学 | Heterogeneous platform task bottleneck elimination method for computer vision application |
CN114445687A (en) * | 2021-12-31 | 2022-05-06 | Image recognition reasoning method, system, storage medium and device |
WO2022121474A1 (en) * | 2020-12-11 | 2022-06-16 | 苏州浪潮智能科技有限公司 | Method and system for optimizing convolutional residual structure of neural network, device, and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140085318A1 (en) * | 2012-09-26 | 2014-03-27 | Siemens Corporation | Multi-GPU FISTA Implementation for MR Reconstruction with Non-Uniform K-Space Sampling |
CN104778659A (en) * | 2015-04-15 | 2015-07-15 | 杭州电子科技大学 | Single-frame image super-resolution reconstruction method on basis of deep learning |
CN105279741A (en) * | 2015-11-17 | 2016-01-27 | 集美大学 | Image super-resolution reconstruction method and system based on graph-cut algorithm |
2016
- 2016-03-28 CN CN201610184129.1A patent/CN105869117B/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140085318A1 (en) * | 2012-09-26 | 2014-03-27 | Siemens Corporation | Multi-GPU FISTA Implementation for MR Reconstruction with Non-Uniform K-Space Sampling |
CN104778659A (en) * | 2015-04-15 | 2015-07-15 | 杭州电子科技大学 | Single-frame image super-resolution reconstruction method on basis of deep learning |
CN105279741A (en) * | 2015-11-17 | 2016-01-27 | 集美大学 | Image super-resolution reconstruction method and system based on graph-cut algorithm |
Non-Patent Citations (9)
Title |
---|
LINGQI ZHANG et al.: "High accuracy digital image correlation powered by GPU-based parallel computing", 《OPTICS AND LASERS IN ENGINEERING》 *
LIUBOV A. FLORES et al.: "Parallel CT image reconstruction based on GPUs", 《RADIATION PHYSICS AND CHEMISTRY》 *
刘进锋 et al.: "A concise and efficient method for accelerating convolutional neural networks", 《科学技术与工程》 *
李佳骏 et al.: "GPU-based parallel optimization of HOTPANTS", 《天文研究与技术》 *
李大霞: "A speed optimization of the cuda-convnet deep convolutional neural network algorithm", 《中国优秀硕士学位论文全文数据库 信息科技辑(月刊)》 *
胡传平 et al.: "Research on image super-resolution algorithms based on deep learning", 《铁道警察学院学报》 *
金鹭: "Research on a GPU-based surface topography measurement system", 《中国优秀硕士学位论文全文数据库信息科技辑(月刊)》 *
陈湘骥 et al.: "Real-time video super-resolution reconstruction based on GPU acceleration", 《计算机应用》 *
马永军 et al.: "Parallel template matching target recognition algorithm for CPU+GPU heterogeneous platforms", 《天津科技大学学报》 *
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106447609A (en) * | 2016-08-30 | 2017-02-22 | 上海交通大学 | Image super-resolution method based on depth convolutional neural network |
WO2018068623A1 (en) * | 2016-10-14 | 2018-04-19 | 腾讯科技(深圳)有限公司 | Machine learning method and system |
CN106779057A (en) * | 2016-11-11 | 2017-05-31 | Method and device for calculating binary neural network convolution based on GPU |
CN106779057B (en) * | 2016-11-11 | 2020-04-17 | 北京旷视科技有限公司 | Method and device for calculating binary neural network convolution based on GPU |
CN108073548B (en) * | 2016-11-14 | 2021-09-10 | 耐能股份有限公司 | Convolution operation device and convolution operation method |
CN108073548A (en) * | 2016-11-14 | 2018-05-25 | Convolution operation device and convolution operation method |
US10936937B2 (en) | 2016-11-14 | 2021-03-02 | Kneron, Inc. | Convolution operation device and convolution operation method |
TWI634490B (en) * | 2016-11-14 | 2018-09-01 | 美商耐能股份有限公司 | Convolution operation device and convolution operation method |
CN108268931A (en) * | 2016-12-30 | 2018-07-10 | Data processing method, device and system |
CN108268944A (en) * | 2016-12-31 | 2018-07-10 | Neural network unit with remodelable memory |
CN108268944B (en) * | 2016-12-31 | 2020-09-11 | 上海兆芯集成电路有限公司 | Neural network unit with remodelable memory |
CN107085827A (en) * | 2017-04-27 | 2017-08-22 | Super-resolution image restoration method based on hardware platform |
CN107085827B (en) * | 2017-04-27 | 2020-06-16 | 中国电子科技集团公司第二十八研究所 | Super-resolution image restoration method based on hardware platform |
WO2018196863A1 (en) * | 2017-04-28 | 2018-11-01 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium |
US11429852B2 (en) | 2017-04-28 | 2022-08-30 | Beijing Sensetime Technology Development Co., Ltd. | Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium |
CN107515736B (en) * | 2017-07-01 | 2021-01-15 | 广州深域信息科技有限公司 | Method for accelerating computation speed of deep convolutional network on embedded equipment |
CN107515736A (en) * | 2017-07-01 | 2017-12-26 | Method for accelerating computation speed of deep convolutional network on embedded equipment |
CN107341127A (en) * | 2017-07-05 | 2017-11-10 | Convolutional neural network acceleration method based on OpenCL standard |
CN107341127B (en) * | 2017-07-05 | 2020-04-14 | 西安电子科技大学 | Convolutional neural network acceleration method based on OpenCL standard |
CN110084361A (en) * | 2017-10-30 | 2019-08-02 | Arithmetic device and method |
CN110084361B (en) * | 2017-10-30 | 2021-03-23 | 上海寒武纪信息科技有限公司 | Arithmetic device and method |
US11922132B2 (en) | 2017-10-30 | 2024-03-05 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and terminal device |
CN108012156A (en) * | 2017-11-17 | 2018-05-08 | Video processing method and control platform |
CN108052891A (en) * | 2017-12-08 | 2018-05-18 | Facial contour parallel computation method and device |
CN108062532A (en) * | 2017-12-28 | 2018-05-22 | Deep learning face recognition network optimization method, device and storage medium |
CN110321998A (en) * | 2018-03-31 | 2019-10-11 | Convolutional neural network implementation method and device, acceleration equipment and storage medium |
CN110321998B (en) * | 2018-03-31 | 2022-06-14 | 赛灵思公司 | Convolutional neural network implementation method and device, acceleration equipment and storage medium |
CN108564524A (en) * | 2018-04-24 | 2018-09-21 | Convolution calculation optimization method for visual images |
CN110633785B (en) * | 2018-06-21 | 2021-01-05 | 清华大学 | Method and system for calculating convolutional neural network |
CN110633785A (en) * | 2018-06-21 | 2019-12-31 | 清华大学 | Method and system for calculating convolutional neural network |
CN109165723B (en) * | 2018-08-03 | 2021-03-19 | 北京字节跳动网络技术有限公司 | Method and apparatus for processing data |
CN109165723A (en) * | 2018-08-03 | 2019-01-08 | Method and apparatus for processing data |
US10497258B1 (en) | 2018-09-10 | 2019-12-03 | Sony Corporation | Vehicle tracking and license plate recognition based on group of pictures (GOP) structure |
CN109409513A (en) * | 2018-10-10 | 2019-03-01 | Neural network-based task processing method and related device |
CN109740731A (en) * | 2018-12-15 | 2019-05-10 | Adaptive convolutional layer hardware accelerator design method |
WO2020119318A1 (en) * | 2018-12-15 | 2020-06-18 | 华南理工大学 | Self-adaptive selection and design method for convolutional-layer hardware accelerator |
CN109754073A (en) * | 2018-12-29 | 2019-05-14 | Data processing method and device, electronic device and readable storage medium |
CN109886407A (en) * | 2019-02-27 | 2019-06-14 | 上海商汤智能科技有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
CN109886407B (en) * | 2019-02-27 | 2021-10-22 | 上海商汤智能科技有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
WO2020177250A1 (en) * | 2019-03-06 | 2020-09-10 | 上海熠知电子科技有限公司 | Data reading system and method |
CN110009644A (en) * | 2019-03-26 | 2019-07-12 | Method and apparatus for feature map row pixel segmentation |
CN110188863A (en) * | 2019-04-30 | 2019-08-30 | Convolution kernel of a convolutional neural network and compression algorithm thereof |
CN110188863B (en) * | 2019-04-30 | 2021-04-09 | 杭州电子科技大学 | Convolution kernel compression method of convolution neural network suitable for resource-limited equipment |
CN111914985B (en) * | 2019-05-10 | 2023-07-04 | 杭州海康威视数字技术股份有限公司 | Configuration method, device and storage medium of deep learning network model |
CN111914985A (en) * | 2019-05-10 | 2020-11-10 | 杭州海康威视数字技术股份有限公司 | Configuration method and device of deep learning network model and storage medium |
CN110399883A (en) * | 2019-06-28 | 2019-11-01 | Image feature extraction method, device, equipment and computer readable storage medium |
US11614964B2 (en) | 2019-09-12 | 2023-03-28 | Inspur Electronic Information Industry Co., Ltd. | Deep-learning-based image processing method and system |
WO2021047118A1 (en) * | 2019-09-12 | 2021-03-18 | 浪潮电子信息产业股份有限公司 | Image processing method, device and system |
US20210118095A1 (en) * | 2019-10-17 | 2021-04-22 | Samsung Electronics Co., Ltd. | Image processing apparatus and method |
US11854159B2 (en) * | 2019-10-17 | 2023-12-26 | Samsung Electronics Co., Ltd. | Image processing apparatus and method |
WO2022121474A1 (en) * | 2020-12-11 | 2022-06-16 | 苏州浪潮智能科技有限公司 | Method and system for optimizing convolutional residual structure of neural network, device, and medium |
CN113286174A (en) * | 2021-05-21 | 2021-08-20 | 浙江商汤科技开发有限公司 | Video frame extraction method and device, electronic equipment and computer readable storage medium |
CN113806044B (en) * | 2021-08-31 | 2023-11-07 | 天津大学 | Heterogeneous platform task bottleneck eliminating method for computer vision application |
CN113806044A (en) * | 2021-08-31 | 2021-12-17 | 天津大学 | Heterogeneous platform task bottleneck elimination method for computer vision application |
CN114445687A (en) * | 2021-12-31 | 2022-05-06 | Image recognition reasoning method, system, storage medium and device |
CN114445687B (en) * | 2021-12-31 | 2024-01-19 | 苏州浪潮智能科技有限公司 | Image recognition reasoning method, system, storage medium and device |
Also Published As
Publication number | Publication date |
---|---|
CN105869117B (en) | 2021-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105869117A (en) | Method for accelerating GPU directed at deep learning super-resolution technology | |
DE102018117813A1 (en) | Timely data reconstruction with an external recurrent neural network | |
CN109389556A (en) | Multi-scale dilated convolutional neural network super-resolution reconstruction method and device | |
CN106991011A (en) | Multi-granularity parallel and collaborative optimization method for big data task processing based on CPU multithreading and GPU | |
DE112020004237T5 (en) | VIDEO UPSAMPLING USING ONE OR MORE NEURAL NETWORKS | |
DE102020104637A1 (en) | TECHNIQUES FOR EFFICIENT PARTITIONING OF MEMORY | |
Wu et al. | How many labeled license plates are needed? | |
Zhou et al. | RSANet: towards real-time object detection with residual semantic-guided attention feature pyramid network | |
CN105931256A (en) | CUDA (compute unified device architecture)-based large-format remote sensing image fast segmentation method | |
CN113392968A (en) | Micro-training for iterative small sample refinement of neural networks | |
DE112020000865T5 (en) | STORAGE MANAGEMENT SYSTEM | |
JP2019212243A (en) | Learning identification device and learning identification method | |
CN109191392A (en) | Semantic segmentation driven image super-resolution reconstruction method | |
DE102021205690A1 (en) | Training neural networks with limited data using invertible augmentation operators | |
JP2020077066A (en) | Learning device and method for learning | |
DE102022121509A1 (en) | SINGLE FRAME INVERSE RENDERING | |
CN106802787A (en) | MapReduce optimization method based on GPU sorting | |
CN106484532B (en) | GPGPU parallel computing method for SPH fluid simulation | |
DE112021000303T5 (en) | BARRIER-FREE AND BARRIESS-FREE SYNCHRONIZATION OF SHARED STORAGE | |
DE102021116231A1 (en) | INTERFERENCE-FREE MULTIPLEXER | |
CN103942095B (en) | Two-dimensional phase unwrapping method based on heterogeneous acceleration platform | |
CN108648213A (en) | Implementation method of the KCF tracking algorithm on TMS320C6657 | |
CN105869105A (en) | Method for accelerating GPU directed at A+ super-resolution technology | |
JP2021015523A (en) | Learning device and learning method | |
Tan et al. | PPEDNet: Pyramid pooling encoder-decoder network for real-time semantic segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210402 |